[00:08:03] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1182678
[00:08:03] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1182678 (owner: TrainBranchBot)
[00:14:59] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P81928 and previous config saved to /var/cache/conftool/dbconfig/20250828-001458-ladsgroup.json
[00:24:55] FIRING: [4x] KubernetesRsyslogDown: rsyslog on dse-k8s-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[00:28:30] (PS1) RLazarus: api-gateway: Upgrade to Envoy 1.26.8 in staging [deployment-charts] - https://gerrit.wikimedia.org/r/1182680 (https://phabricator.wikimedia.org/T402584)
[00:28:33] (PS1) RLazarus: api-gateway: Upgrade to Envoy 1.26.8 in production [deployment-charts] - https://gerrit.wikimedia.org/r/1182681 (https://phabricator.wikimedia.org/T402584)
[00:30:06] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P81929 and previous config saved to /var/cache/conftool/dbconfig/20250828-003005-ladsgroup.json
[00:32:06] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1182678 (owner: TrainBranchBot)
[00:36:09] (PS2) RLazarus: {api,rest}-gateway: Upgrade to Envoy 1.26.8 in staging [deployment-charts] - https://gerrit.wikimedia.org/r/1182680 (https://phabricator.wikimedia.org/T402584)
[00:36:09] (PS2) RLazarus: {api,rest}-gateway: Upgrade to Envoy 1.26.8 in production [deployment-charts] - https://gerrit.wikimedia.org/r/1182681 (https://phabricator.wikimedia.org/T402584)
[00:42:44] FIRING: [2x] RipeAtlasAnchorUnreachable: ipv6 ping to eqiad RIPE Atlas anchor: failures over threshold for measurement 96503802 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
[00:45:14] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T402925)', diff saved to https://phabricator.wikimedia.org/P81930 and previous config saved to /var/cache/conftool/dbconfig/20250828-004513-ladsgroup.json
[00:45:19] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925
[00:45:29] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
[00:45:36] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1168 (T402925)', diff saved to https://phabricator.wikimedia.org/P81931 and previous config saved to /var/cache/conftool/dbconfig/20250828-004536-ladsgroup.json
[00:47:44] RESOLVED: [2x] RipeAtlasAnchorUnreachable: ipv6 ping to eqiad RIPE Atlas anchor: failures over threshold for measurement 96503802 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
[00:48:46] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T402925)', diff saved to https://phabricator.wikimedia.org/P81932 and previous config saved to /var/cache/conftool/dbconfig/20250828-004845-ladsgroup.json
[00:56:53] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[01:00:45] !log mwpresync@deploy1003 Started scap build-images: Publishing wmf/next image
[01:01:53] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[01:03:53] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P81933 and previous config saved to /var/cache/conftool/dbconfig/20250828-010353-ladsgroup.json
[01:07:50] (PS1) RLazarus: Empty the master branch and document the per-version branch structure [debs/envoyproxy] - https://gerrit.wikimedia.org/r/1182685
[01:08:27] (PS1) HMonroy: Enable CommunityRequests extension in beta [mediawiki-config] - https://gerrit.wikimedia.org/r/1182686 (https://phabricator.wikimedia.org/T401268)
[01:09:58] (CR) RLazarus: "Convenience link to the only remaining file: https://gerrit.wikimedia.org/r/c/operations/debs/envoyproxy/+/1182685/1/README.md" [debs/envoyproxy] - https://gerrit.wikimedia.org/r/1182685 (owner: RLazarus)
[01:10:10] SRE, Traffic: Recompile fifo-log-demux with hardening options - https://phabricator.wikimedia.org/T342900#11127071 (BCornwall) p:Triage→Low
[01:12:29] !log mwpresync@deploy1003 Finished scap build-images: Publishing wmf/next image (duration: 11m 43s)
[01:19:01] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P81934 and previous config saved to /var/cache/conftool/dbconfig/20250828-011900-ladsgroup.json
[01:27:58] (CR) MusikAnimal: [C:+1] Enable CommunityRequests extension in beta [mediawiki-config] - https://gerrit.wikimedia.org/r/1182686 (https://phabricator.wikimedia.org/T401268) (owner: HMonroy)
[01:29:58] (CR) MusikAnimal: [C:+2] Enable CommunityRequests extension in beta [mediawiki-config] - https://gerrit.wikimedia.org/r/1182686 (https://phabricator.wikimedia.org/T401268) (owner: HMonroy)
[01:30:54] (Merged) jenkins-bot: Enable CommunityRequests extension in beta [mediawiki-config] - https://gerrit.wikimedia.org/r/1182686 (https://phabricator.wikimedia.org/T401268) (owner: HMonroy)
[01:32:07] (PS2) RLazarus: api-gateway: Remove deprecated Envoy config fields [deployment-charts] - https://gerrit.wikimedia.org/r/1182674 (https://phabricator.wikimedia.org/T403101)
[01:32:07] (PS2) RLazarus: api-gateway: Remove deprecated Envoy config fields [deployment-charts] - https://gerrit.wikimedia.org/r/1182675 (https://phabricator.wikimedia.org/T403101)
[01:32:07] (PS2) RLazarus: api-gateway: Remove deprecated Envoy config fields [deployment-charts] - https://gerrit.wikimedia.org/r/1182676 (https://phabricator.wikimedia.org/T403101)
[01:32:25] RESOLVED: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:32:55] FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:34:09] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T402925)', diff saved to https://phabricator.wikimedia.org/P81935 and previous config saved to /var/cache/conftool/dbconfig/20250828-013408-ladsgroup.json
[01:34:14] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925
[01:34:25] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
[01:34:32] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1173 (T402925)', diff saved to https://phabricator.wikimedia.org/P81936 and previous config saved to /var/cache/conftool/dbconfig/20250828-013431-ladsgroup.json
[01:37:42] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T402925)', diff saved to https://phabricator.wikimedia.org/P81937 and previous config saved to /var/cache/conftool/dbconfig/20250828-013741-ladsgroup.json
[01:44:35] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[01:52:49] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P81938 and previous config saved to /var/cache/conftool/dbconfig/20250828-015249-ladsgroup.json
[02:07:57] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P81939 and previous config saved to /var/cache/conftool/dbconfig/20250828-020756-ladsgroup.json
[02:13:55] ops-magru: Alert for device ps1-b3-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T402835#11127168 (phaultfinder)
[02:18:55] ops-magru: Alert for device ps1-b4-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T402582#11127184 (phaultfinder)
[02:23:04] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T402925)', diff saved to https://phabricator.wikimedia.org/P81940 and previous config saved to /var/cache/conftool/dbconfig/20250828-022304-ladsgroup.json
[02:23:09] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925
[02:23:20] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[02:23:27] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1180 (T402925)', diff saved to https://phabricator.wikimedia.org/P81941 and previous config saved to /var/cache/conftool/dbconfig/20250828-022326-ladsgroup.json
[02:24:37] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T402925)', diff saved to https://phabricator.wikimedia.org/P81942 and previous config saved to /var/cache/conftool/dbconfig/20250828-022436-ladsgroup.json
[02:39:44] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P81943 and previous config saved to /var/cache/conftool/dbconfig/20250828-023944-ladsgroup.json
[02:54:52] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P81944 and previous config saved to /var/cache/conftool/dbconfig/20250828-025451-ladsgroup.json
[03:04:35] FIRING: OsmSynchronisationLag: Maps - OSM synchronization lag - codfw - https://wikitech.wikimedia.org/wiki/Maps/Runbook - https://grafana.wikimedia.org/d/000000305/maps-performances - https://alerts.wikimedia.org/?q=alertname%3DOsmSynchronisationLag
[03:04:35] FIRING: [2x] SwitchCoreInterfaceDown: Switch core interface down - ssw1-f1-codfw:et-0/0/6 (Core: lsw1-f2-codfw:ethernet-1/55 {#130117100025}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[03:09:59] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T402925)', diff saved to https://phabricator.wikimedia.org/P81945 and previous config saved to /var/cache/conftool/dbconfig/20250828-030959-ladsgroup.json
[03:10:05] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925
[03:10:15] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
[03:10:22] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1187 (T402925)', diff saved to https://phabricator.wikimedia.org/P81946 and previous config saved to /var/cache/conftool/dbconfig/20250828-031022-ladsgroup.json
[03:11:32] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1187 (T402925)', diff saved to https://phabricator.wikimedia.org/P81947 and previous config saved to /var/cache/conftool/dbconfig/20250828-031131-ladsgroup.json
[03:26:39] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P81948 and previous config saved to /var/cache/conftool/dbconfig/20250828-032638-ladsgroup.json
[03:29:35] FIRING: NetworkDeviceAlarmActive: Alarm active on ssw1-f1-eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive
[03:41:47] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P81949 and previous config saved to /var/cache/conftool/dbconfig/20250828-034146-ladsgroup.json
[03:56:54] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1187 (T402925)', diff saved to https://phabricator.wikimedia.org/P81950 and previous config saved to /var/cache/conftool/dbconfig/20250828-035653-ladsgroup.json
[03:56:58] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
[03:57:00] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925
[04:13:38] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1231.eqiad.wmnet with reason: Maintenance
[04:13:46] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1231 (T402925)', diff saved to https://phabricator.wikimedia.org/P81951 and previous config saved to /var/cache/conftool/dbconfig/20250828-041345-ladsgroup.json
[04:13:51] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925
[04:15:55] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1231 (T402925)', diff saved to https://phabricator.wikimedia.org/P81952 and previous config saved to /var/cache/conftool/dbconfig/20250828-041555-ladsgroup.json
[04:18:53] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[04:24:45] FIRING: CirrusStreamingUpdaterRateTooLow: CirrusSearch update rate from flink-app-consumer-search-backfill is critically low - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterRateTooLow
[04:24:55] FIRING: [4x] KubernetesRsyslogDown: rsyslog on dse-k8s-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[04:27:10] PROBLEM - Backup freshness on backup1014 is CRITICAL: All failures: 1 (install2005), Fresh: 137 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[04:28:53] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[04:29:45] RESOLVED: CirrusStreamingUpdaterRateTooLow: CirrusSearch update rate from flink-app-consumer-search-backfill is critically low - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterRateTooLow
[04:31:03] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P81953 and previous config saved to /var/cache/conftool/dbconfig/20250828-043102-ladsgroup.json
[04:46:11] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P81954 and previous config saved to /var/cache/conftool/dbconfig/20250828-044610-ladsgroup.json
[05:01:18] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1231 (T402925)', diff saved to https://phabricator.wikimedia.org/P81955 and previous config saved to /var/cache/conftool/dbconfig/20250828-050117-ladsgroup.json
[05:01:24] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925
[05:01:33] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
[05:07:44] FIRING: [2x] RipeAtlasAnchorUnreachable: ipv6 ping to eqiad RIPE Atlas anchor: failures over threshold for measurement 96503802 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
[05:10:20] SRE, DNS, Traffic: Set mediawiki.gr, wikipedia.pt, and wiktionary.org.uk NS records to WMF - https://phabricator.wikimedia.org/T401438#11127302 (waldyrious) @BCornwall you had emailed geral@wikimedia.pt (which is an alias to the Wikimedia-PT-internal mailing list -- wikimedia-pt-internal@lists.wikime...
[05:10:54] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[05:12:44] RESOLVED: [2x] RipeAtlasAnchorUnreachable: ipv6 ping to eqiad RIPE Atlas anchor: failures over threshold for measurement 96503802 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
[05:17:58] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
[05:18:06] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db2151 (T402925)', diff saved to https://phabricator.wikimedia.org/P81956 and previous config saved to /var/cache/conftool/dbconfig/20250828-051805-ladsgroup.json
[05:18:10] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925
[05:31:28] (CR) Ayounsi: [C:+1] Add bgp to security zone production for mr [homer/public] - https://gerrit.wikimedia.org/r/1182653 (https://phabricator.wikimedia.org/T294845) (owner: Papaul)
[05:31:53] (CR) Ayounsi: [C:+1] Revert "Allow bgp for security zone production on interface facing the cr3/cr4" [homer/public] - https://gerrit.wikimedia.org/r/1182651 (owner: Papaul)
[05:32:55] FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:33:50] (Restored) Ayounsi: MR: rollback gNMI [homer/public] - https://gerrit.wikimedia.org/r/1133398 (https://phabricator.wikimedia.org/T390052) (owner: Ayounsi)
[05:36:06] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2151 (T402925)', diff saved to https://phabricator.wikimedia.org/P81957 and previous config saved to /var/cache/conftool/dbconfig/20250828-053605-ladsgroup.json
[05:36:11] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925
[05:44:35] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[05:44:46] SRE, Infrastructure-Foundations, netops: Management routers: use BGP instead of OSPF - https://phabricator.wikimedia.org/T294845#11127339 (ayounsi) Good job!! Next step is to remove OSPF, feel free to send a patch in that regard. It can restore the previously removed `replace: ospf {}` as well as re...
[05:51:13] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P81958 and previous config saved to /var/cache/conftool/dbconfig/20250828-055112-ladsgroup.json
[05:51:44] FIRING: [2x] RipeAtlasAnchorUnreachable: ipv6 ping to eqiad RIPE Atlas anchor: failures over threshold for measurement 96503802 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
[06:00:06] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T0600)
[06:00:06] marostegui, Amir1, and federico3: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Primary database switchover deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T0600).
[06:06:21] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P81959 and previous config saved to /var/cache/conftool/dbconfig/20250828-060620-ladsgroup.json
[06:06:58] cdanis: I was aware that it existed, but that's about it :-)
[06:11:44] RESOLVED: [2x] RipeAtlasAnchorUnreachable: ipv6 ping to eqiad RIPE Atlas anchor: failures over threshold for measurement 96503802 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
[06:14:28] (CR) KCVelaga: [C:+1] Disable User Agent collection for MinT for Readers streams [mediawiki-config] - https://gerrit.wikimedia.org/r/1182621 (https://phabricator.wikimedia.org/T398057) (owner: KCVelaga)
[06:15:01] (CR) KCVelaga: [C:+1] "@aotto@wikimedia.org can you merge this?" [mediawiki-config] - https://gerrit.wikimedia.org/r/1182621 (https://phabricator.wikimedia.org/T398057) (owner: KCVelaga)
[06:18:54] ops-magru: Alert for device ps1-b3-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T402835#11127364 (phaultfinder)
[06:21:28] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2151 (T402925)', diff saved to https://phabricator.wikimedia.org/P81960 and previous config saved to /var/cache/conftool/dbconfig/20250828-062127-ladsgroup.json
[06:21:33] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925
[06:21:43] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
[06:21:50] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db2158 (T402925)', diff saved to https://phabricator.wikimedia.org/P81961 and previous config saved to /var/cache/conftool/dbconfig/20250828-062149-ladsgroup.json
[06:23:51] ops-magru: Alert for device ps1-b4-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T402582#11127370 (phaultfinder)
[06:27:10] RECOVERY - Backup freshness on backup1014 is OK: Fresh: 138 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[06:40:08] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2158 (T402925)', diff saved to https://phabricator.wikimedia.org/P81962 and previous config saved to /var/cache/conftool/dbconfig/20250828-064007-ladsgroup.json
[06:40:13] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925
[06:48:02] (CR) Vgutierrez: P:puppetserver::volatile generate datacenter database (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1181090 (https://phabricator.wikimedia.org/T398161) (owner: Slyngshede)
[06:52:47] (PS1) Stevemunene: druid-public: Add dummy keytabs for new hosts [labs/private] - https://gerrit.wikimedia.org/r/1182691 (https://phabricator.wikimedia.org/T397441)
[06:54:06] (PS1) Mszwarc: Revert "wikimaniawiki: update logo to 2025" [mediawiki-config] - https://gerrit.wikimedia.org/r/1182692 (https://phabricator.wikimedia.org/T403148)
[06:54:26] (CR) Vgutierrez: sre.loadbalancer: modify admin.py to accept 'reboot' action (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1180137 (https://phabricator.wikimedia.org/T395240) (owner: CDobbins)
[06:55:15] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P81963 and previous config saved to /var/cache/conftool/dbconfig/20250828-065515-ladsgroup.json
[06:55:36] (PS1) Muehlenhoff: conf/codfw: Remove now obsolete cert [puppet] - https://gerrit.wikimedia.org/r/1182693 (https://phabricator.wikimedia.org/T352245)
[06:55:38] (PS1) Muehlenhoff: conf/eqiad: Remove obsolete cert [puppet] - https://gerrit.wikimedia.org/r/1182694 (https://phabricator.wikimedia.org/T352245)
[06:56:34] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1182693 (https://phabricator.wikimedia.org/T352245) (owner: Muehlenhoff)
[07:00:05] Amir1, Urbanecm, and awight: UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T0700). Please do the needful.
[07:00:05] No Gerrit patches in the queue for this window AFAICS.
[07:00:57] (CR) Majavah: [C:+1] "I always though this loop was there to ensure the domain is active before it's being transferred. If that's not actually required, then +1" [puppet] - https://gerrit.wikimedia.org/r/1182188 (https://phabricator.wikimedia.org/T398712) (owner: Andrew Bogott)
[07:01:30] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1182694 (https://phabricator.wikimedia.org/T352245) (owner: Muehlenhoff)
[07:02:57] (PS1) Muehlenhoff: Remove obsolete stub secrets [labs/private] - https://gerrit.wikimedia.org/r/1182695 (https://phabricator.wikimedia.org/T360636)
[07:03:18] (PS2) Muehlenhoff: conf/eqiad: Remove obsolete cert [puppet] - https://gerrit.wikimedia.org/r/1182694 (https://phabricator.wikimedia.org/T352245)
[07:03:34] (PS2) Muehlenhoff: conf/codfw: Remove now obsolete cert [puppet] - https://gerrit.wikimedia.org/r/1182693 (https://phabricator.wikimedia.org/T352245)
[07:04:35] FIRING: OsmSynchronisationLag: Maps - OSM synchronization lag - codfw - https://wikitech.wikimedia.org/wiki/Maps/Runbook - https://grafana.wikimedia.org/d/000000305/maps-performances - https://alerts.wikimedia.org/?q=alertname%3DOsmSynchronisationLag
[07:04:35] FIRING: [2x] SwitchCoreInterfaceDown: Switch core interface down - ssw1-f1-codfw:et-0/0/6 (Core: lsw1-f2-codfw:ethernet-1/55 {#130117100025}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[07:05:54] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[07:07:02] (CR) Muehlenhoff: [C:-1] "I was about to merge this, but then realised the mwdebug* servers are still around and using that cert, so that will have to be on hold un" [puppet] - https://gerrit.wikimedia.org/r/1178528 (https://phabricator.wikimedia.org/T360636) (owner: Muehlenhoff)
[07:10:23] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P81964 and previous config saved to /var/cache/conftool/dbconfig/20250828-071022-ladsgroup.json
[07:11:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:14:15] (CR) Muehlenhoff: "I'm following up on https://phabricator.wikimedia.org/T177371 so that it's easier to find along with the rest of the earlier discussion" [puppet] - https://gerrit.wikimedia.org/r/989993 (https://phabricator.wikimedia.org/T177371) (owner: Muehlenhoff)
[07:16:11] (CR) Volans: "thanks for the update, replies inline" [cookbooks] - https://gerrit.wikimedia.org/r/1181795 (owner: JHathaway)
[07:25:30] SRE, cloud-services-team, Cloud-VPS, Infrastructure-Foundations, Patch-For-Review: Phase out DSA keys for SSH access (ssh-dss) - https://phabricator.wikimedia.org/T177371#11127487 (MoritzMuehlenhoff) Support for DSA was removed in OpenSSH 10, which is the version in Debian Trixie: https://www...
[07:25:31] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2158 (T402925)', diff saved to https://phabricator.wikimedia.org/P81965 and previous config saved to /var/cache/conftool/dbconfig/20250828-072530-ladsgroup.json
[07:25:36] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925
[07:25:46] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
[07:25:54] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db2169 (T402925)', diff saved to https://phabricator.wikimedia.org/P81966 and previous config saved to /var/cache/conftool/dbconfig/20250828-072553-ladsgroup.json
[07:27:01] !log T271776: reindexing all lexemes in testwikidatawiki
[07:27:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:27:05] T271776: Allow limiting lexeme searches by language - https://phabricator.wikimedia.org/T271776
[07:27:14] !log T271776: reindexing all lexemes in wikidatawiki
[07:27:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:27:54] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[07:29:35] (PS11) Slyngshede: P:puppetserver::volatile generate datacenter database [puppet] - https://gerrit.wikimedia.org/r/1181090 (https://phabricator.wikimedia.org/T398161)
[07:29:35] FIRING: NetworkDeviceAlarmActive: Alarm active on ssw1-f1-eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive
[07:35:04] SRE, cloud-services-team, Cloud-VPS, Infrastructure-Foundations, Patch-For-Review: Phase out DSA keys for SSH access (ssh-dss) - https://phabricator.wikimedia.org/T177371#11127495 (taavi) Grepping the auth logs seems to think those are no longer in use: `lang=shell-session taavi@tools-bastion...
[07:38:17] (CR) JMeybohm: [C:+1] "Much appreciated!" [debs/envoyproxy] - https://gerrit.wikimedia.org/r/1182685 (owner: RLazarus)
[07:41:02] (CR) Ilias Sarantopoulos: [C:+1] ml-services: update revscoring staging image [deployment-charts] - https://gerrit.wikimedia.org/r/1182506 (https://phabricator.wikimedia.org/T400350) (owner: Kevin Bazira)
[07:44:05] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2169 (T402925)', diff saved to https://phabricator.wikimedia.org/P81967 and previous config saved to /var/cache/conftool/dbconfig/20250828-074404-ladsgroup.json
[07:44:10] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925
[07:45:42] (PS1) Stevemunene: druid: Bring druid1012.eqiad.wmnet into service [puppet] - https://gerrit.wikimedia.org/r/1182698 (https://phabricator.wikimedia.org/T397441)
[07:45:44] (PS1) Stevemunene: druid: Bring druid1013.eqiad.wmnet into service [puppet] - https://gerrit.wikimedia.org/r/1182699 (https://phabricator.wikimedia.org/T397441)
[07:45:45] (PS1) Stevemunene: druid: Add druid druid101[2-3] to druid_public_broker VIP [puppet] - https://gerrit.wikimedia.org/r/1182700 (https://phabricator.wikimedia.org/T397441)
[07:47:49] (CR) Kevin Bazira: [C:+2] ml-services: update revscoring staging image [deployment-charts] - https://gerrit.wikimedia.org/r/1182506 (https://phabricator.wikimedia.org/T400350) (owner: Kevin Bazira)
[07:50:32] (Merged) jenkins-bot: ml-services: update revscoring staging image [deployment-charts]
- 10https://gerrit.wikimedia.org/r/1182506 (https://phabricator.wikimedia.org/T400350) (owner: 10Kevin Bazira) [07:51:07] (03CR) 10Volans: sre.loadbalancer: modify admin.py to accept 'reboot' action (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1180137 (https://phabricator.wikimedia.org/T395240) (owner: 10CDobbins) [07:51:49] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [07:53:11] (03PS1) 10Brouberol: mediawiki-dumps-legacy: rely on the plain php binary instead of defining the version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182745 (https://phabricator.wikimedia.org/T403110) [07:53:33] (03CR) 10Arnaudb: [C:03+2] gerrit: mod qos configuration [puppet] - 10https://gerrit.wikimedia.org/r/1182574 (https://phabricator.wikimedia.org/T402611) (owner: 10Arnaudb) [07:54:40] (03PS1) 10Volans: cumin: add A:trixie alias [puppet] - 10https://gerrit.wikimedia.org/r/1182750 [07:59:13] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P81969 and previous config saved to /var/cache/conftool/dbconfig/20250828-075912-ladsgroup.json [07:59:36] (03PS1) 10Stevemunene: hdfs: Cleanup decommissioned workers puppet mentions [puppet] - 10https://gerrit.wikimedia.org/r/1182753 (https://phabricator.wikimedia.org/T397172) [08:00:04] andre and jnuche: #bothumor My software never has bugs. It just develops random features. Rise for MediaWiki train - Utc-0 Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T0800). 
[08:01:53] (03PS2) 10Stevemunene: hdfs: Cleanup decommissioned workers puppet mentions [puppet] - 10https://gerrit.wikimedia.org/r/1182753 (https://phabricator.wikimedia.org/T397172) [08:02:19] (03PS1) 10TrainBranchBot: group2 to 1.45.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182754 (https://phabricator.wikimedia.org/T396377) [08:02:21] (03CR) 10TrainBranchBot: [C:03+2] "Initiated by aklapper@deploy1003" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182754 (https://phabricator.wikimedia.org/T396377) (owner: 10TrainBranchBot) [08:03:01] (03PS1) 10Fabfur: Revert "ACMEChiefConfig: Automated MarkMonitor domain sync" [puppet] - 10https://gerrit.wikimedia.org/r/1182755 [08:03:15] (03Merged) 10jenkins-bot: group2 to 1.45.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182754 (https://phabricator.wikimedia.org/T396377) (owner: 10TrainBranchBot) [08:03:37] (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1182750 (owner: 10Volans) [08:03:40] (03PS1) 10Fabfur: Revert "NCRedirRedirects: Automated MarkMonitor domain sync" [puppet] - 10https://gerrit.wikimedia.org/r/1182756 [08:03:55] (03CR) 10CI reject: [V:04-1] Revert "NCRedirRedirects: Automated MarkMonitor domain sync" [puppet] - 10https://gerrit.wikimedia.org/r/1182756 (owner: 10Fabfur) [08:03:57] (03CR) 10Fabfur: [C:03+2] Revert "ACMEChiefConfig: Automated MarkMonitor domain sync" [puppet] - 10https://gerrit.wikimedia.org/r/1182755 (owner: 10Fabfur) [08:07:18] !log kevinbazira@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . 
[08:12:07] (03CR) 10Brouberol: [C:03+2] airflow: grant permissions to get/list events in the namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182078 (owner: 10Brouberol) [08:12:08] (03PS3) 10Fabfur: Revert "NCRedirRedirects: Automated MarkMonitor domain sync" [puppet] - 10https://gerrit.wikimedia.org/r/1182756 [08:12:32] (03CR) 10Fabfur: [C:03+2] Revert "NCRedirRedirects: Automated MarkMonitor domain sync" [puppet] - 10https://gerrit.wikimedia.org/r/1182756 (owner: 10Fabfur) [08:12:38] !log aklapper@deploy1003 rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.16 refs T396377 [08:12:44] T396377: 1.45.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T396377 [08:12:53] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [08:14:17] (03Merged) 10jenkins-bot: airflow: grant permissions to get/list events in the namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182078 (owner: 10Brouberol) [08:14:21] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P81970 and previous config saved to /var/cache/conftool/dbconfig/20250828-081420-ladsgroup.json [08:17:37] (03CR) 10Volans: [C:03+2] cumin: add A:trixie alias [puppet] - 10https://gerrit.wikimedia.org/r/1182750 (owner: 10Volans) [08:19:54] !log kevinbazira@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
[08:19:56] (03PS1) 10Fabfur: Revert "ncredir: funnel wikimint.org" [puppet] - 10https://gerrit.wikimedia.org/r/1182761 [08:20:12] (03CR) 10CI reject: [V:04-1] Revert "ncredir: funnel wikimint.org" [puppet] - 10https://gerrit.wikimedia.org/r/1182761 (owner: 10Fabfur) [08:20:18] (03PS1) 10Arnaudb: Revert "gerrit: mod qos configuration" [puppet] - 10https://gerrit.wikimedia.org/r/1182762 [08:20:41] (03PS1) 10Slyngshede: P:cache:? datacenter lookup [puppet] - 10https://gerrit.wikimedia.org/r/1182763 (https://phabricator.wikimedia.org/T398161) [08:21:31] (03PS3) 10Fabfur: Revert "ncredir: funnel wikimint.org" [puppet] - 10https://gerrit.wikimedia.org/r/1182761 [08:21:41] (03CR) 10Volans: "Ideally we should add support for homer into spicerack and call it as a library, but for now I think it's ok as a temporary workaround to " [cookbooks] - 10https://gerrit.wikimedia.org/r/1166407 (owner: 10Ayounsi) [08:22:06] (03CR) 10Fabfur: [C:03+2] Revert "ncredir: funnel wikimint.org" [puppet] - 10https://gerrit.wikimedia.org/r/1182761 (owner: 10Fabfur) [08:23:23] (03CR) 10Brouberol: [C:03+1] druid-public: Add dummy keytabs for new hosts [labs/private] - 10https://gerrit.wikimedia.org/r/1182691 (https://phabricator.wikimedia.org/T397441) (owner: 10Stevemunene) [08:23:41] (03CR) 10Brouberol: [C:03+1] druid: Bring druid1012.eqiad.wmnet into service [puppet] - 10https://gerrit.wikimedia.org/r/1182698 (https://phabricator.wikimedia.org/T397441) (owner: 10Stevemunene) [08:23:56] (03CR) 10Brouberol: [C:03+1] druid: Bring druid1013.eqiad.wmnet into service [puppet] - 10https://gerrit.wikimedia.org/r/1182699 (https://phabricator.wikimedia.org/T397441) (owner: 10Stevemunene) [08:24:19] (03CR) 10Brouberol: [C:03+1] druid: Add druid druid101[2-3] to druid_public_broker VIP [puppet] - 10https://gerrit.wikimedia.org/r/1182700 (https://phabricator.wikimedia.org/T397441) (owner: 10Stevemunene) [08:24:39] (03CR) 10Brouberol: [C:03+1] hdfs: Cleanup decommissioned workers puppet mentions [puppet] - 
10https://gerrit.wikimedia.org/r/1182753 (https://phabricator.wikimedia.org/T397172) (owner: 10Stevemunene) [08:24:55] FIRING: [4x] KubernetesRsyslogDown: rsyslog on dse-k8s-ctrl2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown [08:27:04] !log kevinbazira@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . [08:28:55] !log stevemunene@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on dse-k8s-ctrl2001.codfw.wmnet with reason: Bootstrapping new dse-k8s-codfw-cluster [08:29:21] !log stevemunene@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on dse-k8s-ctrl2002.codfw.wmnet with reason: Bootstrapping new dse-k8s-codfw-cluster [08:29:29] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2169 (T402925)', diff saved to https://phabricator.wikimedia.org/P81971 and previous config saved to /var/cache/conftool/dbconfig/20250828-082928-ladsgroup.json [08:29:33] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [08:29:44] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance [08:29:51] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db2180 (T402925)', diff saved to https://phabricator.wikimedia.org/P81972 and previous config saved to /var/cache/conftool/dbconfig/20250828-082951-ladsgroup.json [08:29:58] 06SRE, 06cloud-services-team, 10Cloud-VPS, 06Infrastructure-Foundations, 13Patch-For-Review: Phase out DSA keys for SSH access (ssh-dss) - https://phabricator.wikimedia.org/T177371#11127650 (10MoritzMuehlenhoff) >>! In T177371#11127495, @taavi wrote: > Grepping the auth logs seems to think those are no l... 
[08:30:09] !log stevemunene@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on dse-k8s-worker2002.codfw.wmnet with reason: Bootstrapping new dse-k8s-codfw-cluster [08:30:42] !log stevemunene@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on dse-k8s-worker2001.codfw.wmnet with reason: Bootstrapping new dse-k8s-codfw-cluster [08:31:47] !log kevinbazira@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . [08:32:05] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2180 (T402925)', diff saved to https://phabricator.wikimedia.org/P81973 and previous config saved to /var/cache/conftool/dbconfig/20250828-083204-ladsgroup.json [08:34:53] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [08:36:00] (03CR) 10Jcrespo: [C:03+2] backup: Reenable notifications for doc1004 [puppet] - 10https://gerrit.wikimedia.org/r/1182576 (https://phabricator.wikimedia.org/T392130) (owner: 10Jcrespo) [08:37:53] !log kevinbazira@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . [08:42:10] (03PS1) 10Fabfur: install_server: add custom nvme format command for new codfw hosts [puppet] - 10https://gerrit.wikimedia.org/r/1182766 (https://phabricator.wikimedia.org/T392851) [08:42:11] !log kevinbazira@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . 
[08:42:36] (03PS1) 10Muehlenhoff: SSH key management: Allow sk-ecdsa-sha2-nistp256@openssh.com keys [software/bitu] - 10https://gerrit.wikimedia.org/r/1182767 [08:46:59] !log kevinbazira@deploy1003 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . [08:47:00] (03PS4) 10Tiziano Fogli: mirrormaker: add alerts directly in Prometheus [puppet] - 10https://gerrit.wikimedia.org/r/1182092 (https://phabricator.wikimedia.org/T370153) [08:47:12] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P81974 and previous config saved to /var/cache/conftool/dbconfig/20250828-084711-ladsgroup.json [08:47:57] (03PS1) 10Clément Goubert: trafficserver: Add gateway-check to /w/rest.php [puppet] - 10https://gerrit.wikimedia.org/r/1182768 (https://phabricator.wikimedia.org/T400131) [08:47:59] (03PS1) 10Clément Goubert: trafficserver: test2wiki rest API to rest-gateway [puppet] - 10https://gerrit.wikimedia.org/r/1182769 (https://phabricator.wikimedia.org/T402412) [08:48:08] (03CR) 10Clément Goubert: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1182768 (https://phabricator.wikimedia.org/T400131) (owner: 10Clément Goubert) [08:48:10] (03CR) 10Clément Goubert: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1182769 (https://phabricator.wikimedia.org/T402412) (owner: 10Clément Goubert) [08:49:53] (03CR) 10Tiziano Fogli: mirrormaker: add alerts directly in Prometheus (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1182092 (https://phabricator.wikimedia.org/T370153) (owner: 10Tiziano Fogli) [08:52:32] (03CR) 10Muehlenhoff: [C:03+1] "This looks good to me and makes late_command.sh much saner." 
[puppet] - 10https://gerrit.wikimedia.org/r/1182573 (https://phabricator.wikimedia.org/T123918) (owner: 10MVernon) [08:53:51] (03CR) 10Tiziano Fogli: [C:03+2] nrpewrapper: correlate Prometheus "for:" duration with Icinga timing [puppet] - 10https://gerrit.wikimedia.org/r/1182524 (https://phabricator.wikimedia.org/T395446) (owner: 10Tiziano Fogli) [08:54:53] (03PS9) 10Ayounsi: Nokia: Add initial Python files for nokia switch system config [homer/public] - 10https://gerrit.wikimedia.org/r/1180562 (https://phabricator.wikimedia.org/T402511) (owner: 10Cathal Mooney) [08:54:53] (03PS12) 10Ayounsi: Nokia: module for interface configuration [homer/public] - 10https://gerrit.wikimedia.org/r/1180925 (https://phabricator.wikimedia.org/T402577) (owner: 10Cathal Mooney) [08:54:53] (03PS6) 10Ayounsi: Nokia: module for network-instance configuration [homer/public] - 10https://gerrit.wikimedia.org/r/1180979 (https://phabricator.wikimedia.org/T402577) (owner: 10Cathal Mooney) [08:54:54] (03PS5) 10Ayounsi: Nokia: module for OSPF configuration [homer/public] - 10https://gerrit.wikimedia.org/r/1181132 (owner: 10Cathal Mooney) [08:55:11] 06SRE, 06cloud-services-team, 10Cloud-VPS, 06Infrastructure-Foundations, 13Patch-For-Review: Phase out DSA keys for SSH access (ssh-dss) - https://phabricator.wikimedia.org/T177371#11127717 (10taavi) If we're just removing the host key, then I think it's fine to merge the patch at any time. The user keys... 
[08:56:13] (03CR) 10CI reject: [V:04-1] Nokia: Add initial Python files for nokia switch system config [homer/public] - 10https://gerrit.wikimedia.org/r/1180562 (https://phabricator.wikimedia.org/T402511) (owner: 10Cathal Mooney) [08:56:16] (03CR) 10CI reject: [V:04-1] Nokia: module for OSPF configuration [homer/public] - 10https://gerrit.wikimedia.org/r/1181132 (owner: 10Cathal Mooney) [08:56:19] (03CR) 10CI reject: [V:04-1] Nokia: module for interface configuration [homer/public] - 10https://gerrit.wikimedia.org/r/1180925 (https://phabricator.wikimedia.org/T402577) (owner: 10Cathal Mooney) [08:56:20] (03CR) 10Ayounsi: Nokia: Add initial Python files for nokia switch system config (032 comments) [homer/public] - 10https://gerrit.wikimedia.org/r/1180562 (https://phabricator.wikimedia.org/T402511) (owner: 10Cathal Mooney) [08:56:21] (03CR) 10CI reject: [V:04-1] Nokia: module for network-instance configuration [homer/public] - 10https://gerrit.wikimedia.org/r/1180979 (https://phabricator.wikimedia.org/T402577) (owner: 10Cathal Mooney) [09:01:32] (03PS12) 10Slyngshede: P:puppetserver::volatile generate datacenter database [puppet] - 10https://gerrit.wikimedia.org/r/1181090 (https://phabricator.wikimedia.org/T398161) [09:01:50] (03CR) 10Slyngshede: P:puppetserver::volatile generate datacenter database (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1181090 (https://phabricator.wikimedia.org/T398161) (owner: 10Slyngshede) [09:02:20] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P81975 and previous config saved to /var/cache/conftool/dbconfig/20250828-090219-ladsgroup.json [09:04:15] (03PS1) 10Kevin Bazira: ml-services: update revscoring production image [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182770 (https://phabricator.wikimedia.org/T400350) [09:04:53] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - 
https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [09:11:13] (03CR) 10Tiziano Fogli: "Please see my inline comment, thank you." [puppet] - 10https://gerrit.wikimedia.org/r/1180642 (https://phabricator.wikimedia.org/T332764) (owner: 10Cwhite) [09:13:41] (03CR) 10Hnowlan: [C:03+1] api-gateway: Remove deprecated Envoy config fields [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182674 (https://phabricator.wikimedia.org/T403101) (owner: 10RLazarus) [09:13:54] (03CR) 10Tiziano Fogli: [C:03+1] sre: add unit filter to systemd status dashboard link [alerts] - 10https://gerrit.wikimedia.org/r/1182657 (https://phabricator.wikimedia.org/T332764) (owner: 10Cwhite) [09:14:47] (03PS3) 10Brouberol: kafka: reach out to the newly introduce spicerack.kafka.admin_client method [cookbooks] - 10https://gerrit.wikimedia.org/r/1174727 [09:15:35] (03CR) 10Arnaudb: [C:03+2] "* tests OK on Bookworm" [puppet] - 10https://gerrit.wikimedia.org/r/1182762 (owner: 10Arnaudb) [09:16:13] (03CR) 10Hnowlan: [C:03+1] api-gateway: Remove deprecated Envoy config fields [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182675 (https://phabricator.wikimedia.org/T403101) (owner: 10RLazarus) [09:17:10] (03CR) 10Hnowlan: [C:03+1] api-gateway: Remove deprecated Envoy config fields [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182676 (https://phabricator.wikimedia.org/T403101) (owner: 10RLazarus) [09:17:27] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2180 (T402925)', diff saved to https://phabricator.wikimedia.org/P81977 and previous config saved to /var/cache/conftool/dbconfig/20250828-091727-ladsgroup.json [09:17:33] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [09:17:43] !log 
ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance [09:17:44] (03CR) 10Hnowlan: [C:03+1] {api,rest}-gateway: Upgrade to Envoy 1.26.8 in staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182680 (https://phabricator.wikimedia.org/T402584) (owner: 10RLazarus) [09:17:51] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db2193 (T402925)', diff saved to https://phabricator.wikimedia.org/P81978 and previous config saved to /var/cache/conftool/dbconfig/20250828-091750-ladsgroup.json [09:17:53] (03CR) 10Hnowlan: [C:03+1] {api,rest}-gateway: Upgrade to Envoy 1.26.8 in production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182681 (https://phabricator.wikimedia.org/T402584) (owner: 10RLazarus) [09:20:04] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2193 (T402925)', diff saved to https://phabricator.wikimedia.org/P81979 and previous config saved to /var/cache/conftool/dbconfig/20250828-092004-ladsgroup.json [09:21:36] (03CR) 10CI reject: [V:04-1] kafka: reach out to the newly introduce spicerack.kafka.admin_client method [cookbooks] - 10https://gerrit.wikimedia.org/r/1174727 (owner: 10Brouberol) [09:23:23] (03PS1) 10Arnaudb: Revert^2 "gerrit: mod qos configuration" [puppet] - 10https://gerrit.wikimedia.org/r/1182775 [09:23:27] (03CR) 10Majavah: [C:03+1] Stop using DSA keys also for Cloud VPS [puppet] - 10https://gerrit.wikimedia.org/r/989993 (https://phabricator.wikimedia.org/T177371) (owner: 10Muehlenhoff) [09:24:49] (03CR) 10Hnowlan: [C:03+1] trafficserver: Add gateway-check to /w/rest.php [puppet] - 10https://gerrit.wikimedia.org/r/1182768 (https://phabricator.wikimedia.org/T400131) (owner: 10Clément Goubert) [09:25:24] (03CR) 10Brouberol: "Is there something I need to do to get the cookbook CI to get the latest available spicerack version?" 
[cookbooks] - 10https://gerrit.wikimedia.org/r/1174727 (owner: 10Brouberol) [09:25:36] (03CR) 10Hnowlan: [C:03+1] trafficserver: test2wiki rest API to rest-gateway [puppet] - 10https://gerrit.wikimedia.org/r/1182769 (https://phabricator.wikimedia.org/T402412) (owner: 10Clément Goubert) [09:30:03] jouncebot: nowandnext [09:30:03] For the next 0 hour(s) and 29 minute(s): MediaWiki train - Utc-0 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T0800) [09:30:03] In 0 hour(s) and 29 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T1000) [09:30:48] (03PS10) 10Ayounsi: Nokia: Add initial Python files for nokia switch system config [homer/public] - 10https://gerrit.wikimedia.org/r/1180562 (https://phabricator.wikimedia.org/T402511) (owner: 10Cathal Mooney) [09:30:48] (03PS13) 10Ayounsi: Nokia: module for interface configuration [homer/public] - 10https://gerrit.wikimedia.org/r/1180925 (https://phabricator.wikimedia.org/T402577) (owner: 10Cathal Mooney) [09:30:48] (03PS7) 10Ayounsi: Nokia: module for network-instance configuration [homer/public] - 10https://gerrit.wikimedia.org/r/1180979 (https://phabricator.wikimedia.org/T402577) (owner: 10Cathal Mooney) [09:30:49] (03PS6) 10Ayounsi: Nokia: module for OSPF configuration [homer/public] - 10https://gerrit.wikimedia.org/r/1181132 (owner: 10Cathal Mooney) [09:31:44] (03CR) 10Urbanecm: [C:03+1] "LGTM, but note this is not the last remaining place: https://codesearch.wmcloud.org/search/?q=GENewcomerTasksTopicType&files=&excludeFiles" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182578 (owner: 10Michael Große) [09:32:24] (03PS2) 10Urbanecm: urbanecm's dotfiles: Remove proxy vars by default [puppet] - 10https://gerrit.wikimedia.org/r/1182603 [09:32:38] (03CR) 10Elukey: install_server: add custom nvme format command for new codfw hosts (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1182766 
(https://phabricator.wikimedia.org/T392851) (owner: 10Fabfur) [09:33:10] FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:33:53] (03CR) 10Ayounsi: Nokia: Add initial Python files for nokia switch system config (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/1180562 (https://phabricator.wikimedia.org/T402511) (owner: 10Cathal Mooney) [09:34:59] (03CR) 10Volans: "No, once a package is published on pypi CI will use that one." [cookbooks] - 10https://gerrit.wikimedia.org/r/1174727 (owner: 10Brouberol) [09:35:12] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P81980 and previous config saved to /var/cache/conftool/dbconfig/20250828-093511-ladsgroup.json [09:35:53] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [09:35:57] (03CR) 10Arnaudb: [C:03+2] urbanecm's dotfiles: Remove proxy vars by default [puppet] - 10https://gerrit.wikimedia.org/r/1182603 (owner: 10Urbanecm) [09:36:48] (03CR) 10Cathal Mooney: [C:03+1] "Nice work! I'm happy with this approach overall looks good." 
[homer/public] - 10https://gerrit.wikimedia.org/r/1180562 (https://phabricator.wikimedia.org/T402511) (owner: 10Cathal Mooney) [09:38:03] (03PS1) 10David Caro: openstack: enable radosgw quota stats sending to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/1182780 (https://phabricator.wikimedia.org/T402932) [09:38:37] (03CR) 10CI reject: [V:04-1] openstack: enable radosgw quota stats sending to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/1182780 (https://phabricator.wikimedia.org/T402932) (owner: 10David Caro) [09:39:18] (03PS2) 10David Caro: openstack: enable radosgw quota stats sending to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/1182780 (https://phabricator.wikimedia.org/T402932) [09:39:20] (03CR) 10Cathal Mooney: Nokia: module for network-instance configuration (032 comments) [homer/public] - 10https://gerrit.wikimedia.org/r/1180979 (https://phabricator.wikimedia.org/T402577) (owner: 10Cathal Mooney) [09:39:51] (03CR) 10David Caro: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1182780 (https://phabricator.wikimedia.org/T402932) (owner: 10David Caro) [09:39:54] (03CR) 10CI reject: [V:04-1] openstack: enable radosgw quota stats sending to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/1182780 (https://phabricator.wikimedia.org/T402932) (owner: 10David Caro) [09:40:37] (03PS3) 10David Caro: openstack: enable radosgw quota stats sending to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/1182780 (https://phabricator.wikimedia.org/T402932) [09:41:23] (03PS2) 10Slyngshede: P:cache:haproxy add fetch_is_datacenter lookup [puppet] - 10https://gerrit.wikimedia.org/r/1182763 (https://phabricator.wikimedia.org/T398161) [09:43:12] (03CR) 10CI reject: [V:04-1] openstack: enable radosgw quota stats sending to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/1182780 (https://phabricator.wikimedia.org/T402932) (owner: 10David Caro) [09:44:35] FIRING: CertAlmostExpired: Certificate for service 
data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [09:45:25] (03PS4) 10David Caro: openstack: enable radosgw quota stats sending to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/1182780 (https://phabricator.wikimedia.org/T402932) [09:46:32] (03CR) 10Michael Große: "Ah, I wasn't aware of the vagrant file. Maybe we should just remove that completely? It is very unmaintained I think we're phasing out vag" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182578 (owner: 10Michael Große) [09:47:48] (03CR) 10David Caro: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1182780 (https://phabricator.wikimedia.org/T402932) (owner: 10David Caro) [09:49:30] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, August 28 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182578 (owner: 10Michael Große) [09:49:31] (03CR) 10Cathal Mooney: [C:03+1] "LGTM!" 
[homer/public] - 10https://gerrit.wikimedia.org/r/1181132 (owner: 10Cathal Mooney) [09:50:20] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P81981 and previous config saved to /var/cache/conftool/dbconfig/20250828-095019-ladsgroup.json [09:51:55] (03PS1) 10Slyngshede: P:cache:haproxy add datacenter information to provenance [puppet] - 10https://gerrit.wikimedia.org/r/1182782 (https://phabricator.wikimedia.org/T398161) [09:52:33] !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2148.codfw.wmnet with reason: Maintenance [09:52:40] !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db2148 (T401906)', diff saved to https://phabricator.wikimedia.org/P81982 and previous config saved to /var/cache/conftool/dbconfig/20250828-095240-fceratto.json [09:52:45] T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906 [09:53:27] (03PS3) 10Slyngshede: P:cache:haproxy add fetch_is_datacenter lookup [puppet] - 10https://gerrit.wikimedia.org/r/1182763 (https://phabricator.wikimedia.org/T398161) [09:54:36] (03CR) 10Brouberol: kafka: reach out to the newly introduce spicerack.kafka.admin_client method (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1174727 (owner: 10Brouberol) [09:54:42] (03PS2) 10Fabfur: install_server: add custom nvme format command for new codfw hosts [puppet] - 10https://gerrit.wikimedia.org/r/1182766 (https://phabricator.wikimedia.org/T392851) [09:54:50] (03CR) 10Fabfur: install_server: add custom nvme format command for new codfw hosts (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1182766 (https://phabricator.wikimedia.org/T392851) (owner: 10Fabfur) [09:54:53] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2148 (T401906)', diff saved to https://phabricator.wikimedia.org/P81983 
and previous config saved to /var/cache/conftool/dbconfig/20250828-095452-fceratto.json [09:54:57] (03CR) 10Brouberol: "Aaah, , that makes sense. Sorry /facepalms" [cookbooks] - 10https://gerrit.wikimedia.org/r/1174727 (owner: 10Brouberol) [09:55:21] (03PS1) 10Urbanecm: [Growth] wikidata: Do not disable Help panel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182786 (https://phabricator.wikimedia.org/T400937) [09:56:02] (03PS4) 10Brouberol: kafka: reach out to the newly introduce spicerack.kafka.admin_client method [cookbooks] - 10https://gerrit.wikimedia.org/r/1174727 [09:56:14] (03CR) 10Michael Große: [C:03+1] [Growth] wikidata: Do not disable Help panel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182786 (https://phabricator.wikimedia.org/T400937) (owner: 10Urbanecm) [10:00:04] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T1000) [10:00:14] (03CR) 10Urbanecm: [C:03+1] "If we want to remove GE from Vagrant entriely, that would be fine with me. My understanding is it existed mostly because prior developers " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182578 (owner: 10Michael Große) [10:00:40] (03CR) 10Clément Goubert: [C:03+2] trafficserver: Add gateway-check to /w/rest.php [puppet] - 10https://gerrit.wikimedia.org/r/1182768 (https://phabricator.wikimedia.org/T400131) (owner: 10Clément Goubert) [10:01:43] (03PS1) 10Cathal Mooney: WMF-Plugin: Include the BGP role when exposing the IGBP data [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/1182796 (https://phabricator.wikimedia.org/T402577) [10:03:11] 06SRE, 06cloud-services-team, 10Cloud-VPS, 06Infrastructure-Foundations, 13Patch-For-Review: Phase out DSA keys for SSH access (ssh-dss) - https://phabricator.wikimedia.org/T177371#11127854 (10MoritzMuehlenhoff) Sounds good, I'll merge this later the day. 
[10:03:15] (03PS2) 10Muehlenhoff: Stop using DSA keys also for Cloud VPS [puppet] - 10https://gerrit.wikimedia.org/r/989993 (https://phabricator.wikimedia.org/T177371) [10:03:23] (03PS1) 10Cathal Mooney: JunOS IBGP: adjust template to work with updated data from plugin [homer/public] - 10https://gerrit.wikimedia.org/r/1182797 (https://phabricator.wikimedia.org/T402577) [10:05:27] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2193 (T402925)', diff saved to https://phabricator.wikimedia.org/P81984 and previous config saved to /var/cache/conftool/dbconfig/20250828-100526-ladsgroup.json [10:05:33] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [10:05:42] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance [10:05:53] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [10:07:48] (03PS1) 10Mszwarc: Remove setting `wgEnablePartialActionBlocks`. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182798 (https://phabricator.wikimedia.org/T280532) [10:08:17] (03CR) 10Volans: [C:03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/1174727 (owner: 10Brouberol) [10:08:39] (03CR) 10Brouberol: [C:03+2] kafka: reach out to the newly introduce spicerack.kafka.admin_client method [cookbooks] - 10https://gerrit.wikimedia.org/r/1174727 (owner: 10Brouberol) [10:10:00] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P81985 and previous config saved to /var/cache/conftool/dbconfig/20250828-100959-fceratto.json [10:11:01] (03PS1) 10Majavah: mariadb-sssd: Build on Trixie [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1182805 [10:11:01] (03PS1) 10Majavah: tcl86-sssd: Build on Trixie [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1182806 (https://phabricator.wikimedia.org/T400256) [10:11:37] (03CR) 10Clément Goubert: [C:03+2] trafficserver: test2wiki rest API to rest-gateway [puppet] - 10https://gerrit.wikimedia.org/r/1182769 (https://phabricator.wikimedia.org/T402412) (owner: 10Clément Goubert) [10:16:39] FIRING: CirrusSearchTitleSuggestIndexTooOld: Some search indices that power autocomplete have not been updated recently - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#CirrusSearch_titlesuggest_index_is_too_old - TODO - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchTitleSuggestIndexTooOld [10:18:54] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [10:19:06] (03PS2) 10Arnaudb: Revert^2 "gerrit: mod qos configuration" [puppet] - 
10https://gerrit.wikimedia.org/r/1182775 (https://phabricator.wikimedia.org/T402611) [10:21:48] (03PS3) 10Tiziano Fogli: icinga/audit: add script to dump defined checks [software] - 10https://gerrit.wikimedia.org/r/1182571 (https://phabricator.wikimedia.org/T395443) [10:22:22] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2214.codfw.wmnet with reason: Maintenance [10:22:30] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db2214 (T402925)', diff saved to https://phabricator.wikimedia.org/P81986 and previous config saved to /var/cache/conftool/dbconfig/20250828-102229-ladsgroup.json [10:22:36] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [10:23:44] (03CR) 10Ayounsi: JunOS IBGP: adjust template to work with updated data from plugin (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/1182797 (https://phabricator.wikimedia.org/T402577) (owner: 10Cathal Mooney) [10:23:53] 10ops-magru: Alert for device ps1-b3-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T402835#11127894 (10phaultfinder) [10:25:08] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P81987 and previous config saved to /var/cache/conftool/dbconfig/20250828-102507-fceratto.json [10:26:58] (03CR) 10Tiziano Fogli: "Yeah Cole, it is meant to be invoked manually on the local machine (tunneling the remote PuppetDB port). 
I’ve updated the help and moved t" [software] - 10https://gerrit.wikimedia.org/r/1182571 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli) [10:28:56] 10ops-magru: Alert for device ps1-b4-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T402582#11127899 (10phaultfinder) [10:37:19] (03CR) 10Volans: [C:03+1] "LGTM" [homer/public] - 10https://gerrit.wikimedia.org/r/1180562 (https://phabricator.wikimedia.org/T402511) (owner: 10Cathal Mooney) [10:38:19] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2214 (T402925)', diff saved to https://phabricator.wikimedia.org/P81988 and previous config saved to /var/cache/conftool/dbconfig/20250828-103818-ladsgroup.json [10:38:25] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [10:40:15] (03CR) 10Volans: Nokia: module for interface configuration (032 comments) [homer/public] - 10https://gerrit.wikimedia.org/r/1180925 (https://phabricator.wikimedia.org/T402577) (owner: 10Cathal Mooney) [10:40:15] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2148 (T401906)', diff saved to https://phabricator.wikimedia.org/P81989 and previous config saved to /var/cache/conftool/dbconfig/20250828-104014-fceratto.json [10:40:20] T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906 [10:40:30] !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2175.codfw.wmnet with reason: Maintenance [10:40:38] !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db2175 (T401906)', diff saved to https://phabricator.wikimedia.org/P81990 and previous config saved to /var/cache/conftool/dbconfig/20250828-104037-fceratto.json [10:41:38] (03PS1) 10Clément Goubert: Revert "trafficserver: test2wiki rest API to rest-gateway" [puppet] - 
10https://gerrit.wikimedia.org/r/1182808 (https://phabricator.wikimedia.org/T402412) [10:43:03] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2175 (T401906)', diff saved to https://phabricator.wikimedia.org/P81991 and previous config saved to /var/cache/conftool/dbconfig/20250828-104302-fceratto.json [10:43:08] (03CR) 10Clément Goubert: [C:03+2] Revert "trafficserver: test2wiki rest API to rest-gateway" [puppet] - 10https://gerrit.wikimedia.org/r/1182808 (https://phabricator.wikimedia.org/T402412) (owner: 10Clément Goubert) [10:43:09] (03CR) 10Clément Goubert: [V:03+2 C:03+2] Revert "trafficserver: test2wiki rest API to rest-gateway" [puppet] - 10https://gerrit.wikimedia.org/r/1182808 (https://phabricator.wikimedia.org/T402412) (owner: 10Clément Goubert) [10:44:01] (03CR) 10Volans: Nokia: module for network-instance configuration (032 comments) [homer/public] - 10https://gerrit.wikimedia.org/r/1180979 (https://phabricator.wikimedia.org/T402577) (owner: 10Cathal Mooney) [10:48:32] (03CR) 10Elukey: install_server: add custom nvme format command for new codfw hosts (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1182766 (https://phabricator.wikimedia.org/T392851) (owner: 10Fabfur) [10:53:26] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P81992 and previous config saved to /var/cache/conftool/dbconfig/20250828-105326-ladsgroup.json [10:58:10] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P81993 and previous config saved to /var/cache/conftool/dbconfig/20250828-105810-fceratto.json [11:03:54] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - 
https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [11:04:35] FIRING: OsmSynchronisationLag: Maps - OSM synchronization lag - codfw - https://wikitech.wikimedia.org/wiki/Maps/Runbook - https://grafana.wikimedia.org/d/000000305/maps-performances - https://alerts.wikimedia.org/?q=alertname%3DOsmSynchronisationLag [11:04:35] FIRING: [2x] SwitchCoreInterfaceDown: Switch core interface down - ssw1-f1-codfw:et-0/0/6 (Core: lsw1-f2-codfw:ethernet-1/55 {#130117100025}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown [11:08:34] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P81994 and previous config saved to /var/cache/conftool/dbconfig/20250828-110833-ladsgroup.json [11:10:35] (03CR) 10Stevemunene: [V:03+2 C:03+2] druid-public: Add dummy keytabs for new hosts [labs/private] - 10https://gerrit.wikimedia.org/r/1182691 (https://phabricator.wikimedia.org/T397441) (owner: 10Stevemunene) [11:11:41] (03PS8) 10STran: Enable temporary accounts on remaining small-sized projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1180532 (https://phabricator.wikimedia.org/T402181) (owner: 10Tchanders) [11:12:02] (03CR) 10Stevemunene: [C:03+2] hdfs: Cleanup decommissioned workers puppet mentions [puppet] - 10https://gerrit.wikimedia.org/r/1182753 (https://phabricator.wikimedia.org/T397172) (owner: 10Stevemunene) [11:13:18] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P81995 and previous config saved to 
/var/cache/conftool/dbconfig/20250828-111317-fceratto.json [11:16:34] (03CR) 10Slyngshede: [C:03+1] "Checked that Paramiko agrees on the name, it does. We might want to add it to modules/idm/templates/idm-django-settings.erb in the Puppet " [software/bitu] - 10https://gerrit.wikimedia.org/r/1182767 (owner: 10Muehlenhoff) [11:19:25] (03PS2) 10Muehlenhoff: SSH key management: Allow sk-ecdsa-sha2-nistp256@openssh.com keys [software/bitu] - 10https://gerrit.wikimedia.org/r/1182767 [11:20:11] (03PS1) 10Muehlenhoff: Bitu:: Allow sk-ecdsa-sha2-nistp256@openssh.com keys [puppet] - 10https://gerrit.wikimedia.org/r/1182812 [11:20:40] (03PS2) 10Muehlenhoff: Bitu: Allow sk-ecdsa-sha2-nistp256@openssh.com keys [puppet] - 10https://gerrit.wikimedia.org/r/1182812 [11:20:53] (03CR) 10Muehlenhoff: "Ah, yes. Separate patch at https://gerrit.wikimedia.org/r/c/operations/puppet/+/1182812" [software/bitu] - 10https://gerrit.wikimedia.org/r/1182767 (owner: 10Muehlenhoff) [11:23:41] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2214 (T402925)', diff saved to https://phabricator.wikimedia.org/P81996 and previous config saved to /var/cache/conftool/dbconfig/20250828-112340-ladsgroup.json [11:23:47] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [11:23:56] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance [11:24:04] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db2217 (T402925)', diff saved to https://phabricator.wikimedia.org/P81997 and previous config saved to /var/cache/conftool/dbconfig/20250828-112403-ladsgroup.json [11:28:25] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2175 (T401906)', diff saved to https://phabricator.wikimedia.org/P81998 and previous config saved to /var/cache/conftool/dbconfig/20250828-112824-fceratto.json [11:28:30] T401906: 
Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906 [11:28:41] !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2189.codfw.wmnet with reason: Maintenance [11:28:47] !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db2189 (T401906)', diff saved to https://phabricator.wikimedia.org/P81999 and previous config saved to /var/cache/conftool/dbconfig/20250828-112847-fceratto.json [11:29:35] FIRING: NetworkDeviceAlarmActive: Alarm active on ssw1-f1-eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive [11:31:14] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2189 (T401906)', diff saved to https://phabricator.wikimedia.org/P82000 and previous config saved to /var/cache/conftool/dbconfig/20250828-113113-fceratto.json [11:39:54] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [11:40:24] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2217 (T402925)', diff saved to https://phabricator.wikimedia.org/P82001 and previous config saved to /var/cache/conftool/dbconfig/20250828-114023-ladsgroup.json [11:40:28] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [11:42:08] (03CR) 10Dreamy Jazz: [C:03+1] "Once wmf.16 is stably on all wikis, this LGTM. Might be best to wait until Monday to be sure." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182798 (https://phabricator.wikimedia.org/T280532) (owner: 10Mszwarc) [11:42:45] (03CR) 10Dreamy Jazz: [C:03+1] "Actually https://versions.toolforge.org/ says that it's deployed everywhere, so maybe it's fine to merge now." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182798 (https://phabricator.wikimedia.org/T280532) (owner: 10Mszwarc) [11:44:24] (03CR) 10Dreamy Jazz: [C:03+1] Enable temporary accounts on remaining small-sized projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1180532 (https://phabricator.wikimedia.org/T402181) (owner: 10Tchanders) [11:46:21] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P82002 and previous config saved to /var/cache/conftool/dbconfig/20250828-114620-fceratto.json [11:46:28] jouncebot: next [11:46:29] In 0 hour(s) and 13 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T1200) [11:47:26] (03CR) 10Ayounsi: Nokia: module for interface configuration (032 comments) [homer/public] - 10https://gerrit.wikimedia.org/r/1180925 (https://phabricator.wikimedia.org/T402577) (owner: 10Cathal Mooney) [11:52:31] !log delete from recentchanges where rc_timestamp < '20250725000000'; on all.dblist (T403002) [11:52:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:52:36] T403002: Unpurged renameuser rows in recentchanges causing isDenseTagFilter to return true inappropriately - https://phabricator.wikimedia.org/T403002 [11:54:36] (03CR) 10Ayounsi: Nokia: module for interface configuration (032 comments) [homer/public] - 10https://gerrit.wikimedia.org/r/1180925 (https://phabricator.wikimedia.org/T402577) (owner: 10Cathal Mooney) [11:55:31] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P82003 and previous 
config saved to /var/cache/conftool/dbconfig/20250828-115530-ladsgroup.json [11:55:54] 10ops-eqiad, 06SRE, 06DC-Ops, 10decommission-hardware, 10Data-Platform-SRE (2025.08.16 - 2025.09.05): decommission an-worker109[6-9].eqiad.wmnet - https://phabricator.wikimedia.org/T401678#11128101 (10Stevemunene) [11:56:21] (03CR) 10Mszwarc: "I originally planned to have it deployed on Monday or today evening window; depending on my availability in the today windows. But if you " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182798 (https://phabricator.wikimedia.org/T280532) (owner: 10Mszwarc) [11:57:55] (03CR) 10Vgutierrez: P:puppetserver::volatile generate datacenter database (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1181090 (https://phabricator.wikimedia.org/T398161) (owner: 10Slyngshede) [11:57:58] (03CR) 10Dreamy Jazz: [C:03+1] "I'm happy to merge it later today or Monday. It would be better in case there is a train rollback." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182798 (https://phabricator.wikimedia.org/T280532) (owner: 10Mszwarc) [11:59:10] 10ops-eqiad, 06SRE, 06DC-Ops, 10decommission-hardware, 10Data-Platform-SRE (2025.08.16 - 2025.09.05): decommission an-worker109[6-9].eqiad.wmnet - https://phabricator.wikimedia.org/T401678#11128126 (10Stevemunene) Hi @VRiley-WMF , we have wrapped up the refferences deletion and this should be good to close. 
[11:59:13] (03PS11) 10Ayounsi: Nokia: Add initial Python files for nokia switch system config [homer/public] - 10https://gerrit.wikimedia.org/r/1180562 (https://phabricator.wikimedia.org/T402511) (owner: 10Cathal Mooney) [11:59:15] (03PS14) 10Ayounsi: Nokia: module for interface configuration [homer/public] - 10https://gerrit.wikimedia.org/r/1180925 (https://phabricator.wikimedia.org/T402577) (owner: 10Cathal Mooney) [11:59:19] (03PS8) 10Ayounsi: Nokia: module for network-instance configuration [homer/public] - 10https://gerrit.wikimedia.org/r/1180979 (https://phabricator.wikimedia.org/T402577) (owner: 10Cathal Mooney) [11:59:23] (03PS7) 10Ayounsi: Nokia: module for OSPF configuration [homer/public] - 10https://gerrit.wikimedia.org/r/1181132 (owner: 10Cathal Mooney) [12:00:05] Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T1200) [12:01:27] (03PS1) 10Clément Goubert: multi-dc: Dynamic rewrite to -ro destinations [puppet] - 10https://gerrit.wikimedia.org/r/1182815 (https://phabricator.wikimedia.org/T402412) [12:01:28] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P82005 and previous config saved to /var/cache/conftool/dbconfig/20250828-120128-fceratto.json [12:04:16] !log upgrading debmonitor to Envoy 1.26.8 T402584 [12:04:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:04:21] T402584: Upgrade Envoy to v1.26.8 and drop buster - https://phabricator.wikimedia.org/T402584 [12:04:54] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [12:06:02] (03CR) 10ScheduleDeploymentBot: "Scheduled for 
deployment in the [Tuesday, September 02 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1180532 (https://phabricator.wikimedia.org/T402181) (owner: 10Tchanders) [12:07:54] MatmaRex: clarification question about https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T1300 – is this still just the dry runs for now? [12:08:22] Lucas_WMDE: yeah [12:08:25] ok [12:08:33] and thanks :) [12:08:59] (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] Enable the CampaignEvents extension on all the remaining Wikisources [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182655 (https://phabricator.wikimedia.org/T402329) (owner: 10Daimona Eaytoy) [12:09:45] (03CR) 10Elukey: [C:03+2] profile::base: add an option to install linux 6.12 on Bookworm (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1182579 (https://phabricator.wikimedia.org/T393948) (owner: 10Elukey) [12:10:38] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P82006 and previous config saved to /var/cache/conftool/dbconfig/20250828-121038-ladsgroup.json [12:13:20] (03CR) 10Elukey: [C:03+2] Deploy linux 6.12 from bookworm-backports on ml-serve101[2,3] [puppet] - 10https://gerrit.wikimedia.org/r/1182582 (https://phabricator.wikimedia.org/T393948) (owner: 10Elukey) [12:14:15] !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host ml-serve1013.eqiad.wmnet with OS bookworm [12:14:24] (03CR) 10Muehlenhoff: [C:03+2] SSH key management: Allow sk-ecdsa-sha2-nistp256@openssh.com keys [software/bitu] - 10https://gerrit.wikimedia.org/r/1182767 (owner: 10Muehlenhoff) [12:16:35] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2189 (T401906)', diff saved to https://phabricator.wikimedia.org/P82007 and previous config saved to /var/cache/conftool/dbconfig/20250828-121634-fceratto.json [12:16:39] 
!log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance [12:16:40] T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906 [12:17:13] !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2204.codfw.wmnet with reason: Maintenance [12:17:20] !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db2204 (T401906)', diff saved to https://phabricator.wikimedia.org/P82008 and previous config saved to /var/cache/conftool/dbconfig/20250828-121720-fceratto.json [12:18:45] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2204 (T401906)', diff saved to https://phabricator.wikimedia.org/P82009 and previous config saved to /var/cache/conftool/dbconfig/20250828-121844-fceratto.json [12:18:55] (03PS1) 10Muehlenhoff: Deploy linux 6.12 from bookworm-backports on DSE k8s workers [puppet] - 10https://gerrit.wikimedia.org/r/1182817 (https://phabricator.wikimedia.org/T394459) [12:19:20] (03CR) 10Muehlenhoff: [C:03+2] Bitu: Allow sk-ecdsa-sha2-nistp256@openssh.com keys [puppet] - 10https://gerrit.wikimedia.org/r/1182812 (owner: 10Muehlenhoff) [12:20:43] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1182817 (https://phabricator.wikimedia.org/T394459) (owner: 10Muehlenhoff) [12:20:54] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [12:21:27] (03CR) 10Volans: "reply inline" [homer/public] - 10https://gerrit.wikimedia.org/r/1180925 (https://phabricator.wikimedia.org/T402577) (owner: 10Cathal Mooney) [12:25:46] !log 
ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2217 (T402925)', diff saved to https://phabricator.wikimedia.org/P82010 and previous config saved to /var/cache/conftool/dbconfig/20250828-122545-ladsgroup.json [12:25:52] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [12:25:54] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [12:26:01] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2224.codfw.wmnet with reason: Maintenance [12:26:09] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db2224 (T402925)', diff saved to https://phabricator.wikimedia.org/P82011 and previous config saved to /var/cache/conftool/dbconfig/20250828-122608-ladsgroup.json [12:30:19] (03CR) 10Vgutierrez: multi-dc: Dynamic rewrite to -ro destinations (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1182815 (https://phabricator.wikimedia.org/T402412) (owner: 10Clément Goubert) [12:30:35] (03CR) 10Arnaudb: [C:03+2] Revert^2 "gerrit: mod qos configuration" [puppet] - 10https://gerrit.wikimedia.org/r/1182775 (https://phabricator.wikimedia.org/T402611) (owner: 10Arnaudb) [12:31:35] (03PS1) 10Kosta Harlan: hCaptcha: Enable processing of the risk score [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182824 (https://phabricator.wikimedia.org/T379179) [12:31:40] (03PS1) 10Arnaudb: Revert^3 "gerrit: mod qos configuration" [puppet] - 10https://gerrit.wikimedia.org/r/1182825 [12:32:34] jouncebot: nowandnext [12:32:34] For the next 0 hour(s) and 27 minute(s): Mobileapps/RESTBase/Wikifeeds 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T1200) [12:32:34] In 0 hour(s) and 27 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T1300) [12:33:30] (03CR) 10TrainBranchBot: [C:03+2] "Approved by kharlan@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182824 (https://phabricator.wikimedia.org/T379179) (owner: 10Kosta Harlan) [12:33:31] 10ops-eqiad, 06SRE, 06cloud-services-team, 06DC-Ops: KernelErrors Server cloudcephosd1052 logged kernel errors - https://phabricator.wikimedia.org/T402938#11128267 (10Jclark-ctr) Resolving this ticket since we will be replacing Nic with one that matches existing servers [12:33:38] 10ops-eqiad, 06SRE, 06cloud-services-team, 06DC-Ops: KernelErrors Server cloudcephosd1052 logged kernel errors - https://phabricator.wikimedia.org/T402938#11128268 (10Jclark-ctr) 05Open→03Resolved [12:34:45] (03Merged) 10jenkins-bot: hCaptcha: Enable processing of the risk score [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182824 (https://phabricator.wikimedia.org/T379179) (owner: 10Kosta Harlan) [12:35:09] !log kharlan@deploy1003 Started scap sync-world: Backport for [[gerrit:1182824|hCaptcha: Enable processing of the risk score (T379179)]] [12:35:15] T379179: Send hCaptcha API response data to event platform - https://phabricator.wikimedia.org/T379179 [12:37:29] !log elukey@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1013.eqiad.wmnet with reason: host reimage [12:41:48] !log kharlan@deploy1003 kharlan: Backport for [[gerrit:1182824|hCaptcha: Enable processing of the risk score (T379179)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[12:41:53] T379179: Send hCaptcha API response data to event platform - https://phabricator.wikimedia.org/T379179 [12:42:17] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2224 (T402925)', diff saved to https://phabricator.wikimedia.org/P82012 and previous config saved to /var/cache/conftool/dbconfig/20250828-124216-ladsgroup.json [12:42:22] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [12:42:54] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [12:43:30] !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1013.eqiad.wmnet with reason: host reimage [12:44:01] !log kharlan@deploy1003 kharlan: Continuing with sync [12:47:54] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [12:49:06] !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host sretest2009.codfw.wmnet with OS bookworm [12:49:22] !log kharlan@deploy1003 Finished scap sync-world: Backport for [[gerrit:1182824|hCaptcha: Enable processing of the risk score (T379179)]] (duration: 14m 13s) [12:49:27] T379179: Send hCaptcha API response data to event platform - https://phabricator.wikimedia.org/T379179 [12:51:19] PROBLEM - Host sretest2009 is DOWN: PING CRITICAL - Packet loss = 100% [12:51:37] !log elukey@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for 
host sretest2009.codfw.wmnet with OS bookworm
[12:54:13] RECOVERY - Host sretest2009 is UP: PING OK - Packet loss = 0%, RTA = 30.79 ms
[12:55:29] (CR) Dbrant: [C:+2] wikifeeds: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1182187 (owner: PipelineBot)
[12:56:25] (PS1) Brouberol: dse-k8s-worker: ensure that a backported 6.12 kernel is installed [puppet] - https://gerrit.wikimedia.org/r/1182832 (https://phabricator.wikimedia.org/T394897)
[12:57:17] (CR) Brouberol: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6783/co" [puppet] - https://gerrit.wikimedia.org/r/1182832 (https://phabricator.wikimedia.org/T394897) (owner: Brouberol)
[12:57:24] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P82013 and previous config saved to /var/cache/conftool/dbconfig/20250828-125724-ladsgroup.json
[12:57:27] (Merged) jenkins-bot: wikifeeds: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1182187 (owner: PipelineBot)
[12:58:25] !log dbrant@deploy1003 helmfile [staging] START helmfile.d/services/wikifeeds: apply
[12:58:43] !log dbrant@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
[12:59:01] !log dbrant@deploy1003 helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
[12:59:29] !log dbrant@deploy1003 helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
[13:00:04] !log dbrant@deploy1003 helmfile [codfw] START helmfile.d/services/wikifeeds: apply
[13:00:05] Lucas_WMDE, Urbanecm, and TheresNoTime: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T1300).
[13:00:05] Daimona, MatmaRex, MichaelG_WMF, and anzx: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:12] o/
[13:00:16] o/
[13:00:31] !log dbrant@deploy1003 helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
[13:01:03] !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host sretest2009.codfw.wmnet with OS bookworm
[13:02:17] i'm around if needed, but i don't think i have anything to do. thanks for running the scripts Lucas_WMDE
[13:02:35] o/ if there's a bit of time left after all the deploys I might try to deploy something, if this is OK
[13:02:35] (i had the patches backported last night btw)
[13:02:48] I’m looking at anzx’s patches for now
[13:03:23] anzx: are the changes to the png files really necessary? AFAICT they look basically identical before and after
[13:04:11] (regarding https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1182540, that is)
[13:04:22] !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2225.codfw.wmnet with reason: Maintenance
[13:04:29] !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db2225 (T401906)', diff saved to https://phabricator.wikimedia.org/P82014 and previous config saved to /var/cache/conftool/dbconfig/20250828-130429-fceratto.json
[13:04:35] T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906
[13:05:24] (CR) TrainBranchBot: [C:+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1182530 (https://phabricator.wikimedia.org/T402706) (owner: Anzx)
[13:05:24] (CR) TrainBranchBot: [C:+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1182523 (https://phabricator.wikimedia.org/T402725) (owner: Anzx)
[13:06:13] (Merged) jenkins-bot: gotwiki: update wordmark and add tagline [mediawiki-config] - https://gerrit.wikimedia.org/r/1182530 (https://phabricator.wikimedia.org/T402706) (owner: Anzx)
[13:06:22] (Merged) jenkins-bot: tlwiktionary: set sitename and projectnamespace [mediawiki-config] - https://gerrit.wikimedia.org/r/1182523 (https://phabricator.wikimedia.org/T402725) (owner: Anzx)
[13:06:31] Lucas_WMDE: i think it was generated by default so i didn't try removing, should i delete it
[13:06:31] (CR) Lucas Werkmeister (WMDE): [C:-1] "The changes to the (non-temp) `hawiki*.png` files seem unrelated (they weren’t touched in the two changes mentioned in the commit message)" [mediawiki-config] - https://gerrit.wikimedia.org/r/1182540 (https://phabricator.wikimedia.org/T376049) (owner: Anzx)
[13:06:37] !log lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for [[gerrit:1182530|gotwiki: update wordmark and add tagline (T402706)]], [[gerrit:1182523|tlwiktionary: set sitename and projectnamespace (T402725)]]
[13:06:39] (CR) Bking: [C:+1] dse-k8s-worker: ensure that a backported 6.12 kernel is installed [puppet] - https://gerrit.wikimedia.org/r/1182832 (https://phabricator.wikimedia.org/T394897) (owner: Brouberol)
[13:06:44] T402706: Change of tagline and wordmark of Gothic Wikipedia - https://phabricator.wikimedia.org/T402706
[13:06:44] T402725: Rename the Tagalog Wiktionary from "Wiktionary" to "Wiksiyonaryo" - https://phabricator.wikimedia.org/T402725
[13:06:47] I just left a comment on the Gerrit change as well
[13:06:55] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2225 (T401906)', diff saved to https://phabricator.wikimedia.org/P82015 and previous config saved to /var/cache/conftool/dbconfig/20250828-130654-fceratto.json
[13:06:59] how was it generated? I’m curious if I can try it locally as well
[13:08:40] !log restart swift on ms-fe2011 T360913
[13:08:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:08:46] T360913: Swift proxy server misbehaviour (no longer calling `accept`?) - https://phabricator.wikimedia.org/T360913
[13:09:03] trying `python logos/manage.py update hawiki` now
[13:09:38] tox -e logos -- update {wikiname}
[13:09:46] ok, can confirm that it changes the PNGs
[13:09:54] so I guess they must’ve been modified on Commons in the meantime?
[13:09:56] (PS4) Anzx: hawiki: revert temporary logo [mediawiki-config] - https://gerrit.wikimedia.org/r/1182540 (https://phabricator.wikimedia.org/T376049)
[13:10:07] but I also get different results – for me all three files get slightly bigger o_O
[13:10:22] maybe I have different pngquant/zopflipng versions
[13:10:48] (CR) Anzx: "done" [mediawiki-config] - https://gerrit.wikimedia.org/r/1182540 (https://phabricator.wikimedia.org/T376049) (owner: Anzx)
[13:10:58] !log restart swift on ms-fe2012 T360913
[13:11:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:58] (CR) Lucas Werkmeister (WMDE): [C:+1] "Thanks. (To summarize a bit from [#wikimedia-operations](https://wm-bot.wmcloud.org/browser/index.php?start=08%2F28%2F2025&end=08%2F28%2F2" [mediawiki-config] - https://gerrit.wikimedia.org/r/1182540 (https://phabricator.wikimedia.org/T376049) (owner: Anzx)
[13:11:58] Hey wikibugs, you are welcome!
[13:12:11] wat
[13:12:23] lol
[13:12:31] so does wm-bot respond to any message that contains both “thanks” and its name?
[13:12:31] Hey Lucas_WMDE, you are welcome!
[13:12:32] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P82016 and previous config saved to /var/cache/conftool/dbconfig/20250828-131231-ladsgroup.json [13:12:35] … [13:12:38] !log elukey@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2009.codfw.wmnet with reason: host reimage [13:12:47] !log lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, anzx: Backport for [[gerrit:1182530|gotwiki: update wordmark and add tagline (T402706)]], [[gerrit:1182523|tlwiktionary: set sitename and projectnamespace (T402725)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [13:12:52] checking [13:12:53] T402706: Change of tagline and wordmark of Gothic Wikipedia - https://phabricator.wikimedia.org/T402706 [13:12:53] T402725: Rename the Tagalog Wiktionary from "Wiktionary" to "Wiksiyonaryo" - https://phabricator.wikimedia.org/T402725 [13:15:08] (03PS1) 10Vgutierrez: haproxy: JA3N fingerprinting [puppet] - 10https://gerrit.wikimedia.org/r/1182834 (https://phabricator.wikimedia.org/T400270) [13:15:10] Lucas_WMDE: both changes looks ok [13:15:10] 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations: Automate PDU Deployment Process - https://phabricator.wikimedia.org/T403173 (10Jclark-ctr) 03NEW [13:15:27] alright, thanks! [13:15:29] !log lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, anzx: Continuing with sync [13:15:52] Lucas_WMDE: wm-bot like thanks indeed: https://github.com/benapetr/wikimedia-bot/blob/88a5199bc55c10fc165de55510c3c278e3e2dc34/src/WMBot.Plugins/Thanks/Thanks.cs#L109-L120 [13:15:52] Hey hashar, you are welcome! 
[13:16:10] s/like/likes/ [13:16:21] (03PS1) 10Muehlenhoff: Add repository sync definition for node 22 [puppet] - 10https://gerrit.wikimedia.org/r/1182835 (https://phabricator.wikimedia.org/T393437) [13:16:25] (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1182834 (https://phabricator.wikimedia.org/T400270) (owner: 10Vgutierrez) [13:16:27] oh wow, C♯ [13:16:31] :D [13:16:33] I am off [13:16:56] (03CR) 10Vgutierrez: "cp6001.yaml is only there for PCC reviewing purposes" [puppet] - 10https://gerrit.wikimedia.org/r/1182834 (https://phabricator.wikimedia.org/T400270) (owner: 10Vgutierrez) [13:17:02] !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2009.codfw.wmnet with reason: host reimage [13:19:57] (03CR) 10Filippo Giunchedi: [C:03+1] openstack: enable radosgw quota stats sending to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/1182780 (https://phabricator.wikimedia.org/T402932) (owner: 10David Caro) [13:20:44] !log lucaswerkmeister-wmde@deploy1003 Finished scap sync-world: Backport for [[gerrit:1182530|gotwiki: update wordmark and add tagline (T402706)]], [[gerrit:1182523|tlwiktionary: set sitename and projectnamespace (T402725)]] (duration: 14m 06s) [13:20:51] T402706: Change of tagline and wordmark of Gothic Wikipedia - https://phabricator.wikimedia.org/T402706 [13:20:51] T402725: Rename the Tagalog Wiktionary from "Wiktionary" to "Wiksiyonaryo" - https://phabricator.wikimedia.org/T402725 [13:21:37] Lucas_WMDE: please run namespacedupes for tlwiktionary [13:21:41] !log lucaswerkmeister-wmde@deploy1003 mwscript-k8s job started: CentralAuth:FixRenamedUserGlobalEditCount metawiki --since=20220310000000 --until=20230101000000 # T313900 (dry run) [13:21:46] T313900: Renaming a user doubles their edit count according to CentralAuthUser::getGlobalEditCount() / global_edit_count.gec_count field - https://phabricator.wikimedia.org/T313900 [13:21:52] yup [13:22:02] !log 
fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P82017 and previous config saved to /var/cache/conftool/dbconfig/20250828-132202-fceratto.json [13:22:11] !log lucaswerkmeister-wmde@deploy1003 mwscript-k8s job started: namespaceDupes tlwiktionary --fix # T402725 [13:23:04] right, let’s continue with the hawiki change [13:23:11] they’ve had their temporary logo for long enough ;) [13:23:37] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182540 (https://phabricator.wikimedia.org/T376049) (owner: 10Anzx) [13:24:25] MatmaRex: I like the fancier output of the maintenance script :D [13:24:28] !log updating haproxykafka to version 0.3.15 on cp3066 (test host) (https://gitlab.wikimedia.org/repos/sre/haproxykafka/-/merge_requests/99) (T403174) [13:24:29] (03Merged) 10jenkins-bot: hawiki: revert temporary logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182540 (https://phabricator.wikimedia.org/T376049) (owner: 10Anzx) [13:24:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:24:33] T403174: Missing error field in DLQ messages - https://phabricator.wikimedia.org/T403174 [13:24:34] “from 12 to 6 (-6; 0.50x)” [13:24:36] etc. 
[13:24:45] !log lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for [[gerrit:1182540|hawiki: revert temporary logo (T376049)]] [13:24:50] T376049: Requesting temporary logo change for ha.wiki - https://phabricator.wikimedia.org/T376049 [13:25:45] Lucas_WMDE: thanks :D i thought of that after trying to review the results from the previous runs [13:26:48] (03PS1) 10Jforrester: Provide Node 22 image, using thirdparty/node22 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1182837 (https://phabricator.wikimedia.org/T393437) [13:27:39] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2224 (T402925)', diff saved to https://phabricator.wikimedia.org/P82018 and previous config saved to /var/cache/conftool/dbconfig/20250828-132738-ladsgroup.json [13:27:45] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [13:28:29] Daimona_: are you around for your config change btw? [13:28:34] (03CR) 10Andrew Bogott: [C:03+2] "Me too! But I tested, and the domain can still transfer when it's in 'pending' state." [puppet] - 10https://gerrit.wikimedia.org/r/1182188 (https://phabricator.wikimedia.org/T398712) (owner: 10Andrew Bogott) [13:28:41] (03CR) 10Jforrester: "Won't work until I8afb0565cfdf1debe55daceac84d1bf59827d87d is complete." [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1182837 (https://phabricator.wikimedia.org/T393437) (owner: 10Jforrester) [13:28:44] (03CR) 10Andrew Bogott: [C:03+2] wmfkeystonehooks: format with Black [puppet] - 10https://gerrit.wikimedia.org/r/1182189 (owner: 10Andrew Bogott) [13:28:49] !log lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, anzx: Backport for [[gerrit:1182540|hawiki: revert temporary logo (T376049)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[13:28:50] (03CR) 10Andrew Bogott: [C:03+2] designatemakedomain.py: format with Black [puppet] - 10https://gerrit.wikimedia.org/r/1182190 (owner: 10Andrew Bogott) [13:29:14] Lucas_WMDE: good to sync [13:29:45] !log lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, anzx: Continuing with sync [13:29:47] alright, thanks! [13:30:04] (03PS2) 10Andrew Bogott: Openstack: add wmcs-projectcleanup.py [puppet] - 10https://gerrit.wikimedia.org/r/1182648 (https://phabricator.wikimedia.org/T397648) [13:30:13] (03PS2) 10Andrew Bogott: Openstack wmfkeystonehooks: don't clean up after project delete [puppet] - 10https://gerrit.wikimedia.org/r/1182649 (https://phabricator.wikimedia.org/T397648) [13:31:13] (03Abandoned) 10Andrew Bogott: Increase contrack table size on cloudcephosds and mons [puppet] - 10https://gerrit.wikimedia.org/r/1182175 (https://phabricator.wikimedia.org/T402480) (owner: 10Andrew Bogott) [13:31:32] (03CR) 10Andrew Bogott: [C:03+1] Stop using DSA keys also for Cloud VPS [puppet] - 10https://gerrit.wikimedia.org/r/989993 (https://phabricator.wikimedia.org/T177371) (owner: 10Muehlenhoff) [13:32:30] (03PS3) 10Anzx: hawiki: remove temporary logo files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182541 (https://phabricator.wikimedia.org/T376049) [13:33:07] oh! 
that was fast [13:33:10] FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:33:11] Dry run done, would correct 8765 edit counts [13:33:13] cc MatmaRex [13:33:46] !log lucaswerkmeister-wmde@deploy1003 mwscript-k8s job started: CentralAuth:FixRenamedUserGlobalEditCount metawiki --since=20230101000000 --until=20240101000000 # T313900 (dry run) [13:33:47] mhm [13:33:53] T313900: Renaming a user doubles their edit count according to CentralAuthUser::getGlobalEditCount() / global_edit_count.gec_count field - https://phabricator.wikimedia.org/T313900 [13:34:16] (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] "Okay to deploy ca. one week from now." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182541 (https://phabricator.wikimedia.org/T376049) (owner: 10Anzx) [13:34:17] (03CR) 10Filippo Giunchedi: [C:03+2] wmcs: add JobUnavailable alert [alerts] - 10https://gerrit.wikimedia.org/r/1182508 (https://phabricator.wikimedia.org/T402778) (owner: 10Filippo Giunchedi) [13:34:25] !log updating haproxykafka to version 0.3.15 on A:cp (https://gitlab.wikimedia.org/repos/sre/haproxykafka/-/merge_requests/99) (T403174) [13:34:29] (03PS2) 10Filippo Giunchedi: wmcs: add JobUnavailable alert [alerts] - 10https://gerrit.wikimedia.org/r/1182508 (https://phabricator.wikimedia.org/T402778) [13:34:29] Lucas_WMDE: how's window? can you let me know when it's ok to deploy? 
[13:34:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:34:31] T403174: Missing error field in DLQ messages - https://phabricator.wikimedia.org/T403174 [13:34:32] (03CR) 10Filippo Giunchedi: [V:03+2 C:03+2] wmcs: add JobUnavailable alert [alerts] - 10https://gerrit.wikimedia.org/r/1182508 (https://phabricator.wikimedia.org/T402778) (owner: 10Filippo Giunchedi) [13:34:36] !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2009.codfw.wmnet with OS bookworm [13:34:48] urbanecm: there’s still some config changes queued up [13:34:51] ack [13:34:56] (03CR) 10Muehlenhoff: [C:03+2] Stop using DSA keys also for Cloud VPS [puppet] - 10https://gerrit.wikimedia.org/r/989993 (https://phabricator.wikimedia.org/T177371) (owner: 10Muehlenhoff) [13:34:57] (and I’m running maintenance scripts on the side but I don’t think that conflicts with anything) [13:35:02] !log lucaswerkmeister-wmde@deploy1003 Finished scap sync-world: Backport for [[gerrit:1182540|hawiki: revert temporary logo (T376049)]] (duration: 10m 17s) [13:35:07] T376049: Requesting temporary logo change for ha.wiki - https://phabricator.wikimedia.org/T376049 [13:35:09] if Daimona_ isn’t here then MichaelG_WMF is up next [13:37:10] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P82019 and previous config saved to /var/cache/conftool/dbconfig/20250828-133709-fceratto.json [13:37:11] (and if MichaelG_WMF isn’t here either I’ll deploy https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1182578 anyway because it looks like a very trivial cleanup – just doing a `grep -rlF NewcomerTasksTopicType` in /srv/mediawiki-staging first to double-check that it’s no longer mentioned anywhere) [13:37:39] (rg would be faster but I don’t trust its auto-exclusion logic) [13:38:36] (03PS1) 10Majavah: openstack: Enforce black formatting for some files [puppet] - 
10https://gerrit.wikimedia.org/r/1182838 [13:38:47] (03PS2) 10Clément Goubert: multi-dc: Dynamic rewrite to -ro destinations [puppet] - 10https://gerrit.wikimedia.org/r/1182815 (https://phabricator.wikimedia.org/T402412) [13:39:06] Whoooops sorry folks I was having lunch and forgot about the deployment. I'm here now if it isn't too late. [13:39:56] ok, then I’ll deploy that together with the GX cleanup [13:39:58] (03CR) 10Clément Goubert: multi-dc: Dynamic rewrite to -ro destinations (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1182815 (https://phabricator.wikimedia.org/T402412) (owner: 10Clément Goubert) [13:40:07] Lucas_WMDE: i'm happy to step in for Michael if needed [13:40:13] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182578 (owner: 10Michael Große) [13:40:13] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182655 (https://phabricator.wikimedia.org/T402329) (owner: 10Daimona Eaytoy) [13:40:20] urbanecm: sure, if there’s anything to test for that change ^^ [13:40:30] for this particular, not really [13:40:34] (the grep finished now and says the variable isn’t used anywhere except in wmf-config/ext-GrowthExperiments.php where the change removes it) [13:41:02] and wrt Daimona’s change, arguably I should’ve asked “why not use the wikisource dblist” in https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1182526 I guess ^^ [13:41:09] but no harm done [13:41:29] (03PS1) 10Majavah: openstack: wmfkeystonehooks: Clean some black artifacts [puppet] - 10https://gerrit.wikimedia.org/r/1182839 [13:41:41] (03Merged) 10jenkins-bot: GrowthExperiments: remove unused wgGENewcomerTasksTopicType [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182578 (owner: 10Michael Große) [13:41:58] (03Merged) 10jenkins-bot: Enable the 
CampaignEvents extension on all the remaining Wikisources [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182655 (https://phabricator.wikimedia.org/T402329) (owner: 10Daimona Eaytoy) [13:41:59] Yeah that was an intentional choice because of wanting to exclude closed wikisources. But the extension is already enabled on other closed wikis, so it doesn't really matter. [13:42:12] ah ok [13:42:13] !log lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for [[gerrit:1182578|GrowthExperiments: remove unused wgGENewcomerTasksTopicType]], [[gerrit:1182655|Enable the CampaignEvents extension on all the remaining Wikisources (T402329)]] [13:42:19] T402329: Release CampaignEvents extension to all Wikisources - week of AUGUST 25 - https://phabricator.wikimedia.org/T402329 [13:42:26] And obviously just using the dblist is much better so we decided to do this now [13:43:01] you could try adding 'closed' => false I guess, if you really wanted that [13:43:06] not sure what the precedence is in that case [13:43:16] (also, ew, apparently my IRC client uses a font that auto-ligaturizes => into ⇒) [13:43:36] (oh god gnome’s adwaita sans does that by default??) [13:43:53] Exactly. I wasn't sure if that would work so I just went "nevermind" [13:44:35] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [13:45:25] !log upgrading puppetboard to Envoy 1.26.8 T402584 [13:45:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:45:30] T402584: Upgrade Envoy to v1.26.8 and drop buster - https://phabricator.wikimedia.org/T402584 [13:46:28] Lucas_WMDE: re precedence: when using dblists, one needs to ensure each wiki only appears in one of them (at most) [13:46:40] ok [13:46:45] does it throw an error otherwise? 
[13:46:55] no, it just uses whichever dblist it came across [13:46:56] or is it just unpredictable if you fail to ensure it [13:46:56] ok [13:48:07] !log lucaswerkmeister-wmde@deploy1003 daimona, lucaswerkmeister-wmde, migr: Backport for [[gerrit:1182578|GrowthExperiments: remove unused wgGENewcomerTasksTopicType]], [[gerrit:1182655|Enable the CampaignEvents extension on all the remaining Wikisources (T402329)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [13:48:13] T402329: Release CampaignEvents extension to all Wikisources - week of AUGUST 25 - https://phabricator.wikimedia.org/T402329 [13:48:19] Daimona: please test :) [13:48:24] (and urbanecm if you want to test anything) [13:48:31] not really [13:49:35] Seems working, thanks! [13:49:46] 06SRE, 06Infrastructure-Foundations, 10netops, 06Traffic: Remove static routes for LVS VIPs from core routers - https://phabricator.wikimedia.org/T300877#11128565 (10ssingh) Hi folks: Picking this ticket up as part of Traffic cleaning up our stuff. It seems the current static routes -- even if they matter... [13:49:49] !log lucaswerkmeister-wmde@deploy1003 daimona, lucaswerkmeister-wmde, migr: Continuing with sync [13:49:50] \o/ [13:50:42] 06SRE: provide envoyproxy package on trixie - https://phabricator.wikimedia.org/T402668#11128567 (10Clement_Goubert) 05Open→03Resolved a:03Clement_Goubert According to https://apt-browser.toolforge.org/trixie-wikimedia/main/ and ` cgoubert@people1005:~$ cat /etc/debian_version 13.0 cgoubert@people10... 
[13:50:54] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [13:51:48] (03CR) 10Ssingh: install_server: add custom nvme format command for new codfw hosts (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1182766 (https://phabricator.wikimedia.org/T392851) (owner: 10Fabfur) [13:52:18] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2225 (T401906)', diff saved to https://phabricator.wikimedia.org/P82020 and previous config saved to /var/cache/conftool/dbconfig/20250828-135217-fceratto.json [13:52:23] T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906 [13:52:24] (yay, adding `body { font-variant-ligatures: no-contextual; }` CSS fixes the appearance of => in my IRC client) [13:52:33] !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2226.codfw.wmnet with reason: Maintenance [13:52:40] !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db2226 (T401906)', diff saved to https://phabricator.wikimedia.org/P82021 and previous config saved to /var/cache/conftool/dbconfig/20250828-135240-fceratto.json [13:54:14] !log lucaswerkmeister-wmde@deploy1003 mwscript-k8s job started: CentralAuth:FixRenamedUserGlobalEditCount metawiki --since=20240101000000 --until=20250101000000 # T313900 (dry run) [13:54:20] T313900: Renaming a user doubles their edit count according to CentralAuthUser::getGlobalEditCount() / global_edit_count.gec_count field - https://phabricator.wikimedia.org/T313900 [13:54:36] (03CR) 10Ssingh: "Yeah, PCC looks good now! 
Let's remove the hiera overrides for doh1001 and dns1004 and I will +1 to merge." [puppet] - 10https://gerrit.wikimedia.org/r/1169156 (https://phabricator.wikimedia.org/T381608) (owner: 10CDobbins) [13:55:06] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2226 (T401906)', diff saved to https://phabricator.wikimedia.org/P82022 and previous config saved to /var/cache/conftool/dbconfig/20250828-135505-fceratto.json [13:55:06] !log lucaswerkmeister-wmde@deploy1003 Finished scap sync-world: Backport for [[gerrit:1182578|GrowthExperiments: remove unused wgGENewcomerTasksTopicType]], [[gerrit:1182655|Enable the CampaignEvents extension on all the remaining Wikisources (T402329)]] (duration: 12m 53s) [13:55:16] T402329: Release CampaignEvents extension to all Wikisources - week of AUGUST 25 - https://phabricator.wikimedia.org/T402329 [13:55:37] (03PS1) 10Andrew Bogott: Remove refs to cloudcephosd1004-1015 [puppet] - 10https://gerrit.wikimedia.org/r/1182841 (https://phabricator.wikimedia.org/T402881) [13:56:01] (03PS1) 10Filippo Giunchedi: prometheus: absent ebpf_exporter scraping in cloud [puppet] - 10https://gerrit.wikimedia.org/r/1182842 (https://phabricator.wikimedia.org/T348643) [13:56:52] urbanecm: I think I’m done deploying Gerrit changes now [13:57:00] ty! 
[13:57:01] I still need to run the maintenance script for Daimona [13:57:05] (and for MatmaRex) [13:57:08] I can do that [13:57:13] ok [13:57:25] (03PS2) 10Urbanecm: [Growth] wikidata: Do not disable Help panel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182786 (https://phabricator.wikimedia.org/T400937) [13:57:32] (03CR) 10Urbanecm: [C:03+2] [Growth] wikidata: Do not disable Help panel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182786 (https://phabricator.wikimedia.org/T400937) (owner: 10Urbanecm) [13:57:43] (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182786 (https://phabricator.wikimedia.org/T400937) (owner: 10Urbanecm) [13:58:32] (03Merged) 10jenkins-bot: [Growth] wikidata: Do not disable Help panel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182786 (https://phabricator.wikimedia.org/T400937) (owner: 10Urbanecm) [13:58:41] (03PS3) 10Fabfur: install_server: add custom nvme format command for new codfw hosts [puppet] - 10https://gerrit.wikimedia.org/r/1182766 (https://phabricator.wikimedia.org/T392851) [13:58:45] !log urbanecm@deploy1003 Started scap sync-world: Backport for [[gerrit:1182786|[Growth] wikidata: Do not disable Help panel (T400937)]] [13:58:50] T400937: Investigate Feasibility of Enabling Growth Features on Wikidata - https://phabricator.wikimedia.org/T400937 [13:58:55] 06SRE, 06Infrastructure-Foundations, 10netops, 06Traffic: Remove static routes for LVS VIPs from core routers - https://phabricator.wikimedia.org/T300877#11128591 (10cmooney) >>! In T300877#11128565, @ssingh wrote: > It seems the current static routes -- even if they matter at this stage -- are incorrect s... 
[13:59:42] (03CR) 10Fabfur: install_server: add custom nvme format command for new codfw hosts (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1182766 (https://phabricator.wikimedia.org/T392851) (owner: 10Fabfur) [14:00:58] (03CR) 10Vgutierrez: multi-dc: Dynamic rewrite to -ro destinations (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1182815 (https://phabricator.wikimedia.org/T402412) (owner: 10Clément Goubert) [14:01:23] !log mwscript-k8s --comment="T402239" -f -- CampaignEvents:UpdateCountriesColumn --wiki testwiki [14:01:24] (03CR) 10Elukey: [C:03+1] install_server: add custom nvme format command for new codfw hosts (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1182766 (https://phabricator.wikimedia.org/T392851) (owner: 10Fabfur) [14:01:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:01:29] T402239: RuntimeException: Event should have only one address. - https://phabricator.wikimedia.org/T402239 [14:02:35] hmm... stuck on `https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/118278` [14:02:41] `14:01:10 K8s deployment progress: 91% (ok: 11; fail: 0; left: 1)` [14:02:52] !log urbanecm@deploy1003 urbanecm: Backport for [[gerrit:1182786|[Growth] wikidata: Do not disable Help panel (T400937)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[14:02:56] ha, finally [14:03:10] !log urbanecm@deploy1003 urbanecm: Continuing with sync [14:03:28] Daimona: you can use the `--sal` flag btw if you like :) [14:03:33] (mentioning just in case you’re not aware) [14:03:35] (03PS3) 10Clément Goubert: multi-dc: Dynamic rewrite to -ro destinations [puppet] - 10https://gerrit.wikimedia.org/r/1182815 (https://phabricator.wikimedia.org/T402412) [14:03:40] (03CR) 10Clément Goubert: multi-dc: Dynamic rewrite to -ro destinations (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1182815 (https://phabricator.wikimedia.org/T402412) (owner: 10Clément Goubert) [14:03:43] Oh thank you, I was not aware! I'm gonna try it now [14:04:39] !log daimona@deploy1003 mwscript-k8s job started: CampaignEvents:UpdateCountriesColumn --wiki test2wiki --nowarn # T402239 [14:04:52] MAGIC [14:05:10] fancy [14:05:54] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [14:05:56] (03CR) 10Andrew Bogott: [C:03+1] "weird" [puppet] - 10https://gerrit.wikimedia.org/r/1182839 (owner: 10Majavah) [14:06:06] ah damnit, I completely forgot about my change for this window. Thank you @Lucas_WMDE and @urbanecm for taking care of it! 
🙇 [14:06:20] !log daimona@deploy1003 mwscript-k8s job started: CampaignEvents:UpdateCountriesColumn --wiki officewiki --nowarn # T402239 [14:06:33] (03CR) 10Muehlenhoff: [C:03+2] Remove my old Neo-based key [puppet] - 10https://gerrit.wikimedia.org/r/1182498 (owner: 10Muehlenhoff) [14:06:56] (03CR) 10Andrew Bogott: [C:03+1] openstack: Enforce black formatting for some files [puppet] - 10https://gerrit.wikimedia.org/r/1182838 (owner: 10Majavah) [14:07:04] 06SRE, 06Infrastructure-Foundations, 10netops, 06Traffic: Remove static routes for LVS VIPs from core routers - https://phabricator.wikimedia.org/T300877#11128621 (10ayounsi) +1 to remove them :) [14:07:32] (03CR) 10Majavah: [C:03+2] openstack: Enforce black formatting for some files [puppet] - 10https://gerrit.wikimedia.org/r/1182838 (owner: 10Majavah) [14:07:39] (03CR) 10Majavah: [C:03+2] openstack: wmfkeystonehooks: Clean some black artifacts [puppet] - 10https://gerrit.wikimedia.org/r/1182839 (owner: 10Majavah) [14:07:43] (03CR) 10David Caro: [C:03+1] "We can disable for now yes, we might want ot re-onable at some point" [puppet] - 10https://gerrit.wikimedia.org/r/1182842 (https://phabricator.wikimedia.org/T348643) (owner: 10Filippo Giunchedi) [14:07:57] (03CR) 10Vgutierrez: [C:03+1] multi-dc: Dynamic rewrite to -ro destinations [puppet] - 10https://gerrit.wikimedia.org/r/1182815 (https://phabricator.wikimedia.org/T402412) (owner: 10Clément Goubert) [14:08:21] (03PS2) 10Majavah: openstack: wmfkeystonehooks: Clean some black artifacts [puppet] - 10https://gerrit.wikimedia.org/r/1182839 [14:08:28] (03CR) 10Ssingh: [C:03+1] install_server: add custom nvme format command for new codfw hosts [puppet] - 10https://gerrit.wikimedia.org/r/1182766 (https://phabricator.wikimedia.org/T392851) (owner: 10Fabfur) [14:08:28] I'm almost done, now verifying the output before the last run of the script. 
[14:08:34] !log urbanecm@deploy1003 Finished scap sync-world: Backport for [[gerrit:1182786|[Growth] wikidata: Do not disable Help panel (T400937)]] (duration: 09m 48s) [14:08:39] T400937: Investigate Feasibility of Enabling Growth Features on Wikidata - https://phabricator.wikimedia.org/T400937 [14:09:02] I think when I wrote this crap I badly wanted the prize for "least comprehensible debug output of the year" [14:09:07] (03CR) 10Majavah: [C:03+2] openstack: wmfkeystonehooks: Clean some black artifacts [puppet] - 10https://gerrit.wikimedia.org/r/1182839 (owner: 10Majavah) [14:09:23] Almost as bad as MySQL errors [14:10:13] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P82027 and previous config saved to /var/cache/conftool/dbconfig/20250828-141012-fceratto.json [14:10:33] !log andrew@cumin2002 START - Cookbook sre.hosts.decommission for hosts cloudcephosd1004.eqiad.wmnet [14:11:25] (03CR) 10Herron: [C:03+1] icinga/audit: add script to dump defined checks [software] - 10https://gerrit.wikimedia.org/r/1182571 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli) [14:12:09] (03CR) 10Herron: [C:03+1] sre: add unit filter to systemd status dashboard link [alerts] - 10https://gerrit.wikimedia.org/r/1182657 (https://phabricator.wikimedia.org/T332764) (owner: 10Cwhite) [14:13:18] (03CR) 10Fabfur: [C:03+2] install_server: add custom nvme format command for new codfw hosts [puppet] - 10https://gerrit.wikimedia.org/r/1182766 (https://phabricator.wikimedia.org/T392851) (owner: 10Fabfur) [14:13:35] (03CR) 10Scott French: [C:03+1] "Thanks for linking the docs!" 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1182670 (https://phabricator.wikimedia.org/T403101) (owner: 10RLazarus) [14:14:05] !log daimona@deploy1003 mwscript-k8s job started: CampaignEvents:UpdateCountriesColumn --wiki metawiki --nowarn # T402239 [14:14:10] T402239: RuntimeException: Event should have only one address. - https://phabricator.wikimedia.org/T402239 [14:14:22] !log lucaswerkmeister-wmde@deploy1003 mwscript-k8s job started: CentralAuth:FixRenamedUserGlobalEditCount metawiki --since=20250101000000 # T313900 (dry run) [14:14:25] (03PS4) 10Clément Goubert: multi-dc: Dynamic rewrite to -ro destinations [puppet] - 10https://gerrit.wikimedia.org/r/1182815 (https://phabricator.wikimedia.org/T402412) [14:14:27] T313900: Renaming a user doubles their edit count according to CentralAuthUser::getGlobalEditCount() / global_edit_count.gec_count field - https://phabricator.wikimedia.org/T313900 [14:15:04] (03CR) 10Scott French: [C:03+1] "Thank you very much!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182745 (https://phabricator.wikimedia.org/T403110) (owner: 10Brouberol) [14:15:29] (03CR) 10Brouberol: [C:03+2] mediawiki-dumps-legacy: rely on the plain php binary instead of defining the version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182745 (https://phabricator.wikimedia.org/T403110) (owner: 10Brouberol) [14:15:34] Daimona: Brian Kernighan failed to foresee that a sufficiently clever programmer can actively make debugging *more* than twice as hard as writing the program in the first place /s [14:16:04] (03CR) 10Vgutierrez: multi-dc: Dynamic rewrite to -ro destinations (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1182815 (https://phabricator.wikimedia.org/T402412) (owner: 10Clément Goubert) [14:16:54] FIRING: CirrusSearchTitleSuggestIndexTooOld: Some search indices that power autocomplete have not been updated recently - 
https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#CirrusSearch_titlesuggest_index_is_too_old - TODO - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchTitleSuggestIndexTooOld [14:17:05] !log andrew@cumin2002 START - Cookbook sre.dns.netbox [14:18:29] (03CR) 10Filippo Giunchedi: [C:03+2] prometheus: absent ebpf_exporter scraping in cloud [puppet] - 10https://gerrit.wikimedia.org/r/1182842 (https://phabricator.wikimedia.org/T348643) (owner: 10Filippo Giunchedi) [14:18:37] !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply [14:18:57] (03PS1) 10Elukey: profile::base: add linux-base when setting use_linux612_on_bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1182845 [14:19:38] Oh well [14:20:05] The cherry on top is that I made the script write stuff even with --dry-run. At least the changes are correct, and I am done. [14:20:10] !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply [14:20:42] !log elukey@puppetserver1001 conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad [14:21:25] !log andrew@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin2002" [14:21:53] (03CR) 10Scott French: [C:03+1] "Thanks, Moritz!" 
[puppet] - 10https://gerrit.wikimedia.org/r/1182693 (https://phabricator.wikimedia.org/T352245) (owner: 10Muehlenhoff) [14:22:03] !log andrew@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin2002" [14:22:03] !log andrew@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [14:22:06] !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1004.eqiad.wmnet [14:22:32] !log install python3.9-dbg for temporary debugging ms-fe2010 T360913 [14:22:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:22:38] T360913: Swift proxy server misbehaviour (no longer calling `accept`?) - https://phabricator.wikimedia.org/T360913 [14:22:48] !log mforns@deploy1003 Started deploy [analytics/refinery@81abaad]: Deploy automated traffic detection improvements [analytics/refinery@81abaad3] [14:23:15] (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1182845 (owner: 10Elukey) [14:23:21] Daimona: /o\ [14:23:27] (03CR) 10Elukey: [C:03+2] profile::base: add linux-base when setting use_linux612_on_bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1182845 (owner: 10Elukey) [14:23:34] feels like we should have stronger --dry-run support in the maintenance infrastructure in general tbh [14:23:46] it’s bad enough that we have a mix of scripts that dry-run by default and ones where it’s a flag [14:23:54] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [14:24:16] I wouldn’t mind a shared flag that turns 
the load balancer factory read-only or something like that [14:25:21] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P82029 and previous config saved to /var/cache/conftool/dbconfig/20250828-142520-fceratto.json [14:26:19] (03PS1) 10Tiziano Fogli: data-platform: port Zookeeper alerts [alerts] - 10https://gerrit.wikimedia.org/r/1182848 (https://phabricator.wikimedia.org/T309012) [14:28:19] Yeah, totally! [14:28:50] !log mforns@deploy1003 Finished deploy [analytics/refinery@81abaad]: Deploy automated traffic detection improvements [analytics/refinery@81abaad3] (duration: 06m 01s) [14:28:53] 10ops-magru: Alert for device ps1-b3-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T402835#11128768 (10phaultfinder) [14:29:59] !log mforns@deploy1003 Started deploy [analytics/refinery@81abaad] (thin): Deploy automated traffic detection improvements [analytics/refinery@81abaad3] [14:30:04] Deploy window xLab Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T1430) [14:30:23] (03PS1) 10Vgutierrez: benthos webrequest: Add ja3n X-Analytics sub-field [puppet] - 10https://gerrit.wikimedia.org/r/1182849 (https://phabricator.wikimedia.org/T400270) [14:30:28] (03PS5) 10Clément Goubert: multi-dc: Dynamic rewrite to -ro destinations [puppet] - 10https://gerrit.wikimedia.org/r/1182815 (https://phabricator.wikimedia.org/T402412) [14:31:08] !log mforns@deploy1003 Finished deploy [analytics/refinery@81abaad] (thin): Deploy automated traffic detection improvements [analytics/refinery@81abaad3] (duration: 01m 08s) [14:31:39] (03PS6) 10Clément Goubert: multi-dc: Dynamic rewrite to -ro destinations [puppet] - 10https://gerrit.wikimedia.org/r/1182815 (https://phabricator.wikimedia.org/T402412) [14:31:53] (03PS1) 10Elukey: profile::base: install linux-base before linux-image 6.12 [puppet] - 
10https://gerrit.wikimedia.org/r/1182850 [14:32:09] !log remove dbg packages & repool ms-fe2010 T360913 [14:32:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:32:14] T360913: Swift proxy server misbehaviour (no longer calling `accept`?) - https://phabricator.wikimedia.org/T360913 [14:32:46] !log lucaswerkmeister-wmde@deploy1003 mwscript-k8s job started: foreachwikiindblist sul CentralAuth:FixRenameUserLocalLogs --logwiki=metawiki # T398177 (dry run) [14:32:50] MatmaRex: fyi ^ [14:32:51] T398177: 'renameuser' logs for a global rename use actor ID from metawiki instead of the local one when created by the fixStuckGlobalRename.php script - https://phabricator.wikimedia.org/T398177 [14:33:30] “aawiki Global performer […] does not exist locally” sounds like the case that might have been problematic before [14:33:52] 10ops-magru: Alert for device ps1-b4-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T402582#11128789 (10phaultfinder) [14:34:02] i think that is okay [14:34:16] it usually means that the steward who did the rename has never visited the local wiki [14:34:53] yeah, sounds plausible [14:35:22] and does “Would update performer for local #[num] based on global #[num] from 'Global rename script' to '[name]'” sound okay? [14:35:44] (so far the “from” is always Global rename script, though it’s also not a ton of output yet in the first place) [14:37:07] that one means that the steward who did the rename *had not* visited the local wiki at the time, but they did later [14:37:14] so the log entry would be updated to use their name [14:37:30] which is a bit silly, but harmless. maybe we should skip them? 
i'll think about it [14:38:42] (03CR) 10Elukey: [C:03+2] profile::base: install linux-base before linux-image 6.12 [puppet] - 10https://gerrit.wikimedia.org/r/1182850 (owner: 10Elukey) [14:39:08] ah, ok [14:39:19] so the rename script fell back to the maintenance user at the time [14:39:33] idk sounds reasonable to me [14:40:02] I guess the output is going to have repetitions of a lot of the same lines [14:40:13] for global performers that don’t exist locally [14:40:19] but at least it lets you see that the script isn’t stuck ;) [14:40:28] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2226 (T401906)', diff saved to https://phabricator.wikimedia.org/P82034 and previous config saved to /var/cache/conftool/dbconfig/20250828-144027-fceratto.json [14:40:34] T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906 [14:40:43] !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2238.codfw.wmnet with reason: Maintenance [14:40:44] (the script hasn’t left aawiki yet so I have a feeling this might take a while…) [14:40:51] !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db2238 (T401906)', diff saved to https://phabricator.wikimedia.org/P82035 and previous config saved to /var/cache/conftool/dbconfig/20250828-144050-fceratto.json [14:41:22] aha, now it’s reached aawikibooks [14:41:38] “User has existed, but no local log entry for global #9118582” interesting [14:41:55] maybe that means the wiki was created after the rename in question? 
idk [14:43:17] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2238 (T401906)', diff saved to https://phabricator.wikimedia.org/P82036 and previous config saved to /var/cache/conftool/dbconfig/20250828-144316-fceratto.json [14:44:04] !log mforns@deploy1003 Started deploy [analytics/refinery@81abaad]: Deploy automated traffic detection improvements [analytics/refinery@81abaad3] [14:44:17] (03PS1) 10Elukey: profile::base: split linux-base 4.12 from linux-image 6.12 install [puppet] - 10https://gerrit.wikimedia.org/r/1182856 [14:44:40] (03CR) 10David Caro: [C:03+2] openstack: enable radosgw quota stats sending to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/1182780 (https://phabricator.wikimedia.org/T402932) (owner: 10David Caro) [14:46:26] (03CR) 10Vgutierrez: [C:03+1] multi-dc: Dynamic rewrite to -ro destinations [puppet] - 10https://gerrit.wikimedia.org/r/1182815 (https://phabricator.wikimedia.org/T402412) (owner: 10Clément Goubert) [14:46:48] FIRING: PuppetFailure: Puppet has failed on ml-serve1013:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [14:46:54] (03PS1) 10Hokwelum: Set $wgPHPSessionHandling to 'disable' on group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182857 (https://phabricator.wikimedia.org/T362324) [14:48:02] !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bookworm [14:48:40] (03CR) 10Elukey: [C:03+2] profile::base: split linux-base 4.12 from linux-image 6.12 install [puppet] - 10https://gerrit.wikimedia.org/r/1182856 (owner: 10Elukey) [14:48:43] (03CR) 10Huei Tan: [C:04-1] "I will create another patch for the backport." 
[extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1181830 (https://phabricator.wikimedia.org/T402496) (owner: 10Abijeet Patro) [14:48:47] (03Abandoned) 10Huei Tan: Setup tracking for CentralNotice banners experiment for WE2.1.1 [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1181830 (https://phabricator.wikimedia.org/T402496) (owner: 10Abijeet Patro) [14:53:59] (03PS2) 10Clément Goubert: rest-gateway: Introduce rest-gateway-ro [puppet] - 10https://gerrit.wikimedia.org/r/1182852 (https://phabricator.wikimedia.org/T402412) [14:54:16] (03PS2) 10Clément Goubert: wmnet: Introduce rest-gateway-ro [dns] - 10https://gerrit.wikimedia.org/r/1182853 (https://phabricator.wikimedia.org/T400131) [14:55:14] (03PS1) 10Huei Tan: Setup tracking for CentralNotice banners experiment for WE2.1.1 [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1182861 (https://phabricator.wikimedia.org/T402496) [14:57:41] (03CR) 10RLazarus: [V:03+2 C:03+2] Empty the master branch and document the per-version branch structure [debs/envoyproxy] - 10https://gerrit.wikimedia.org/r/1182685 (owner: 10RLazarus) [14:58:14] (03PS1) 10Huei Tan: Update HomepageVisit schema to 1.6.1 [extensions/GrowthExperiments] (wmf/1.45.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1182862 (https://phabricator.wikimedia.org/T402496) [14:58:25] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P82037 and previous config saved to /var/cache/conftool/dbconfig/20250828-145824-fceratto.json [14:58:57] 06SRE, 06Traffic, 13Patch-For-Review, 07User-notice: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11128998 (10bd808) [15:00:04] andre and jnuche: gettimeofday() says it's time for Train log triage. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T1500) [15:03:38] !log elukey@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on cp2043.codfw.wmnet with reason: host reimage [15:04:32] !log ammarpad@deploy1003 mwscript-k8s job started: extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=frwiki --logwiki=metawiki Chaimabiosport 'Renamed user 5e8f10bb5b5df1e083b4b7b255d947ea' # T403158 [15:04:35] FIRING: OsmSynchronisationLag: Maps - OSM synchronization lag - codfw - https://wikitech.wikimedia.org/wiki/Maps/Runbook - https://grafana.wikimedia.org/d/000000305/maps-performances - https://alerts.wikimedia.org/?q=alertname%3DOsmSynchronisationLag [15:04:35] FIRING: [2x] SwitchCoreInterfaceDown: Switch core interface down - ssw1-f1-codfw:et-0/0/6 (Core: lsw1-f2-codfw:ethernet-1/55 {#130117100025}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown [15:04:37] T403158: Unblock stuck global rename of Renamed user 5e8f10bb5b5df1e083b4b7b255d947ea - https://phabricator.wikimedia.org/T403158 [15:09:20] !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2043.codfw.wmnet with reason: host reimage [15:10:53] (03PS1) 10Jcrespo: dbbackups: Initial setup of dbprov1007, dbprov2007 [puppet] - 10https://gerrit.wikimedia.org/r/1182865 (https://phabricator.wikimedia.org/T403166) [15:12:04] (03CR) 10CDanis: [C:03+1] haproxy: JA3N fingerprinting (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1182834 (https://phabricator.wikimedia.org/T400270) (owner: 10Vgutierrez) [15:12:45] (03CR) 10Ottomata: [C:03+1] "Hi! I can but it probably would be best to go through the official deployment process!" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182621 (https://phabricator.wikimedia.org/T398057) (owner: 10KCVelaga) [15:12:56] (03CR) 10CDanis: [C:03+1] haproxy: JA3N fingerprinting (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1182834 (https://phabricator.wikimedia.org/T400270) (owner: 10Vgutierrez) [15:13:32] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P82039 and previous config saved to /var/cache/conftool/dbconfig/20250828-151331-fceratto.json [15:13:58] (03PS1) 10Kosta Harlan: hCaptcha: Set a default value for remoteip [extensions/ConfirmEdit] (wmf/1.45.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1182866 (https://phabricator.wikimedia.org/T379179) [15:14:07] jouncebot: nowandnext [15:14:08] For the next 0 hour(s) and 45 minute(s): Train log triage (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T1500) [15:14:08] In 0 hour(s) and 45 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T1600) [15:14:34] (03CR) 10TrainBranchBot: [C:03+2] "Approved by kharlan@deploy1003 using scap backport" [extensions/ConfirmEdit] (wmf/1.45.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1182866 (https://phabricator.wikimedia.org/T379179) (owner: 10Kosta Harlan) [15:15:03] (03Abandoned) 10Muehlenhoff: Deploy linux 6.12 from bookworm-backports on DSE k8s workers [puppet] - 10https://gerrit.wikimedia.org/r/1182817 (https://phabricator.wikimedia.org/T394459) (owner: 10Muehlenhoff) [15:15:34] (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1182832 (https://phabricator.wikimedia.org/T394897) (owner: 10Brouberol) [15:17:25] (03CR) 10Brouberol: [V:03+1 C:03+2] dse-k8s-worker: ensure that a backported 6.12 kernel is installed [puppet] - 10https://gerrit.wikimedia.org/r/1182832 (https://phabricator.wikimedia.org/T394897) (owner: 10Brouberol) [15:18:14] MatmaRex: ca. 
45 minutes later, the FixRenameUserLocalLogs has made it through aawiki..abwiktionary (inclusive) and is now in acewiki [15:18:19] is it expected to take this long? [15:18:51] I had a quick look at the script and wonder if the log_timestamp filter could use a second condition, <= timestamp + 1 day or whatever [15:18:58] (but you probably know better ^^) [15:20:28] 10SRE-SLO, 10EditCheck, 10Lift-Wing, 06Machine-Learning-Team, 10Editing-team (Tracking): Create SLO dashboard for tone (peacock) check model - https://phabricator.wikimedia.org/T390706#11129115 (10gkyziridis) Hey @elukey. > And I see that Ilias deployed on staging and prod the same day, just earlier on:... [15:21:47] (counterpoint: I don’t see any related logging queries in the slow database queries logstash, so it doesn’t look like anything is taking more than 5 seconds) [15:21:48] RESOLVED: PuppetFailure: Puppet has failed on ml-serve1013:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [15:22:10] (03PS2) 10Vgutierrez: haproxy: JA3N fingerprinting (only in cp7001) [puppet] - 10https://gerrit.wikimedia.org/r/1182834 (https://phabricator.wikimedia.org/T400270) [15:22:25] (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1182834 (https://phabricator.wikimedia.org/T400270) (owner: 10Vgutierrez) [15:22:44] Lucas_WMDE: this patch: https://gerrit.wikimedia.org/r/q/I9e63bc2079dbbeff809eeb7e03327fbbe6e24ceb actually adds a <= timestamp + 1 week condition. 
i backported it, but it's not merged on master yet [15:23:06] ah, that’s why I didn’t see it locally ^^ [15:23:15] Lucas_WMDE: it is kind of expected… the queries should be fast, but there's a lot of them [15:23:20] I thought something about the timestamp rang a bell [15:23:21] ok [15:23:35] i could probably optimize it so that the queries on the local wiki would also be done in batches instead of one-by-one [15:24:15] i guess i probably should [15:24:22] because it's going to take all week at this pace [15:26:04] I’m glad I started it in a tmux session [15:26:10] !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1013.eqiad.wmnet with OS bookworm [15:26:13] (otherwise it’d be fine but I’d have to grab the output from k8s later) [15:27:13] !log depool cp7001 before deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1182834 [15:27:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:27:35] !log elukey@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003" [15:27:38] (03CR) 10Vgutierrez: [C:03+2] haproxy: JA3N fingerprinting (only in cp7001) [puppet] - 10https://gerrit.wikimedia.org/r/1182834 (https://phabricator.wikimedia.org/T400270) (owner: 10Vgutierrez) [15:28:39] !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2238 (T401906)', diff saved to https://phabricator.wikimedia.org/P82041 and previous config saved to /var/cache/conftool/dbconfig/20250828-152839-fceratto.json [15:28:44] T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906 [15:29:35] FIRING: NetworkDeviceAlarmActive: Alarm active on ssw1-f1-eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - 
https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive [15:30:36] (03Merged) 10jenkins-bot: hCaptcha: Set a default value for remoteip [extensions/ConfirmEdit] (wmf/1.45.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1182866 (https://phabricator.wikimedia.org/T379179) (owner: 10Kosta Harlan) [15:30:39] elukey@cumin1003 reimage (PID 4109451) is awaiting input [15:30:53] !log kharlan@deploy1003 Started scap sync-world: Backport for [[gerrit:1182866|hCaptcha: Set a default value for remoteip (T379179)]] [15:30:58] T379179: Send hCaptcha API response data to event platform - https://phabricator.wikimedia.org/T379179 [15:31:18] !log elukey@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003" [15:31:18] !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2043.codfw.wmnet with OS bookworm [15:32:34] (03CR) 10CDanis: [C:03+1] benthos webrequest: Add ja3n X-Analytics sub-field [puppet] - 10https://gerrit.wikimedia.org/r/1182849 (https://phabricator.wikimedia.org/T400270) (owner: 10Vgutierrez) [15:33:36] (03CR) 10Bartosz Dziewoński: [C:03+1] Set $wgPHPSessionHandling to 'disable' on group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182857 (https://phabricator.wikimedia.org/T362324) (owner: 10Hokwelum) [15:34:56] !log kharlan@deploy1003 kharlan: Backport for [[gerrit:1182866|hCaptcha: Set a default value for remoteip (T379179)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
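Note on the FixRenameUserLocalLogs discussion above (15:18:51 to 15:24:22): two ideas were raised, bounding the log_timestamp filter with an upper limit (timestamp + 1 week in the backported patch) and running the local-wiki queries in batches instead of one-by-one. The following is a minimal Python sketch of how those two ideas could combine. It is not the CentralAuth code: the function names, the batch size, and the (global_id, timestamp) record shape are all assumptions made for illustration, and only the 14-digit timestamp format and the one-week window come from the log.

```python
from datetime import datetime, timedelta
from itertools import islice

# 14-digit YYYYMMDDHHMMSS timestamps, as in --since=20250101000000 above.
TS_FORMAT = "%Y%m%d%H%M%S"


def timestamp_window(start, days=7):
    """Return (lower, upper) bounds for a log_timestamp filter.

    The upper bound is start + `days`; one week mirrors the condition
    mentioned in the chat. (Hypothetical helper, not from the script.)
    """
    begin = datetime.strptime(start, TS_FORMAT)
    end = begin + timedelta(days=days)
    return start, end.strftime(TS_FORMAT)


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def plan_queries(renames, batch_size=100):
    """Group rename records into one bounded lookup per batch.

    `renames` is a list of (global_id, timestamp) pairs (assumed shape).
    Each plan covers the earliest start and the latest end of its batch,
    so one query per batch can replace one query per rename.
    """
    plans = []
    for chunk in batched(renames, batch_size):
        windows = [timestamp_window(ts) for _, ts in chunk]
        plans.append({
            "ids": [gid for gid, _ in chunk],
            # Fixed-width digit strings compare correctly as plain strings.
            "ts_min": min(lo for lo, _ in windows),
            "ts_max": max(hi for _, hi in windows),
        })
    return plans
```

With this shape, the number of local queries grows with the number of batches rather than the number of renames. The trade-off is that a batch spanning distant dates gets a wide window, so sorting the records by timestamp before batching would likely keep each window narrow.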
[15:35:02] !log andrew@cumin2002 START - Cookbook sre.hosts.decommission for hosts cloudcephosd[1005-1009].eqiad.wmnet [15:35:03] !log repool cp7001 [15:35:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:36:05] !log kharlan@deploy1003 kharlan: Continuing with sync [15:36:51] (03CR) 10SBassett: "Yeah, makes sense to me. I'm not sure I've ever run into this issue personally, but I know other folks have. As long as everyone who nee" [puppet] - 10https://gerrit.wikimedia.org/r/1182629 (https://phabricator.wikimedia.org/T401672) (owner: 10Ahmon Dancy) [15:40:44] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, August 28 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182857 (https://phabricator.wikimedia.org/T362324) (owner: 10Hokwelum) [15:41:17] !log kharlan@deploy1003 Finished scap sync-world: Backport for [[gerrit:1182866|hCaptcha: Set a default value for remoteip (T379179)]] (duration: 10m 24s) [15:41:19] 10ops-eqiad, 06SRE, 06DC-Ops, 10decommission-hardware, 10Data-Platform-SRE (2025.08.16 - 2025.09.05): decommission an-worker109[6-9].eqiad.wmnet - https://phabricator.wikimedia.org/T401678#11129188 (10VRiley-WMF) 05Open→03Resolved [15:41:23] T379179: Send hCaptcha API response data to event platform - https://phabricator.wikimedia.org/T379179 [15:41:50] (03PS1) 10Vgutierrez: haproxy: Don't add ja3n on requestctl backend [puppet] - 10https://gerrit.wikimedia.org/r/1182869 (https://phabricator.wikimedia.org/T400270) [15:43:00] (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1182869 (https://phabricator.wikimedia.org/T400270) (owner: 10Vgutierrez) [15:43:08] (03PS1) 10Sergio Gimeno: [Growth] beta: enable growth notifications tracking [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182870 (https://phabricator.wikimedia.org/T400048) [15:43:50] !log mforns@deploy1003 
Finished deploy [analytics/refinery@81abaad]: Deploy automated traffic detection improvements [analytics/refinery@81abaad3] (duration: 59m 46s) [15:43:54] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [15:44:19] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, August 28 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182870 (https://phabricator.wikimedia.org/T400048) (owner: 10Sergio Gimeno) [15:45:16] FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 2.5s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [15:47:06] (03CR) 10Andrew Bogott: [C:03+2] Remove refs to cloudcephosd1004-1015 [puppet] - 10https://gerrit.wikimedia.org/r/1182841 (https://phabricator.wikimedia.org/T402881) (owner: 10Andrew Bogott) [15:47:09] (03PS1) 10DDesouza: miscweb(design-strategy): bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182872 (https://phabricator.wikimedia.org/T344471) [15:49:24] (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1182869 (https://phabricator.wikimedia.org/T400270) (owner: 10Vgutierrez) [15:49:37] (03CR) 10DDesouza: [C:03+2] miscweb(design-strategy): bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182872 (https://phabricator.wikimedia.org/T344471) 
(owner: 10DDesouza) [15:52:00] (03Merged) 10jenkins-bot: miscweb(design-strategy): bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182872 (https://phabricator.wikimedia.org/T344471) (owner: 10DDesouza) [15:52:25] (03CR) 10Vgutierrez: [C:03+2] haproxy: Don't add ja3n on requestctl backend [puppet] - 10https://gerrit.wikimedia.org/r/1182869 (https://phabricator.wikimedia.org/T400270) (owner: 10Vgutierrez) [15:52:51] andrew@cumin2002 decommission (PID 1362118) is awaiting input [15:54:14] 06SRE, 06Infrastructure-Foundations, 10netops, 06Traffic: Remove static routes for LVS VIPs from core routers - https://phabricator.wikimedia.org/T300877#11129257 (10ssingh) Hi Netops folks: Thanks for your feedback. Following up again after discussing this with Traffic. We decided that we will do away wi... [15:54:20] !log dani@deploy1003 helmfile [staging] START helmfile.d/services/miscweb: apply [15:54:35] !log dani@deploy1003 helmfile [staging] DONE helmfile.d/services/miscweb: apply [15:54:37] !log dani@deploy1003 helmfile [eqiad] START helmfile.d/services/miscweb: apply [15:54:52] !log dani@deploy1003 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply [15:54:54] !log dani@deploy1003 helmfile [codfw] START helmfile.d/services/miscweb: apply [15:55:09] !log dani@deploy1003 helmfile [codfw] DONE helmfile.d/services/miscweb: apply [15:55:16] RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 2.5s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [15:56:02] (03PS1) 10Bernard Wang: Remove deprecated search config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182875 [15:56:45] (03PS1) 10Dbrant: rest-gateway: route wikifeeds 
configuration endpoint. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182876 (https://phabricator.wikimedia.org/T403193) [15:57:31] (03PS2) 10Bernard Wang: Remove deprecated search config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182875 (https://phabricator.wikimedia.org/T402208) [15:57:43] !log andrew@cumin2002 START - Cookbook sre.dns.netbox [15:57:54] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [16:00:05] jhathaway and moritzm: Time to snap out of that daydream and deploy Puppet request window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T1600). [16:00:05] Dreamy_Jazz: A patch you scheduled for Puppet request window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[16:01:25] o/ [16:02:04] !log andrew@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd[1005-1009].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin2002" [16:02:26] !log andrew@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd[1005-1009].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin2002" [16:02:26] !log andrew@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [16:02:27] !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd[1005-1009].eqiad.wmnet [16:04:06] !log andrew@cumin2002 START - Cookbook sre.hosts.decommission for hosts cloudcephosd1010.eqiad.wmnet [16:04:23] 10ops-codfw, 06SRE, 06DC-Ops, 06Traffic, 13Patch-For-Review: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#11129305 (10elukey) The `late_command.sh` issue is fixed after https://gerrit.wikimedia.org/r/1182766, I tested a bookworm reimage for cp2043 and it worked nice... 
[16:08:36] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance [16:09:22] !log andrew@cumin2002 START - Cookbook sre.dns.netbox [16:11:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [16:12:16] FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.958s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [16:15:07] andrew@cumin2002 decommission (PID 1376676) is awaiting input [16:16:18] !log andrew@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin2002" [16:17:20] (03CR) 10Elukey: [C:03+1] "LGTM! Is npm going to be included?" [puppet] - 10https://gerrit.wikimedia.org/r/1182835 (https://phabricator.wikimedia.org/T393437) (owner: 10Muehlenhoff) [16:18:21] (03CR) 10Jforrester: "Do we also want to package for Trixie now, or should we wait for later adoption?" 
[puppet] - 10https://gerrit.wikimedia.org/r/1182835 (https://phabricator.wikimedia.org/T393437) (owner: 10Muehlenhoff) [16:18:39] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1016:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [16:19:01] (03PS1) 10Scott French: Improvements to policy validation handling [software/hiddenparma/deploy] - 10https://gerrit.wikimedia.org/r/1182879 [16:19:24] andrew@cumin2002 decommission (PID 1376676) is awaiting input [16:21:49] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [16:22:16] RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 2.5s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [16:23:22] (03CR) 10Hnowlan: [C:03+1] rest-gateway: route wikifeeds configuration endpoint. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182876 (https://phabricator.wikimedia.org/T403193) (owner: 10Dbrant) [16:27:12] !log amastilovic@deploy1003 Started deploy [analytics/refinery@b967872] (hadoop-test): Urgent updates to Sqoop TEST [analytics/refinery@b9678720] [16:27:20] (03CR) 10Hnowlan: [C:03+1] "lgtm in theory - will changing rest-gateway to metafo have any impact on records in use atm?" 
[dns] - 10https://gerrit.wikimedia.org/r/1182853 (https://phabricator.wikimedia.org/T400131) (owner: 10Clément Goubert) [16:28:03] !log amastilovic@deploy1003 Finished deploy [analytics/refinery@b967872] (hadoop-test): Urgent updates to Sqoop TEST [analytics/refinery@b9678720] (duration: 00m 51s) [16:28:07] (03PS1) 10Herron: profile::pyrra::filesystem::slo: add new slo define [puppet] - 10https://gerrit.wikimedia.org/r/1182886 (https://phabricator.wikimedia.org/T349521) [16:28:34] !log amastilovic@deploy1003 Started deploy [analytics/refinery@b967872]: Urgent updates to Sqoop [analytics/refinery@b9678720] [16:29:24] (03CR) 10Clément Goubert: "I think we need to first depool the secondary backend of rest-gateway, then merge I6e3e773d6, then merge this?" [dns] - 10https://gerrit.wikimedia.org/r/1182853 (https://phabricator.wikimedia.org/T400131) (owner: 10Clément Goubert) [16:29:27] (03CR) 10Elukey: "Nice work Jesse!" [cookbooks] - 10https://gerrit.wikimedia.org/r/1181795 (owner: 10JHathaway) [16:30:36] !log amastilovic@deploy1003 Finished deploy [analytics/refinery@b967872]: Urgent updates to Sqoop [analytics/refinery@b9678720] (duration: 02m 01s) [16:34:15] !log amastilovic@deploy1003 Started deploy [analytics/refinery@b967872]: Urgent updates to Sqoop [analytics/refinery@b9678720] [16:34:51] !log amastilovic@deploy1003 Finished deploy [analytics/refinery@b967872]: Urgent updates to Sqoop [analytics/refinery@b9678720] (duration: 00m 36s) [16:35:49] !log amastilovic@deploy1003 Started deploy [analytics/refinery@b967872] (thin): Urgent updates to Sqoop THIN [analytics/refinery@b9678720] [16:36:38] !log amastilovic@deploy1003 Finished deploy [analytics/refinery@b967872] (thin): Urgent updates to Sqoop THIN [analytics/refinery@b9678720] (duration: 00m 49s) [16:43:59] (03PS2) 10Vgutierrez: benthos webrequest: Add ja3n X-Analytics sub-field [puppet] - 10https://gerrit.wikimedia.org/r/1182849 (https://phabricator.wikimedia.org/T400270) [16:45:59] (03PS1) 10CDobbins: 
Revert^2 "ACMEChiefConfig: Automated MarkMonitor domain sync" [puppet] - 10https://gerrit.wikimedia.org/r/1182889 [16:46:25] (03CR) 10BCornwall: [C:03+1] Revert^2 "ACMEChiefConfig: Automated MarkMonitor domain sync" [puppet] - 10https://gerrit.wikimedia.org/r/1182889 (owner: 10CDobbins) [16:47:16] (03CR) 10CDobbins: [C:03+2] Revert^2 "ACMEChiefConfig: Automated MarkMonitor domain sync" [puppet] - 10https://gerrit.wikimedia.org/r/1182889 (owner: 10CDobbins) [16:49:24] (03CR) 10BryanDavis: [C:03+1] mariadb-sssd: Build on Trixie [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1182805 (owner: 10Majavah) [16:49:34] (03CR) 10BryanDavis: [C:03+1] tcl86-sssd: Build on Trixie [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1182806 (https://phabricator.wikimedia.org/T400256) (owner: 10Majavah) [16:49:54] (03CR) 10Majavah: [C:03+2] mariadb-sssd: Build on Trixie [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1182805 (owner: 10Majavah) [16:49:56] (03CR) 10Majavah: [C:03+2] tcl86-sssd: Build on Trixie [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1182806 (https://phabricator.wikimedia.org/T400256) (owner: 10Majavah) [16:50:29] (03Merged) 10jenkins-bot: mariadb-sssd: Build on Trixie [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1182805 (owner: 10Majavah) [16:50:32] (03Merged) 10jenkins-bot: tcl86-sssd: Build on Trixie [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1182806 (https://phabricator.wikimedia.org/T400256) (owner: 10Majavah) [16:53:36] (03PS1) 10CDobbins: Revert "ncredir: funnel wikimint.org" [puppet] - 10https://gerrit.wikimedia.org/r/1182892 [16:53:52] (03CR) 10CI reject: [V:04-1] Revert "ncredir: funnel wikimint.org" [puppet] - 10https://gerrit.wikimedia.org/r/1182892 (owner: 10CDobbins) [16:56:05] (03Abandoned) 10CDobbins: Revert "ncredir: funnel wikimint.org" [puppet] - 10https://gerrit.wikimedia.org/r/1182892 (owner: 10CDobbins) [16:56:43] (03PS1) 
10CDobbins: Revert "ncredir: funnel wikimint.org" [puppet] - 10https://gerrit.wikimedia.org/r/1182893 [16:56:48] (03CR) 10Scott French: [V:03+2 C:03+2] Improvements to policy validation handling [software/hiddenparma/deploy] - 10https://gerrit.wikimedia.org/r/1182879 (owner: 10Scott French) [16:56:59] (03CR) 10CI reject: [V:04-1] Revert "ncredir: funnel wikimint.org" [puppet] - 10https://gerrit.wikimedia.org/r/1182893 (owner: 10CDobbins) [16:57:27] (03CR) 10RLazarus: [C:03+2] {api,rest}-gateway: Upgrade to Envoy 1.26.8 in staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182680 (https://phabricator.wikimedia.org/T402584) (owner: 10RLazarus) [16:57:37] (03CR) 10Jdlrobson: "Thanks Bernard! I can land this Tuesday next week." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182875 (https://phabricator.wikimedia.org/T402208) (owner: 10Bernard Wang) [16:59:00] (03PS1) 10CDobbins: Revert^2 "ncredir: funnel wikimint.org" [puppet] - 10https://gerrit.wikimedia.org/r/1182894 [16:59:15] (03Merged) 10jenkins-bot: {api,rest}-gateway: Upgrade to Envoy 1.26.8 in staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182680 (https://phabricator.wikimedia.org/T402584) (owner: 10RLazarus) [16:59:17] (03CR) 10Jdlrobson: Remove deprecated search config (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182875 (https://phabricator.wikimedia.org/T402208) (owner: 10Bernard Wang) [16:59:30] (03Abandoned) 10CDobbins: Revert "ncredir: funnel wikimint.org" [puppet] - 10https://gerrit.wikimedia.org/r/1182893 (owner: 10CDobbins) [16:59:32] (03CR) 10BCornwall: [C:03+1] Revert^2 "ncredir: funnel wikimint.org" [puppet] - 10https://gerrit.wikimedia.org/r/1182894 (owner: 10CDobbins) [16:59:53] (03CR) 10CDobbins: [C:03+2] Revert^2 "ncredir: funnel wikimint.org" [puppet] - 10https://gerrit.wikimedia.org/r/1182894 (owner: 10CDobbins) [17:00:05] bd808: #bothumor I � Unicode. 
All rise for Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T1700). [17:00:05] Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T1700) [17:00:27] * bd808 has stuff to deploy [17:01:01] !log swfrench@cumin2002 START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Deploy improvements to policy validation handling - swfrench@cumin2002" [17:01:04] !log swfrench@cumin2002 START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy improvements to policy validation handling - swfrench@cumin2002 [17:01:16] I'll be messing with api-gateway and rest-gateway but only in staging [17:01:52] !log swfrench@cumin2002 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy improvements to policy validation handling - swfrench@cumin2002 [17:01:54] !log swfrench@cumin2002 END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Deploy improvements to policy validation handling - swfrench@cumin2002" [17:03:04] (03PS5) 10JHathaway: provision: poll for reboot via Redfish [cookbooks] - 10https://gerrit.wikimedia.org/r/1181795 [17:04:25] 06SRE, 06Traffic: Move ncredir7003 into service and decom ncredir7002 - https://phabricator.wikimedia.org/T395796#11129503 (10ssingh) 05Open→03Resolved a:03ssingh Seems like the parent task T394263 already tracks this work and `ncredir700[34]` have been moved into service while `ncredir700[12]` have... 
[17:05:14] 06SRE, 06Traffic: Move ncredir7003 into service and decom ncredir7002 - https://phabricator.wikimedia.org/T395796#11129509 (10ssingh) [17:07:09] 06SRE, 10Ganeti, 06Traffic: Decommission doh7001 and durum7001 - https://phabricator.wikimedia.org/T396015#11129538 (10ssingh) 05Open→03Resolved a:03ssingh `(doh|durum)700[12]` decomissioned and `(doh|durum)700[34]` are in service now. Tracking was in parent task T394263 so resolving this. [17:07:16] !log andrew@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin2002" [17:07:17] !log andrew@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [17:07:18] !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1010.eqiad.wmnet [17:07:29] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance [17:07:37] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1157 (T402925)', diff saved to https://phabricator.wikimedia.org/P82042 and previous config saved to /var/cache/conftool/dbconfig/20250828-170736-ladsgroup.json [17:07:42] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [17:10:27] !log andrew@cumin2002 START - Cookbook sre.hosts.decommission for hosts cloudcephosd[1011-1015].eqiad.wmnet [17:11:07] (03PS1) 10BryanDavis: developer-portal: Bump to 2025-08-28-150847-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182897 (https://phabricator.wikimedia.org/T402945) [17:12:23] !log rzl@deploy1003 helmfile [staging] START helmfile.d/services/api-gateway: apply [17:12:46] !log rzl@deploy1003 helmfile [staging] DONE helmfile.d/services/api-gateway: apply [17:13:01] (03PS1) 10Herron: pyrra: citoid 
enable revision param [puppet] - 10https://gerrit.wikimedia.org/r/1182898 (https://phabricator.wikimedia.org/T400073) [17:13:31] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1157 (T402925)', diff saved to https://phabricator.wikimedia.org/P82043 and previous config saved to /var/cache/conftool/dbconfig/20250828-171330-ladsgroup.json [17:13:35] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [17:16:43] andrew@cumin2002 decommission (PID 1409224) is awaiting input [17:17:15] (03CR) 10BryanDavis: [C:03+2] developer-portal: Bump to 2025-08-28-150847-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182897 (https://phabricator.wikimedia.org/T402945) (owner: 10BryanDavis) [17:17:38] (03PS1) 10Andrew Bogott: profile_cloudceph_mon_spec.rb: replace test with IPs for cloudcephosd1052 [puppet] - 10https://gerrit.wikimedia.org/r/1182899 (https://phabricator.wikimedia.org/T402881) [17:18:52] (03Merged) 10jenkins-bot: developer-portal: Bump to 2025-08-28-150847-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182897 (https://phabricator.wikimedia.org/T402945) (owner: 10BryanDavis) [17:18:55] (03PS1) 10David Caro: wmcs: add object storage quota alerts [alerts] - 10https://gerrit.wikimedia.org/r/1182900 (https://phabricator.wikimedia.org/T402932) [17:19:32] (03CR) 10David Caro: "This might get us an 'always there' warning quota for non-managed projects, but it's the only way we have to know if users are hitting the" [alerts] - 10https://gerrit.wikimedia.org/r/1182900 (https://phabricator.wikimedia.org/T402932) (owner: 10David Caro) [17:20:24] (03CR) 10CI reject: [V:04-1] wmcs: add object storage quota alerts [alerts] - 10https://gerrit.wikimedia.org/r/1182900 (https://phabricator.wikimedia.org/T402932) (owner: 10David Caro) [17:22:08] (03PS2) 10Herron: pyrra: citoid enable revision param [puppet] - 10https://gerrit.wikimedia.org/r/1182898 
(https://phabricator.wikimedia.org/T400073) [17:22:12] (03PS2) 10David Caro: wmcs: add object storage quota alerts [alerts] - 10https://gerrit.wikimedia.org/r/1182900 (https://phabricator.wikimedia.org/T402932) [17:22:19] (03CR) 10CDobbins: [V:03+1] dnsrecursor: add recursor.yml.erb (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1169156 (https://phabricator.wikimedia.org/T381608) (owner: 10CDobbins) [17:22:19] (03PS7) 10Herron: profile::pyrra::filesystem::slo: add new slo define [puppet] - 10https://gerrit.wikimedia.org/r/1182886 (https://phabricator.wikimedia.org/T349521) [17:22:42] !log bd808@deploy1003 helmfile [staging] START helmfile.d/services/developer-portal: apply [17:23:25] !log bd808@deploy1003 helmfile [staging] DONE helmfile.d/services/developer-portal: apply [17:23:55] (03CR) 10Herron: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6787/co" [puppet] - 10https://gerrit.wikimedia.org/r/1182886 (https://phabricator.wikimedia.org/T349521) (owner: 10Herron) [17:23:59] (03CR) 10CI reject: [V:04-1] wmcs: add object storage quota alerts [alerts] - 10https://gerrit.wikimedia.org/r/1182900 (https://phabricator.wikimedia.org/T402932) (owner: 10David Caro) [17:24:31] (03Abandoned) 10CDobbins: dnsrecursor: remove hardcoded values and tidy up [puppet] - 10https://gerrit.wikimedia.org/r/1172056 (https://phabricator.wikimedia.org/T381608) (owner: 10CDobbins) [17:24:51] (03CR) 10Herron: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6788/co" [puppet] - 10https://gerrit.wikimedia.org/r/1182898 (https://phabricator.wikimedia.org/T400073) (owner: 10Herron) [17:26:09] (03CR) 10Herron: [V:03+1] "although this removes the namespace labels it should functionally be a noop as we don't need them in the filesystem operator" [puppet] - 10https://gerrit.wikimedia.org/r/1182886 
(https://phabricator.wikimedia.org/T349521) (owner: 10Herron) [17:26:30] !log bd808@deploy1003 helmfile [codfw] START helmfile.d/services/developer-portal: apply [17:27:21] !log bd808@deploy1003 helmfile [codfw] DONE helmfile.d/services/developer-portal: apply [17:27:22] (03PS3) 10David Caro: wmcs: add object storage quota alerts [alerts] - 10https://gerrit.wikimedia.org/r/1182900 (https://phabricator.wikimedia.org/T402932) [17:27:56] !log bd808@deploy1003 helmfile [eqiad] START helmfile.d/services/developer-portal: apply [17:28:00] (03PS46) 10CDobbins: dnsrecursor: add recursor.yml.erb [puppet] - 10https://gerrit.wikimedia.org/r/1169156 (https://phabricator.wikimedia.org/T381608) [17:28:22] !log bd808@deploy1003 helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply [17:28:38] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P82044 and previous config saved to /var/cache/conftool/dbconfig/20250828-172837-ladsgroup.json [17:29:53] (03CR) 10Majavah: "My feeling is that we don't want even those alerts for non-managed projects." 
[alerts] - 10https://gerrit.wikimedia.org/r/1182900 (https://phabricator.wikimedia.org/T402932) (owner: 10David Caro) [17:31:44] !log andrew@cumin2002 START - Cookbook sre.dns.netbox [17:31:54] (03CR) 10David Caro: "Essentially so we can open a task for them, or tell them that their quota is full, as they can't know themselves (they can ssh to their VM" [alerts] - 10https://gerrit.wikimedia.org/r/1182900 (https://phabricator.wikimedia.org/T402932) (owner: 10David Caro) [17:33:10] FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [17:37:23] andrew@cumin2002 decommission (PID 1409224) is awaiting input [17:38:50] 06SRE, 10DNS, 06Traffic, 10WikiLearn: DNS records for WikiLearn - https://phabricator.wikimedia.org/T365435#11129696 (10ssingh) Hi @Asaf: Is there an update to this? We are cleaning up and triaging the tasks and so the context is if there is anything required from our end, of if there is an update from yours. [17:40:00] (03PS3) 10Cwhite: monitoring: ensure disk space check is absent [puppet] - 10https://gerrit.wikimedia.org/r/1180642 (https://phabricator.wikimedia.org/T332764) [17:40:40] 06SRE, 07SRE-Unowned, 10Maps, 13Patch-For-Review: Move maps servers to Bookworm - https://phabricator.wikimedia.org/T381565#11129701 (10TheDJ) @elukey this report might indicate a problem. Not 100% sure yet. https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Maplink_wikidata_broken? [17:40:44] (03PS4) 10Cwhite: monitoring: disable icinga disk space check [puppet] - 10https://gerrit.wikimedia.org/r/1180642 (https://phabricator.wikimedia.org/T332764) [17:41:34] (03CR) 10Ssingh: [C:03+1] "Looks good! 
Let's merge on Monday, even though this is strictly NOOP:" [puppet] - 10https://gerrit.wikimedia.org/r/1169156 (https://phabricator.wikimedia.org/T381608) (owner: 10CDobbins) [17:42:37] (03CR) 10Ssingh: [C:03+1] dnsrecursor: add recursor.yml.erb (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1169156 (https://phabricator.wikimedia.org/T381608) (owner: 10CDobbins) [17:43:46] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P82045 and previous config saved to /var/cache/conftool/dbconfig/20250828-174345-ladsgroup.json [17:44:26] !log andrew@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd[1011-1015].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin2002" [17:44:35] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [17:45:52] (03CR) 10Cwhite: monitoring: disable icinga disk space check (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1180642 (https://phabricator.wikimedia.org/T332764) (owner: 10Cwhite) [17:47:31] andrew@cumin2002 decommission (PID 1409224) is awaiting input [17:57:27] (03CR) 10DCausse: [C:04-1] "does not appear to work sadly, opensearch refuses to set this as auto_expand_replicas, we might need to adjust cirrus to switch to another" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182192 (https://phabricator.wikimedia.org/T402627) (owner: 10Ebernhardson) [17:58:53] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1157 (T402925)', diff saved to https://phabricator.wikimedia.org/P82046 and previous config saved to /var/cache/conftool/dbconfig/20250828-175852-ladsgroup.json [17:58:58] T402925: Drop cl_to 
and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [17:59:09] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance [17:59:16] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1166 (T402925)', diff saved to https://phabricator.wikimedia.org/P82047 and previous config saved to /var/cache/conftool/dbconfig/20250828-175915-ladsgroup.json [18:05:08] !log andrew@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd[1011-1015].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin2002" [18:05:09] !log andrew@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [18:05:09] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1166 (T402925)', diff saved to https://phabricator.wikimedia.org/P82048 and previous config saved to /var/cache/conftool/dbconfig/20250828-180508-ladsgroup.json [18:05:10] !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd[1011-1015].eqiad.wmnet [18:05:20] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [18:05:54] 10ops-eqiad, 06cloud-services-team, 06DC-Ops, 10decommission-hardware: decommission cloudcephosd1004-10015 - https://phabricator.wikimedia.org/T402881#11129735 (10Andrew) a:05Andrew→03None [18:06:39] RESOLVED: CirrusSearchTitleSuggestIndexTooOld: Some search indices that power autocomplete have not been updated recently - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#CirrusSearch_titlesuggest_index_is_too_old - TODO - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchTitleSuggestIndexTooOld [18:20:16] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance 
db1166', diff saved to https://phabricator.wikimedia.org/P82049 and previous config saved to /var/cache/conftool/dbconfig/20250828-182016-ladsgroup.json [18:32:02] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, September 01 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1182861 (https://phabricator.wikimedia.org/T402496) (owner: 10Huei Tan) [18:33:01] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, September 01 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/GrowthExperiments] (wmf/1.45.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1182862 (https://phabricator.wikimedia.org/T402496) (owner: 10Huei Tan) [18:33:04] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, September 01 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/GrowthExperiments] (wmf/1.45.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1182862 (https://phabricator.wikimedia.org/T402496) (owner: 10Huei Tan) [18:33:53] (03PS3) 10Reedy: CommonSettings.php: Remove old $wgCentralDBname [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1129230 (https://phabricator.wikimedia.org/T389348) [18:33:54] 10ops-magru: Alert for device ps1-b3-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T402835#11129797 (10phaultfinder) [18:33:56] (03PS4) 10Reedy: CommonSettings.php: Remove old $wgCentralDBname [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1129230 (https://phabricator.wikimedia.org/T389348) [18:34:38] jouncebot: nowandnext [18:34:38] No deployments scheduled for the next 1 hour(s) and 25 minute(s) [18:34:39] In 1 hour(s) and 25 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T2000) [18:35:24] !log 
ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P82050 and previous config saved to /var/cache/conftool/dbconfig/20250828-183523-ladsgroup.json [18:35:34] 06SRE, 10DNS, 06FR-donorrelations, 06Traffic: Custom URL for survey pop-up - https://phabricator.wikimedia.org/T400278#11129811 (10ssingh) Hi @EBrill-WMF: I am sorry this slipped through the cracks. Unfortunately, we will be unable to make this work: `donorsatisfaction.wikimedia.org`. The reason is that... [18:37:50] jhathaway: Sorry, I missed the puppet window. Is there any chance that the patch could be deployed today? If not, no worries. [18:38:15] no worries, I can merge it in now, if you are ready? [18:38:21] Yeah. I am ready [18:38:23] Thanks [18:38:28] (03CR) 10JHathaway: [C:03+2] mediawiki: Run CheckUser/revokeTemporaryAccountViewerGroup.php [puppet] - 10https://gerrit.wikimedia.org/r/1181689 (https://phabricator.wikimedia.org/T375115) (owner: 10STran) [18:38:47] great, merging... [18:38:51] 10ops-magru: Alert for device ps1-b4-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T402582#11129826 (10phaultfinder) [18:41:38] If you would like to test this, then I would suggest manually triggering the command. This run should cause no changes, because no-one should have been inactive for more than a year that has the group (the group hasn't existed for more than a year) [18:42:35] 14SRE-Sprint-Week-Sustainability-March2023, 10DNS, 06Traffic, 10Sustainability (Incident Followup): Automate DNS depools such that manual commits are not required - https://phabricator.wikimedia.org/T303219#11129836 (10ssingh) 05Open→03Resolved a:03ssingh This was done in T369366. Site depool for... 
[18:46:21] 06SRE, 10DNS, 06Traffic-Icebox: Monitor DNS delegations - https://phabricator.wikimedia.org/T171470#11129845 (10ssingh) @BCornwall: I would image we are already doing something related to the requirements of this task already with `ncmonitor`. Do you think that fits? [18:46:59] 06SRE, 10DNS, 06Traffic: Monitor DNS delegations - https://phabricator.wikimedia.org/T171470#11129846 (10ssingh) [18:47:54] 06SRE, 10DNS, 06Traffic: Monitor DNS delegations - https://phabricator.wikimedia.org/T171470#11129848 (10ssingh) [T402960 as a related task] [18:50:24] (03CR) 10Ssingh: [C:03+1] "@bcornwall@wikimedia.org will help with the rollout of this, the week of Sep 11." [puppet] - 10https://gerrit.wikimedia.org/r/1180234 (https://phabricator.wikimedia.org/T402284) (owner: 10Dzahn) [18:50:31] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1166 (T402925)', diff saved to https://phabricator.wikimedia.org/P82051 and previous config saved to /var/cache/conftool/dbconfig/20250828-185031-ladsgroup.json [18:50:37] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [18:50:47] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance [18:50:54] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1175 (T402925)', diff saved to https://phabricator.wikimedia.org/P82052 and previous config saved to /var/cache/conftool/dbconfig/20250828-185053-ladsgroup.json [18:53:01] !log jhathaway@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest2006.codfw.wmnet with reason: sleep test [18:56:49] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1175 (T402925)', diff saved to https://phabricator.wikimedia.org/P82053 and previous config saved to /var/cache/conftool/dbconfig/20250828-185648-ladsgroup.json [18:56:54] T402925: Drop cl_to and 
cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [18:57:52] 06SRE, 10Maps, 06Traffic: Allow Wikimedia Maps usage on Wikidata for Firefox (Browser extension) - https://phabricator.wikimedia.org/T398588#11129879 (10ssingh) Hi: please see https://lists.wikimedia.org/pipermail/maps-l/2020-August/001729.html and T261424 on general information on why we are restricting thi... [18:59:17] 06SRE, 10Maps, 06Traffic: Allow Wikimedia Maps usage on Wikidata for Firefox (Browser extension) - https://phabricator.wikimedia.org/T398588#11129884 (10ssingh) In general, our approval thus far can only apply to websites and not extensions. So there is a technical restriction that will prevent this from hap... [19:04:35] FIRING: OsmSynchronisationLag: Maps - OSM synchronization lag - codfw - https://wikitech.wikimedia.org/wiki/Maps/Runbook - https://grafana.wikimedia.org/d/000000305/maps-performances - https://alerts.wikimedia.org/?q=alertname%3DOsmSynchronisationLag [19:04:35] FIRING: [2x] SwitchCoreInterfaceDown: Switch core interface down - ssw1-f1-codfw:et-0/0/6 (Core: lsw1-f2-codfw:ethernet-1/55 {#130117100025}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown [19:08:36] 10SRE-tools, 06Infrastructure-Foundations, 10Spicerack, 06Traffic: Spicerack's Icinga module should provide a way to skip specific services in sub-optimal but desired state - https://phabricator.wikimedia.org/T392848#11129910 (10ssingh) 05Open→03Resolved a:03ssingh Thanks a lot to @elukey and @Vo... [19:10:46] 06SRE, 06Infrastructure-Foundations, 10Puppet-Core, 06Traffic: allow non-roots to pool/depool certain DNS Discovery services - https://phabricator.wikimedia.org/T250557#11129914 (10ssingh) Is there still interest in pursuing this? 
This overlaps with both I/F and Traffic so we will need to assign it accordi... [19:11:57] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P82054 and previous config saved to /var/cache/conftool/dbconfig/20250828-191156-ladsgroup.json [19:17:09] 06SRE, 06Commons, 06Traffic: HTTP 500 Error while trying to make large tabular JSON data file - https://phabricator.wikimedia.org/T329339#11129936 (10ssingh) 05Open→03Resolved a:03ssingh This has been open for two years at this point and there has been no follow-up from the reporter, or Traffic, an... [19:25:25] (03PS47) 10CDobbins: dnsrecursor: add recursor.yml.erb [puppet] - 10https://gerrit.wikimedia.org/r/1169156 (https://phabricator.wikimedia.org/T381608) [19:25:46] (03CR) 10CDobbins: dnsrecursor: add recursor.yml.erb (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1169156 (https://phabricator.wikimedia.org/T381608) (owner: 10CDobbins) [19:26:18] 06SRE, 10Observability-Metrics, 06Traffic: Port Traffic dashboards to Thanos - https://phabricator.wikimedia.org/T302266#11129959 (10ssingh) 05Open→03Resolved a:03ssingh As far as I can tell and especially the ones listed as missing, all dashboards have been ported to Thanos. Please let me know if... [19:27:04] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P82055 and previous config saved to /var/cache/conftool/dbconfig/20250828-192704-ladsgroup.json [19:29:28] (03CR) 10Ssingh: "Thanks for updating. Let's merge on Monday." 
[puppet] - 10https://gerrit.wikimedia.org/r/1169156 (https://phabricator.wikimedia.org/T381608) (owner: 10CDobbins) [19:29:35] FIRING: NetworkDeviceAlarmActive: Alarm active on ssw1-f1-eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive [19:41:49] FIRING: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [19:42:12] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1175 (T402925)', diff saved to https://phabricator.wikimedia.org/P82056 and previous config saved to /var/cache/conftool/dbconfig/20250828-194211-ladsgroup.json [19:42:17] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [19:42:27] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance [19:42:34] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1189 (T402925)', diff saved to https://phabricator.wikimedia.org/P82057 and previous config saved to /var/cache/conftool/dbconfig/20250828-194234-ladsgroup.json [19:48:28] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1189 (T402925)', diff saved to https://phabricator.wikimedia.org/P82058 and previous config saved to /var/cache/conftool/dbconfig/20250828-194828-ladsgroup.json [19:48:34] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [19:51:44] FIRING: [2x] RipeAtlasAnchorUnreachable: ipv6 ping to eqiad RIPE Atlas anchor: failures over threshold for 
measurement 96503802 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable [19:51:55] (03CR) 10Andrew Bogott: [C:03+2] profile_cloudceph_mon_spec.rb: replace test with IPs for cloudcephosd1052 [puppet] - 10https://gerrit.wikimedia.org/r/1182899 (https://phabricator.wikimedia.org/T402881) (owner: 10Andrew Bogott) [19:54:56] !log jhathaway@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest2010.codfw.wmnet with reason: sleep test [19:57:54] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [19:59:31] (03CR) 10Jdlrobson: "can I suggest the user of a dblist at this point to make this easier to track and reduce the lines of code in this shared file? I also won" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1152558 (https://phabricator.wikimedia.org/T380930) (owner: 10KartikMistry) [20:00:04] RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T2000). Please do the needful. [20:00:04] JustHannah and sergi0: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [20:00:27] o/ [20:00:40] :-D [20:01:32] I can help deploy today if needed. [20:02:08] sergi0 I'll do yours first since it's beta-only. 
[20:02:26] Thank you @dancy [20:02:31] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dancy@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182870 (https://phabricator.wikimedia.org/T400048) (owner: 10Sergio Gimeno) [20:03:22] (03Merged) 10jenkins-bot: [Growth] beta: enable growth notifications tracking [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182870 (https://phabricator.wikimedia.org/T400048) (owner: 10Sergio Gimeno) [20:03:36] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P82059 and previous config saved to /var/cache/conftool/dbconfig/20250828-200335-ladsgroup.json [20:04:17] sergi0: All set. [20:04:24] JustHannah: Ready? [20:04:34] great, thanks! [20:04:43] Yes, dancy [20:04:51] Please go ahead! [20:04:55] ok! [20:05:04] (03CR) 10TrainBranchBot: [C:03+2] "Approved by dancy@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182857 (https://phabricator.wikimedia.org/T362324) (owner: 10Hokwelum) [20:05:58] (03Merged) 10jenkins-bot: Set $wgPHPSessionHandling to 'disable' on group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182857 (https://phabricator.wikimedia.org/T362324) (owner: 10Hokwelum) [20:06:12] !log dancy@deploy1003 Started scap sync-world: Backport for [[gerrit:1182857|Set $wgPHPSessionHandling to 'disable' on group0 wikis (T362324)]] [20:06:17] T362324: Disable PHPSessionHandler in Wikimedia production - https://phabricator.wikimedia.org/T362324 [20:10:12] !log dancy@deploy1003 hokwelum, dancy: Backport for [[gerrit:1182857|Set $wgPHPSessionHandling to 'disable' on group0 wikis (T362324)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [20:11:36] (03CR) 10Andrea Denisse: "The 2 phase NRPE removal is a pretty good suggestion!! LGTM." 
[puppet] - 10https://gerrit.wikimedia.org/r/1180642 (https://phabricator.wikimedia.org/T332764) (owner: 10Cwhite) [20:11:43] (03CR) 10Andrea Denisse: [C:03+1] monitoring: disable icinga disk space check [puppet] - 10https://gerrit.wikimedia.org/r/1180642 (https://phabricator.wikimedia.org/T332764) (owner: 10Cwhite) [20:12:01] JustHannah: Ready for testing [20:12:23] dancy: Yes! [20:12:54] Alright. Let me know when you've verified things. [20:14:53] dancy: Thank you! Looks good! [20:15:01] !log dancy@deploy1003 hokwelum, dancy: Continuing with sync [20:15:06] Awesome. [20:15:46] :-) [20:18:44] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P82060 and previous config saved to /var/cache/conftool/dbconfig/20250828-201843-ladsgroup.json [20:19:35] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1016:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [20:19:48] (03CR) 10Andrea Denisse: [C:03+1] "LGTM, thank you!" 
[alerts] - 10https://gerrit.wikimedia.org/r/1182848 (https://phabricator.wikimedia.org/T309012) (owner: 10Tiziano Fogli) [20:20:08] !log dancy@deploy1003 Finished scap sync-world: Backport for [[gerrit:1182857|Set $wgPHPSessionHandling to 'disable' on group0 wikis (T362324)]] (duration: 13m 56s) [20:20:14] T362324: Disable PHPSessionHandler in Wikimedia production - https://phabricator.wikimedia.org/T362324 [20:21:48] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [20:22:03] FIRING: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [20:22:18] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [20:30:07] JustHannah: Your change is fully deployed, btw. [20:30:58] dancy: Thank you! 
[20:33:51] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1189 (T402925)', diff saved to https://phabricator.wikimedia.org/P82061 and previous config saved to /var/cache/conftool/dbconfig/20250828-203350-ladsgroup.json [20:33:57] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [20:34:06] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance [20:34:14] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1198 (T402925)', diff saved to https://phabricator.wikimedia.org/P82062 and previous config saved to /var/cache/conftool/dbconfig/20250828-203413-ladsgroup.json [20:40:06] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1198 (T402925)', diff saved to https://phabricator.wikimedia.org/P82063 and previous config saved to /var/cache/conftool/dbconfig/20250828-204006-ladsgroup.json [20:40:12] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [20:40:31] (03CR) 10Andrea Denisse: [C:03+1] "LGTM, thank you!" 
[software] - 10https://gerrit.wikimedia.org/r/1182571 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli) [20:41:44] RESOLVED: [2x] RipeAtlasAnchorUnreachable: ipv6 ping to eqiad RIPE Atlas anchor: failures over threshold for measurement 96503802 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable [20:42:02] (03CR) 10RLazarus: [C:03+2] mesh: Copy configuration_1.14.0 to 1.14.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182669 (https://phabricator.wikimedia.org/T403101) (owner: 10RLazarus) [20:43:33] (03Merged) 10jenkins-bot: mesh: Copy configuration_1.14.0 to 1.14.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182669 (https://phabricator.wikimedia.org/T403101) (owner: 10RLazarus) [20:43:58] (03CR) 10RLazarus: [C:03+2] mesh: Remove deprecated field config.core.v3.HeaderValueOption.append [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182670 (https://phabricator.wikimedia.org/T403101) (owner: 10RLazarus) [20:44:06] (03CR) 10CI reject: [V:04-1] mesh: Remove deprecated field config.core.v3.HeaderValueOption.append [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182670 (https://phabricator.wikimedia.org/T403101) (owner: 10RLazarus) [20:44:29] (03PS3) 10RLazarus: mesh: Remove deprecated field config.core.v3.HeaderValueOption.append [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182670 (https://phabricator.wikimedia.org/T403101) [20:45:03] (03CR) 10RLazarus: mesh: Remove deprecated field config.core.v3.HeaderValueOption.append [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182670 (https://phabricator.wikimedia.org/T403101) (owner: 10RLazarus) [20:45:07] (03CR) 10RLazarus: [C:03+2] mesh: Remove deprecated field config.core.v3.HeaderValueOption.append [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182670 
(https://phabricator.wikimedia.org/T403101) (owner: 10RLazarus) [20:46:39] (03Merged) 10jenkins-bot: mesh: Remove deprecated field config.core.v3.HeaderValueOption.append [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182670 (https://phabricator.wikimedia.org/T403101) (owner: 10RLazarus) [20:52:42] (03PS1) 10Srishakatux: Add namespace alias for scn wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182927 (https://phabricator.wikimedia.org/T375979) [20:53:37] (03PS6) 10JHathaway: provision: poll for reboot via Redfish [cookbooks] - 10https://gerrit.wikimedia.org/r/1181795 [20:53:43] (03CR) 10JHathaway: provision: poll for reboot via Redfish (033 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/1181795 (owner: 10JHathaway) [20:54:35] (03CR) 10Scott French: "Apologies for the unsolicited comment, but I think sequencing this change is a bit awkward in terms of maintaining a valid DNS config." [dns] - 10https://gerrit.wikimedia.org/r/1182853 (https://phabricator.wikimedia.org/T400131) (owner: 10Clément Goubert) [20:55:14] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P82064 and previous config saved to /var/cache/conftool/dbconfig/20250828-205513-ladsgroup.json [21:00:05] Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250828T2100) [21:00:34] (03CR) 10Cathal Mooney: JunOS IBGP: adjust template to work with updated data from plugin (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/1182797 (https://phabricator.wikimedia.org/T402577) (owner: 10Cathal Mooney) [21:01:07] (03CR) 10CI reject: [V:04-1] provision: poll for reboot via Redfish [cookbooks] - 10https://gerrit.wikimedia.org/r/1181795 (owner: 10JHathaway) [21:02:45] (03PS7) 10JHathaway: provision: poll for reboot via Redfish [cookbooks] - 10https://gerrit.wikimedia.org/r/1181795 [21:05:45] (03CR) 10JHathaway: [C:03+1] mariadb: 
replace legacy fact for memorysize [puppet] - 10https://gerrit.wikimedia.org/r/1180999 (owner: 10Dzahn) [21:10:22] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P82065 and previous config saved to /var/cache/conftool/dbconfig/20250828-211021-ladsgroup.json [21:25:07] Is anyone deploying anything right now? I want to deploy two patches for security [21:25:29] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1198 (T402925)', diff saved to https://phabricator.wikimedia.org/P82066 and previous config saved to /var/cache/conftool/dbconfig/20250828-212528-ladsgroup.json [21:25:35] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [21:25:44] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance [21:26:02] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance [21:26:10] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1212 (T402925)', diff saved to https://phabricator.wikimedia.org/P82067 and previous config saved to /var/cache/conftool/dbconfig/20250828-212609-ladsgroup.json [21:32:31] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1212 (T402925)', diff saved to https://phabricator.wikimedia.org/P82068 and previous config saved to /var/cache/conftool/dbconfig/20250828-213230-ladsgroup.json [21:32:36] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [21:33:10] FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - 
https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [21:44:35] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [21:45:44] preparing to deploy a core patch [21:47:38] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P82069 and previous config saved to /var/cache/conftool/dbconfig/20250828-214737-ladsgroup.json [21:52:54] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [21:55:44] FIRING: [2x] RipeAtlasAnchorUnreachable: ipv6 ping to eqiad RIPE Atlas anchor: failures over threshold for measurement 96503802 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable [21:59:11] !log Deployed security fix for T402313 [21:59:15] running one more scap [21:59:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:02:42] !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host maps2011.codfw.wmnet with OS bookworm [22:02:46] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P82070 and previous config saved to /var/cache/conftool/dbconfig/20250828-220245-ladsgroup.json [22:02:56] 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q1:rack/setup/install maps201[1-4] - 
https://phabricator.wikimedia.org/T400637#11130272 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host maps2011.codfw.wmnet with OS bookworm [22:03:21] (03PS1) 10Jdlrobson: Cleanup: Simplify configuration for wgSpecialContributeSkinsEnabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182944 [22:03:31] (03CR) 10Jdlrobson: "I'm further caught up now and it looks like this was somewhat done with 9ed97607bf. Thanks! I posted a followup in https://gerrit.wikimedi" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1152558 (https://phabricator.wikimedia.org/T380930) (owner: 10KartikMistry) [22:04:50] (03CR) 10CI reject: [V:04-1] Cleanup: Simplify configuration for wgSpecialContributeSkinsEnabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182944 (owner: 10Jdlrobson) [22:06:54] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [22:07:23] (03PS1) 10RLazarus: mw-*: Upgrade to Envoy 1.26.8 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182945 [22:07:51] (03PS2) 10RLazarus: mw-*: Upgrade to Envoy 1.26.8 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182945 (https://phabricator.wikimedia.org/T402584) [22:09:29] (03PS2) 10Jdlrobson: Cleanup: Simplify configuration for wgSpecialContributeSkinsEnabled [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182944 [22:10:42] (03CR) 10Jdlrobson: Cleanup: Simplify configuration for wgSpecialContributeSkinsEnabled (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1182944 (owner: 10Jdlrobson) [22:10:53] maryum: I've got some infra stuff to push out whenever you're clear but no hurry :) [22:11:07] scap just finished! 
[22:11:25] !log Security deploy for T403093 [22:11:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:11:38] rad, thanks [22:11:54] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [22:12:24] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [22:17:53] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1212 (T402925)', diff saved to https://phabricator.wikimedia.org/P82071 and previous config saved to /var/cache/conftool/dbconfig/20250828-221753-ladsgroup.json [22:17:59] T402925: Drop cl_to and cl_collation from categorylinks in wmf production - https://phabricator.wikimedia.org/T402925 [22:18:09] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1240.eqiad.wmnet with reason: Maintenance [22:21:17] (03PS3) 10RLazarus: mw-*: Upgrade to Envoy 1.26.8 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182945 (https://phabricator.wikimedia.org/T402584) [22:23:25] (03CR) 10Scott French: [C:03+1] mw-*: Upgrade to Envoy 1.26.8 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182945 (https://phabricator.wikimedia.org/T402584) (owner: 10RLazarus) [22:25:47] (03CR) 10RLazarus: [C:03+2] mw-*: Upgrade to Envoy 1.26.8 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182945 (https://phabricator.wikimedia.org/T402584) (owner: 10RLazarus) [22:28:17] (03Merged) 10jenkins-bot: mw-*: 
Upgrade to Envoy 1.26.8 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182945 (https://phabricator.wikimedia.org/T402584) (owner: 10RLazarus) [22:30:34] 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q1:rack/setup/install maps201[1-4] - https://phabricator.wikimedia.org/T400637#11130302 (10Jclark-ctr) @Jhancock.wm Hey Jen, I was looking at Puppet and here’s what I noticed: Currently, it has: 'maps201[1-4]': - partman/standard.cfg - partman/raid10-4dev-... [22:31:00] !log rzl@deploy1003 Started scap sync-world: https://gerrit.wikimedia.org/r/1182945 [22:36:52] !log rzl@deploy1003 Finished scap sync-world: https://gerrit.wikimedia.org/r/1182945 (duration: 06m 50s) [22:38:51] (03PS1) 10Jclark-ctr: fix partman/standard-efi.cfg for maps[1|2]01-4 [puppet] - 10https://gerrit.wikimedia.org/r/1182947 (https://phabricator.wikimedia.org/T400637) [22:38:53] 10ops-magru: Alert for device ps1-b3-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T402835#11130328 (10phaultfinder) [22:40:02] (03CR) 10Jclark-ctr: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1182947 (https://phabricator.wikimedia.org/T400637) (owner: 10Jclark-ctr) [22:40:44] RESOLVED: [2x] RipeAtlasAnchorUnreachable: ipv6 ping to eqiad RIPE Atlas anchor: failures over threshold for measurement 96503802 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable [22:42:19] (03CR) 10RobH: [C:03+2] fix partman/standard-efi.cfg for maps[1|2]01-4 [puppet] - 10https://gerrit.wikimedia.org/r/1182947 (https://phabricator.wikimedia.org/T400637) (owner: 10Jclark-ctr) [22:43:59] 10ops-magru: Alert for device ps1-b4-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T402582#11130344 (10phaultfinder) [22:48:01] I'm going to push out one more thing, upgrading API Gateway and REST Gateway to envoy 
1.26 as well [22:52:24] (03PS3) 10RLazarus: {api,rest}-gateway: Upgrade to Envoy 1.26.8 in production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182681 (https://phabricator.wikimedia.org/T402584) [22:52:33] (03CR) 10RLazarus: [C:03+2] {api,rest}-gateway: Upgrade to Envoy 1.26.8 in production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182681 (https://phabricator.wikimedia.org/T402584) (owner: 10RLazarus) [22:54:20] (03Merged) 10jenkins-bot: {api,rest}-gateway: Upgrade to Envoy 1.26.8 in production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1182681 (https://phabricator.wikimedia.org/T402584) (owner: 10RLazarus) [22:55:54] !log rzl@deploy1003 helmfile [staging] START helmfile.d/services/rest-gateway: apply [22:56:16] !log rzl@deploy1003 helmfile [staging] DONE helmfile.d/services/rest-gateway: apply [23:02:24] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [23:04:35] FIRING: [2x] SwitchCoreInterfaceDown: Switch core interface down - ssw1-f1-codfw:et-0/0/6 (Core: lsw1-f2-codfw:ethernet-1/55 {#130117100025}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown [23:04:35] FIRING: OsmSynchronisationLag: Maps - OSM synchronization lag - codfw - https://wikitech.wikimedia.org/wiki/Maps/Runbook - https://grafana.wikimedia.org/d/000000305/maps-performances - https://alerts.wikimedia.org/?q=alertname%3DOsmSynchronisationLag [23:05:15] !log rzl@deploy1003 helmfile [codfw] START helmfile.d/services/api-gateway: apply [23:05:44] 
!log rzl@deploy1003 helmfile [codfw] DONE helmfile.d/services/api-gateway: apply [23:07:46] !log rzl@deploy1003 helmfile [eqiad] START helmfile.d/services/api-gateway: apply [23:08:08] !log rzl@deploy1003 helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply [23:09:42] (03CR) 10TrainBranchBot: [C:03+2] "Approved by krinkle@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1181310 (https://phabricator.wikimedia.org/T401595) (owner: 10Krinkle) [23:10:51] (03Merged) 10jenkins-bot: Enable wmgUseMdotRouting in Beta Cluster for testwiki only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1181310 (https://phabricator.wikimedia.org/T401595) (owner: 10Krinkle) [23:11:06] !log krinkle@deploy1003 Started scap sync-world: Backport for [[gerrit:1181310|Enable wmgUseMdotRouting in Beta Cluster for testwiki only (T401595)]] [23:11:11] T401595: [Rollout Phase 1] Implement redirect-less mobile routing and enable for wikitech.wikimedia.org - https://phabricator.wikimedia.org/T401595 [23:12:45] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance [23:12:53] !log rzl@deploy1003 helmfile [codfw] START helmfile.d/services/rest-gateway: apply [23:13:04] !log rzl@deploy1003 helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply [23:13:05] !log krinkle@deploy1003 krinkle: Backport for [[gerrit:1181310|Enable wmgUseMdotRouting in Beta Cluster for testwiki only (T401595)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[23:14:51] !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host maps2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED [23:16:20] !log rzl@deploy1003 helmfile [eqiad] START helmfile.d/services/rest-gateway: apply [23:16:31] !log rzl@deploy1003 helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply [23:16:41] !log krinkle@deploy1003 krinkle: Continuing with sync [23:19:54] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [23:20:36] !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host maps2012.codfw.wmnet with OS bookworm [23:20:49] 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops, 13Patch-For-Review: Q1:rack/setup/install maps201[1-4] - https://phabricator.wikimedia.org/T400637#11130485 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host maps2012.codfw.wmnet with OS bookworm [23:22:00] !log krinkle@deploy1003 Finished scap sync-world: Backport for [[gerrit:1181310|Enable wmgUseMdotRouting in Beta Cluster for testwiki only (T401595)]] (duration: 10m 54s) [23:22:09] T401595: [Rollout Phase 1] Implement redirect-less mobile routing and enable for wikitech.wikimedia.org - https://phabricator.wikimedia.org/T401595 [23:22:09] 06SRE: FY 25/26 WE 5.4.2: Known bots / clients - https://phabricator.wikimedia.org/T400100#11130494 (10Scott_French) a:05Scott_French→03JMeybohm [23:29:35] FIRING: NetworkDeviceAlarmActive: Alarm active on ssw1-f1-eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-eqiad:9804 - 
https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive [23:37:28] jclark@cumin1002 provision (PID 104478) is awaiting input [23:38:05] !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED [23:38:38] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1182959 [23:38:38] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1182959 (owner: 10TrainBranchBot) [23:51:44] FIRING: [2x] RipeAtlasAnchorUnreachable: ipv6 ping to eqiad RIPE Atlas anchor: failures over threshold for measurement 96503802 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable [23:53:59] (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1182959 (owner: 10TrainBranchBot)