[00:03:10] <jinxer-wm>	 FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:08:05] <icinga-wm>	 PROBLEM - Check unit status of clean-stale-certs on acmechief2002 is CRITICAL: CRITICAL: Status of the systemd unit clean-stale-certs https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[00:08:22] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1183277
[00:08:22] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1183277 (owner: TrainBranchBot)
[00:08:59] <wikibugs>	 ops-magru: Alert for device ps1-b3-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T403273#11134722 (phaultfinder)
[00:13:51] <wikibugs>	 ops-magru: Alert for device ps1-b4-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T403275#11134735 (phaultfinder)
[00:22:52] <wikibugs>	 (PS1) Hamish: Lift permission for event-organizer in Chinese Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/1183278
[00:24:26] <wikibugs>	 (Abandoned) Hamish: Lift permission for event-organizer in Chinese Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/1183278 (owner: Hamish)
[00:24:44] <wikibugs>	 (PS1) Hamish: Lift permission for event-organizer in Chinese Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/1183279 (https://phabricator.wikimedia.org/T403350)
[00:28:06] <wikibugs>	 (PS16) Krinkle: varnish: Implement new direct routing for mobile views [puppet] - https://gerrit.wikimedia.org/r/1180577 (https://phabricator.wikimedia.org/T401595)
[00:31:15] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1183277 (owner: TrainBranchBot)
[00:32:12] <wikibugs>	 (CR) Tim Starling: [C:+1] varnish: Implement new direct routing for mobile views [puppet] - https://gerrit.wikimedia.org/r/1180577 (https://phabricator.wikimedia.org/T401595) (owner: Krinkle)
[00:33:44] <wikibugs>	 (PS1) Pppery: Remove fallback for Asturian language [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1183280 (https://phabricator.wikimedia.org/T292750)
[00:53:04] <wikibugs>	 (PS17) Krinkle: varnish: Improve 08-mobile-hostnames-rewrite.vtc [puppet] - https://gerrit.wikimedia.org/r/1180969 (https://phabricator.wikimedia.org/T401595)
[00:53:04] <wikibugs>	 (PS3) Krinkle: varnish: Remove 60s cap for mobileaction/useformat on m-dot [puppet] - https://gerrit.wikimedia.org/r/1183212 (https://phabricator.wikimedia.org/T401595)
[00:53:04] <wikibugs>	 (PS17) Krinkle: varnish: Implement new direct routing for mobile views [puppet] - https://gerrit.wikimedia.org/r/1180577 (https://phabricator.wikimedia.org/T401595)
[00:55:56] <wikibugs>	 (PS18) Krinkle: varnish: Improve 08-mobile-hostnames-rewrite.vtc [puppet] - https://gerrit.wikimedia.org/r/1180969 (https://phabricator.wikimedia.org/T401595)
[00:55:57] <wikibugs>	 (PS4) Krinkle: varnish: Remove 60s cap for mobileaction/useformat on m-dot [puppet] - https://gerrit.wikimedia.org/r/1183212 (https://phabricator.wikimedia.org/T401595)
[00:55:57] <wikibugs>	 (PS18) Krinkle: varnish: Implement new direct routing for mobile views [puppet] - https://gerrit.wikimedia.org/r/1180577 (https://phabricator.wikimedia.org/T401595)
[01:19:36] <jinxer-wm>	 FIRING: [2x] NetworkDeviceAlarmActive: Alarm active on cr1-esams - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm  - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive
[01:25:06] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-esams:et-1/0/0 (Core: asw1-bw27-esams:et-0/0/48 {#30367}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-esams:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[01:27:47] <wikibugs>	 (PS19) Krinkle: varnish: Implement new direct routing for mobile views [puppet] - https://gerrit.wikimedia.org/r/1180577 (https://phabricator.wikimedia.org/T401595)
[01:29:16] <wikibugs>	 (PS1) Krinkle: Enable wmgUseMdotRouting in Beta Cluster for remaining wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1183281 (https://phabricator.wikimedia.org/T401595)
[01:29:36] <jinxer-wm>	 FIRING: [4x] SwitchCoreInterfaceDown: Switch core interface down - asw1-bw27-esams:et-0/0/48 (Core: cr1-esams:et-1/0/0 {#30367}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[01:29:54] <jinxer-wm>	 FIRING: [8x] CoreBGPDown: Core BGP session down between asw1-bw27-esams and cr1-esams (185.15.59.156) - group core - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[01:32:56] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:32:57] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by krinkle@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1183281 (https://phabricator.wikimedia.org/T401595) (owner: Krinkle)
[01:33:47] <wikibugs>	 (Merged) jenkins-bot: Enable wmgUseMdotRouting in Beta Cluster for remaining wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1183281 (https://phabricator.wikimedia.org/T401595) (owner: Krinkle)
[01:36:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:44:36] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[01:50:56] <wikibugs>	 SRE, Data-Engineering, Traffic-Icebox, MobileFrontend (Tracking), User-notice: RFC: Remove m-dot subdomain, serve mobile and desktop variants through the same URL - https://phabricator.wikimedia.org/T214998#11134789 (Krinkle)
[02:30:43] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, September 01 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - https://gerrit.wikimedia.org/r/1183279 (https://phabricator.wikimedia.org/T403350) (owner: Hamish)
[02:32:54] <jinxer-wm>	 FIRING: CirrusSearchTitleSuggestIndexTooOld: Some search indices that power autocomplete have not been updated recently - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#CirrusSearch_titlesuggest_index_is_too_old - TODO - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchTitleSuggestIndexTooOld
[02:53:32] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1011:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[03:04:36] <jinxer-wm>	 FIRING: OsmSynchronisationLag: Maps - OSM synchronization lag - codfw - https://wikitech.wikimedia.org/wiki/Maps/Runbook - https://grafana.wikimedia.org/d/000000305/maps-performances - https://alerts.wikimedia.org/?q=alertname%3DOsmSynchronisationLag
[03:40:48] <jinxer-wm>	 FIRING: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:45:20] <wikibugs>	 ops-codfw, DC-Ops: Alert for device ps1-d4-codfw.mgmt.codfw.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T403356 (phaultfinder) NEW
[03:55:48] <jinxer-wm>	 RESOLVED: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:03:01] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-web-next_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-web-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[04:06:25] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:13:56] <wikibugs>	 ops-magru: Alert for device ps1-b3-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T403273#11134863 (phaultfinder)
[04:19:00] <wikibugs>	 ops-magru: Alert for device ps1-b4-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T403275#11134864 (phaultfinder)
[05:01:25] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:03:01] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-web-next_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-web-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[05:08:40] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:19:36] <jinxer-wm>	 FIRING: [2x] NetworkDeviceAlarmActive: Alarm active on cr1-esams - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm  - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive
[05:25:06] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-esams:et-1/0/0 (Core: asw1-bw27-esams:et-0/0/48 {#30367}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-esams:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[05:29:11] <icinga-wm>	 PROBLEM - Backup freshness on backup1014 is CRITICAL: Stale: 1 (gerrit1003), Fresh: 137 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[05:29:36] <jinxer-wm>	 FIRING: [4x] SwitchCoreInterfaceDown: Switch core interface down - asw1-bw27-esams:et-0/0/48 (Core: cr1-esams:et-1/0/0 {#30367}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[05:29:54] <jinxer-wm>	 FIRING: [8x] CoreBGPDown: Core BGP session down between asw1-bw27-esams and cr1-esams (185.15.59.156) - group core - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[05:33:40] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:44:36] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[05:59:51] <wikibugs>	 (CR) Stang: [C:+1] Lift permission for event-organizer in Chinese Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/1183279 (https://phabricator.wikimedia.org/T403350) (owner: Hamish)
[06:29:11] <icinga-wm>	 RECOVERY - Backup freshness on backup1014 is OK: Fresh: 138 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[06:32:36] <kostajh>	 jouncebot: nowandnext
[06:32:36] <jouncebot>	 For the next 0 hour(s) and 27 minute(s): No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250831T0700)
[06:32:37] <jouncebot>	 In 0 hour(s) and 27 minute(s): UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250901T0700)
[06:32:54] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11134910 (MoritzMuehlenhoff)
[06:32:54] <jinxer-wm>	 FIRING: CirrusSearchTitleSuggestIndexTooOld: Some search indices that power autocomplete have not been updated recently - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#CirrusSearch_titlesuggest_index_is_too_old - TODO - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchTitleSuggestIndexTooOld
[06:33:33] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1183109 (https://phabricator.wikimedia.org/T403263) (owner: Kosta Harlan)
[06:34:22] <wikibugs>	 (Merged) jenkins-bot: hCaptcha: Disable hCaptcha for API contexts [mediawiki-config] - https://gerrit.wikimedia.org/r/1183109 (https://phabricator.wikimedia.org/T403263) (owner: Kosta Harlan)
[06:34:37] <logmsgbot>	 !log kharlan@deploy1003 Started scap sync-world: Backport for [[gerrit:1183109|hCaptcha: Disable hCaptcha for API contexts (T403263)]]
[06:34:40] <stashbot>	 T403263: hCaptcha: Do not enable on API account creations - https://phabricator.wikimedia.org/T403263
[06:35:36] <wikibugs>	 (PS1) Muehlenhoff: Remove bast3007 as bastion node [puppet] - https://gerrit.wikimedia.org/r/1183453 (https://phabricator.wikimedia.org/T402259)
[06:36:28] <wikibugs>	 (PS1) DCausse: SECURITY: declare PoolCounter settings for cirrusbuilddoc [mediawiki-config] - https://gerrit.wikimedia.org/r/1183454 (https://phabricator.wikimedia.org/T401220)
[06:39:04] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Remove bast3007 as bastion node [puppet] - https://gerrit.wikimedia.org/r/1183453 (https://phabricator.wikimedia.org/T402259) (owner: Muehlenhoff)
[06:43:03] <wikibugs>	 (PS1) Muehlenhoff: Remove access for mszabo [puppet] - https://gerrit.wikimedia.org/r/1183462
[06:43:46] <wikibugs>	 (CR) CI reject: [V:-1] Remove access for mszabo [puppet] - https://gerrit.wikimedia.org/r/1183462 (owner: Muehlenhoff)
[06:46:15] <jinxer-wm>	 FIRING: MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?var-datasource=codfw%20prometheus/ops&viewPanel=19 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[06:49:00] <wikibugs>	 (PS2) Muehlenhoff: Remove access for mszabo [puppet] - https://gerrit.wikimedia.org/r/1183462
[06:51:02] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Remove access for mszabo [puppet] - https://gerrit.wikimedia.org/r/1183462 (owner: Muehlenhoff)
[06:51:15] <jinxer-wm>	 FIRING: [2x] MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[06:53:32] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1011:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:55:11] <dcausse>	 !log restarting blazegraph on wdqs1011 (stuck)
[06:55:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:55:55] <logmsgbot>	 !log jmm@cumin2002 DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Máté Szabó out of all services on: 2410 hosts
[06:56:15] <jinxer-wm>	 RESOLVED: [2x] MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[06:58:08] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.decommission for hosts bast3007.wikimedia.org
[06:58:17] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1011:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:00:04] <jouncebot>	 Amir1, Urbanecm, and awight: How many deployers does it take to do UTC morning backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250901T0700).
[07:00:05] <jouncebot>	 hueitan, Msz2001, kostajh, Hamishcz, and dcausse: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:00:15] <Msz2001>	 o/
[07:00:16] <dcausse>	 o/
[07:00:21] <kart_>	 hola
[07:00:23] <kostajh>	 hi, i'm nearly done syncing a change 
[07:00:24] <kart_>	 I'll be deploying hueitan's changes.
[07:00:28] <hueitan>	 thank you
[07:00:29] <kostajh>	 k8s seems to be moving very slowly today 
[07:00:31] <Hamishcz>	 o/
[07:00:43] <kart_>	 kostajh: it is Monday!
[07:01:11] <Hamishcz>	 k8s doesnt want to work this weeeeek
[07:01:13] <Hamishcz>	 lol
[07:01:55] <kostajh>	 I asked in -serviceops as well, because opening `shell.php` with `mwscript-k8s` is also very slow 
[07:01:58] <jinxer-wm>	 FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs1011:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[07:02:39] <logmsgbot>	 !log kharlan@deploy1003 kharlan: Backport for [[gerrit:1183109|hCaptcha: Disable hCaptcha for API contexts (T403263)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[07:02:42] <stashbot>	 T403263: hCaptcha: Do not enable on API account creations - https://phabricator.wikimedia.org/T403263
[07:02:49] <kostajh>	 so, my guess would be that we could start deployments in ~15 minutes, but hard to say given that sync-testservers-k8s, which is usually really fast, took 7 minutes 
[07:02:50] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.netbox
[07:03:02] <kart_>	 kostajh: Ping me once config deployment is done.
[07:03:07] <kostajh>	 yep
[07:03:52] <Hamishcz>	 kostajh: i'm driving w/ my laptop so pls ping me if my response is required
[07:03:56] <Hamishcz>	 thx
[07:04:24] <logmsgbot>	 !log kharlan@deploy1003 kharlan: Continuing with sync
[07:04:36] <Hamishcz>	 bc i have to pull over then see whats going on :|
[07:04:36] <jinxer-wm>	 FIRING: OsmSynchronisationLag: Maps - OSM synchronization lag - codfw - https://wikitech.wikimedia.org/wiki/Maps/Runbook - https://grafana.wikimedia.org/d/000000305/maps-performances - https://alerts.wikimedia.org/?q=alertname%3DOsmSynchronisationLag
[07:07:18] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3007.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
[07:07:53] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3007.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
[07:07:53] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:07:55] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast3007.wikimedia.org
[07:08:11] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11134958 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: `bast3007.wikimedia.org` - bast3007.wikimedia.org (**PASS**)...
[07:09:05] <logmsgbot>	 !log elukey@deploy1003 helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
[07:09:39] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3005.esams.wmnet
[07:10:04] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11134959 (ops-monitoring-bot) Draining ganeti3005.esams.wmnet of running VMs
[07:10:23] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3005.esams.wmnet
[07:11:10] <logmsgbot>	 !log elukey@deploy1003 helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
[07:14:35] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[07:14:50] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[07:16:10] <kostajh>	 at 80% now
[07:17:01] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir3003.esams.wmnet to plain
[07:17:26] <wikibugs>	 (CR) Ayounsi: [C:+2] Remove esams RIPE Atlas measurements [puppet] - https://gerrit.wikimedia.org/r/1180085 (https://phabricator.wikimedia.org/T402259) (owner: Ayounsi)
[07:17:46] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11134964 (ops-monitoring-bot) VM ncredir3003.esams.wmnet switching disk type to plain
[07:17:48] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir3003.esams.wmnet to plain
[07:17:48] <logmsgbot>	 !log kharlan@deploy1003 Finished scap sync-world: Backport for [[gerrit:1183109|hCaptcha: Disable hCaptcha for API contexts (T403263)]] (duration: 43m 11s)
[07:17:51] <stashbot>	 T403263: hCaptcha: Do not enable on API account creations - https://phabricator.wikimedia.org/T403263
[07:17:58] <wikibugs>	 (PS2) Ayounsi: Remove esams RIPE Atlas measurements [puppet] - https://gerrit.wikimedia.org/r/1180085 (https://phabricator.wikimedia.org/T402259)
[07:18:14] <kart_>	 kostajh: can I go ahead? :)
[07:18:33] <kostajh>	 kart_: yes
[07:18:45] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by kartik@deploy1003 using scap backport" [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1182861 (https://phabricator.wikimedia.org/T402496) (owner: Huei Tan)
[07:18:59] <kart_>	 hueitan: Starting with the first patch..
[07:19:07] <hueitan>	 (y)
[07:19:15] <hueitan>	 (y)
[07:19:33] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[07:20:00] <kart_>	 oh, I forgot. CI.
[07:20:07] <wikibugs>	 (CR) Ayounsi: [C:+2] Remove esams RIPE Atlas measurements [puppet] - https://gerrit.wikimedia.org/r/1180085 (https://phabricator.wikimedia.org/T402259) (owner: Ayounsi)
[07:20:10] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of durum3003.esams.wmnet to plain
[07:20:11] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[07:20:40] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11134967 (MoritzMuehlenhoff)
[07:21:35] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11134969 (ops-monitoring-bot) VM durum3003.esams.wmnet switching disk type to plain
[07:21:51] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum3003.esams.wmnet to plain
[07:22:41] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1180083 (https://phabricator.wikimedia.org/T402259) (owner: Ayounsi)
[07:23:45] <icinga-wm>	 PROBLEM - Bird Internet Routing Daemon on durum3003 is CRITICAL: PROCS CRITICAL: 0 processes with command name bird https://wikitech.wikimedia.org/wiki/Anycast%23Bird_daemon_not_running
[07:23:56] <dcausse>	 jouncebot: next
[07:23:57] <jouncebot>	 In 2 hour(s) and 36 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250901T1000)
[07:24:09] <icinga-wm>	 PROBLEM - BFD status on asw1-by27-esams.mgmt is CRITICAL: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:25:08] <Hamishcz>	 kostajh: everything good now?
[07:25:45] <icinga-wm>	 RECOVERY - Bird Internet Routing Daemon on durum3003 is OK: PROCS OK: 1 process with command name bird https://wikitech.wikimedia.org/wiki/Anycast%23Bird_daemon_not_running
[07:25:46] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of doh3003.wikimedia.org to plain
[07:26:09] <kostajh>	 Hamishcz: kart_ is deploying 
[07:26:09] <icinga-wm>	 RECOVERY - BFD status on asw1-by27-esams.mgmt is OK: UP: 5 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:26:18] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11134979 (ops-monitoring-bot) VM doh3003.wikimedia.org switching disk type to plain
[07:26:20] <logmsgbot>	 !log elukey@deploy1003 helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
[07:26:36] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh3003.wikimedia.org to plain
[07:26:40] <logmsgbot>	 !log elukey@deploy1003 helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
[07:26:44] <Hamishcz>	 okayyy
[07:28:31] <icinga-wm>	 PROBLEM - Bird Internet Routing Daemon on doh3003 is CRITICAL: PROCS CRITICAL: 0 processes with command name bird https://wikitech.wikimedia.org/wiki/Anycast%23Bird_daemon_not_running
[07:29:09] <icinga-wm>	 PROBLEM - BFD status on asw1-by27-esams.mgmt is CRITICAL: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:29:39] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[07:29:49] <wikibugs>	 (Merged) jenkins-bot: Setup tracking for CentralNotice banners experiment for WE2.1.1 [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1182861 (https://phabricator.wikimedia.org/T402496) (owner: Huei Tan)
[07:30:09] <logmsgbot>	 !log kartik@deploy1003 Started scap sync-world: Backport for [[gerrit:1182861|Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496)]]
[07:30:09] <icinga-wm>	 RECOVERY - BFD status on asw1-by27-esams.mgmt is OK: UP: 5 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:30:12] <stashbot>	 T402496: Tracking code for Scenarios 1 for WE2.1.1 - https://phabricator.wikimedia.org/T402496
[07:30:13] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[07:30:29] <icinga-wm>	 RECOVERY - Bird Internet Routing Daemon on doh3003 is OK: PROCS OK: 1 process with command name bird https://wikitech.wikimedia.org/wiki/Anycast%23Bird_daemon_not_running
[07:31:03] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11134990 (MoritzMuehlenhoff)
[07:31:36] <wikibugs>	 (CR) Slyngshede: [C:+2] P:puppetserver::volatile generate datacenter database (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1181090 (https://phabricator.wikimedia.org/T398161) (owner: Slyngshede)
[07:31:38] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.hosts.decommission for hosts atlas3001.wikimedia.org
[07:31:57] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11134995 (MoritzMuehlenhoff)
[07:31:58] <jinxer-wm>	 RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs1011:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[07:32:17] <logmsgbot>	 !log ayounsi@cumin1003 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts atlas3001.wikimedia.org
[07:34:01] <wikibugs>	 (PS1) Ayounsi: Remove atlas3001 from monitoring [puppet] - https://gerrit.wikimedia.org/r/1183536 (https://phabricator.wikimedia.org/T402259)
[07:34:34] <wikibugs>	 (PS1) Muehlenhoff: Remove ganeti3005 from esams01 cluster [puppet] - https://gerrit.wikimedia.org/r/1183544 (https://phabricator.wikimedia.org/T402259)
[07:35:03] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1183536 (https://phabricator.wikimedia.org/T402259) (owner: Ayounsi)
[07:35:07] <wikibugs>	 (CR) Ayounsi: [C:+2] Remove atlas3001 from monitoring [puppet] - https://gerrit.wikimedia.org/r/1183536 (https://phabricator.wikimedia.org/T402259) (owner: Ayounsi)
[07:35:32] <wikibugs>	 (CR) Vgutierrez: [C:+1] profile:cache: remove varnishkafka (webrequest) from cp hosts [puppet] - https://gerrit.wikimedia.org/r/1183081 (https://phabricator.wikimedia.org/T393772) (owner: Fabfur)
[07:35:33] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1183544 (https://phabricator.wikimedia.org/T402259) (owner: Muehlenhoff)
[07:35:49] <logmsgbot>	 !log kartik@deploy1003 kartik, hueitan: Backport for [[gerrit:1182861|Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[07:36:00] <stashbot>	 T402496: Tracking code for Scenarios 1 for WE2.1.1 - https://phabricator.wikimedia.org/T402496
[07:36:13] <wikibugs>	 (CR) Fabfur: [C:+2] profile:cache: remove varnishkafka (webrequest) from cp hosts [puppet] - https://gerrit.wikimedia.org/r/1183081 (https://phabricator.wikimedia.org/T393772) (owner: Fabfur)
[07:36:22] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.hosts.decommission for hosts atlas3001.wikimedia.org
[07:37:49] <wikibugs>	 (PS1) Slyngshede: P:puppetserver::volatile fix group name [puppet] - https://gerrit.wikimedia.org/r/1183599 (https://phabricator.wikimedia.org/T398161)
[07:38:51] <wikibugs>	 (CR) Ayounsi: [C:+1] "Thx I think it's because we use the "set" output format for Nokia, which makes longer lines." [puppet] - https://gerrit.wikimedia.org/r/1183140 (owner: Muehlenhoff)
[07:39:36] <jinxer-wm>	 FIRING: ProbeDown: Ripe Atlas anchor atlas3001:80 is not returning HTTP 200 OK on port 80 - https://wikitech.wikimedia.org/wiki/RIPE_Atlas#HTTP_checks_failing - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:40:12] <logmsgbot>	 !log arnaudb@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 12:00:00 on people1005.eqiad.wmnet with reason: WIP T402953#11120672
[07:40:15] <stashbot>	 T402953: SystemdUnitFailed - envoyproxy on people1005 - https://phabricator.wikimedia.org/T402953
[07:40:19] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.dns.netbox
[07:40:22] <wikibugs>	 (CR) Vgutierrez: [C:+1] P:puppetserver::volatile fix group name [puppet] - https://gerrit.wikimedia.org/r/1183599 (https://phabricator.wikimedia.org/T398161) (owner: Slyngshede)
[07:40:33] <wikibugs>	 (CR) Fabfur: [C:+1] P:puppetserver::volatile fix group name [puppet] - https://gerrit.wikimedia.org/r/1183599 (https://phabricator.wikimedia.org/T398161) (owner: Slyngshede)
[07:40:45] <wikibugs>	 (CR) Slyngshede: [C:+2] P:puppetserver::volatile fix group name [puppet] - https://gerrit.wikimedia.org/r/1183599 (https://phabricator.wikimedia.org/T398161) (owner: Slyngshede)
[07:41:14] <wikibugs>	 (CR) Ayounsi: [C:+2] Add esams routed ganeti VM ranges to network/data/data.yaml [puppet] - https://gerrit.wikimedia.org/r/1180083 (https://phabricator.wikimedia.org/T402259) (owner: Ayounsi)
[07:43:31] <wikibugs>	 (CR) Ayounsi: [C:+1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/1183544 (https://phabricator.wikimedia.org/T402259) (owner: Muehlenhoff)
[07:43:40] <jinxer-wm>	 FIRING: [2x] ProbeDown: Ripe Atlas anchor atlas3001:80 is not returning HTTP 200 OK on port 80 - https://wikitech.wikimedia.org/wiki/RIPE_Atlas#HTTP_checks_failing - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:44:29] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: atlas3001.wikimedia.org decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1003"
[07:44:33] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: atlas3001.wikimedia.org decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1003"
[07:44:33] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:44:34] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts atlas3001.wikimedia.org
[07:44:50] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11135031 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by ayounsi@cumin1003 for hosts: `atlas3001.wikimedia.org` - atlas3001.wikimedia.org (**WA...
[07:47:40] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Remove ganeti3005 from esams01 cluster [puppet] - https://gerrit.wikimedia.org/r/1183544 (https://phabricator.wikimedia.org/T402259) (owner: Muehlenhoff)
[07:48:28] <kart_>	 Sorry, testing is taking longer than expected.
[07:49:15] <jinxer-wm>	 FIRING: MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?var-datasource=codfw%20prometheus/ops&viewPanel=19 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[07:49:31] <icinga-wm>	 PROBLEM - ganeti-confd running on ganeti3005 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 114 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[07:49:31] <icinga-wm>	 PROBLEM - ganeti-noded running on ganeti3005 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[07:49:36] <jinxer-wm>	 FIRING: [3x] ProbeDown: Ripe Atlas anchor atlas3001:80 is not returning HTTP 200 OK on port 80  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:51:40] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Line-wrap Homer diffs [puppet] - https://gerrit.wikimedia.org/r/1183140 (owner: Muehlenhoff)
[07:51:56] <logmsgbot>	 !log kartik@deploy1003 kartik, hueitan: Continuing with sync
[07:54:15] <jinxer-wm>	 RESOLVED: [2x] MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[07:54:21] <wikibugs>	 (PS1) Elukey: role::maps: increase max-conns and shared buffers on Bookworm [puppet] - https://gerrit.wikimedia.org/r/1183609 (https://phabricator.wikimedia.org/T381565)
[07:55:04] <wikibugs>	 (PS1) Huei Tan: Setup tracking for CentralNotice banners experiment for WE2.1.1 [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183610 (https://phabricator.wikimedia.org/T402496)
[07:55:45] <wikibugs>	 (CR) KartikMistry: [C:+2] Update HomepageVisit schema to 1.6.1 [extensions/GrowthExperiments] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1182862 (https://phabricator.wikimedia.org/T402496) (owner: Huei Tan)
[07:55:53] <wikibugs>	 (PS1) Brouberol: postgresql-airflow-main: increase max CPU and disk space [deployment-charts] - https://gerrit.wikimedia.org/r/1183611
[07:57:28] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11135039 (MoritzMuehlenhoff)
[07:57:38] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, September 02 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183610 (https://phabricator.wikimedia.org/T402496) (owner: Huei Tan)
[07:59:39] <logmsgbot>	 !log kartik@deploy1003 Finished scap sync-world: Backport for [[gerrit:1182861|Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496)]] (duration: 29m 29s)
[07:59:42] <stashbot>	 T402496: Tracking code for Scenarios 1 for WE2.1.1 - https://phabricator.wikimedia.org/T402496
[08:00:30] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by kartik@deploy1003 using scap backport" [extensions/GrowthExperiments] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1182862 (https://phabricator.wikimedia.org/T402496) (owner: Huei Tan)
[08:01:09] <wikibugs>	 (CR) Muehlenhoff: role::maps: increase max-conns and shared buffers on Bookworm (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1183609 (https://phabricator.wikimedia.org/T381565) (owner: Elukey)
[08:02:07] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, September 01 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183610 (https://phabricator.wikimedia.org/T402496) (owner: Huei Tan)
[08:02:17] <wikibugs>	 (PS2) Elukey: role::maps: increase max-conns and shared buffers on Bookworm [puppet] - https://gerrit.wikimedia.org/r/1183609 (https://phabricator.wikimedia.org/T381565)
[08:02:31] <Hamishcz>	 kart_: can we move forward soon? Or should I reschedule to the next backport window?
[08:02:42] <wikibugs>	 (PS3) Ayounsi: esams: add Ganeti "customer" [homer/public] - https://gerrit.wikimedia.org/r/1180081 (https://phabricator.wikimedia.org/T402259)
[08:03:29] <kart_>	 Hamishcz: sorry, the first patch is done and I'm on the second one, but we're over time.
[08:03:31] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host ganeti3005.esams.wmnet with OS bookworm
[08:03:40] <jinxer-wm>	 RESOLVED: [3x] ProbeDown: Ripe Atlas anchor atlas3001:80 is not returning HTTP 200 OK on port 80  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:03:51] <kart_>	 If nothing else is scheduled, you can go ahead after my patch is done.
[08:04:25] <wikibugs>	 (PS2) Huei Tan: Setup tracking for CentralNotice banners experiment for WE2.1.1 [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183610 (https://phabricator.wikimedia.org/T402496)
[08:05:15] <jinxer-wm>	 FIRING: MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?var-datasource=codfw%20prometheus/ops&viewPanel=19 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[08:05:27] <wikibugs>	 (CR) David Caro: "> For the rest, how would the alerts/task notification to users work? I'm asking because I worry that unless that's fully automatic (i.e. " [alerts] - https://gerrit.wikimedia.org/r/1182900 (https://phabricator.wikimedia.org/T402932) (owner: David Caro)
[08:05:36] <wikibugs>	 (PS1) Slyngshede: P:puppetserver::volatile enable datacenter timer [puppet] - https://gerrit.wikimedia.org/r/1183612 (https://phabricator.wikimedia.org/T398161)
[08:05:57] <wikibugs>	 (CR) Ayounsi: [C:+2] esams: add Ganeti "customer" [homer/public] - https://gerrit.wikimedia.org/r/1180081 (https://phabricator.wikimedia.org/T402259) (owner: Ayounsi)
[08:06:42] <Hamishcz>	 okayyy i think I can wait
[08:06:57] <wikibugs>	 (CR) Elukey: [V:+1] "PCC SUCCESS (CORE_DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6804/co" [puppet] - https://gerrit.wikimedia.org/r/1183609 (https://phabricator.wikimedia.org/T381565) (owner: Elukey)
[08:07:03] <Hamishcz>	 but I can't deploy myself, right? if my memory is correct
[08:07:23] <wikibugs>	 (Merged) jenkins-bot: esams: add Ganeti "customer" [homer/public] - https://gerrit.wikimedia.org/r/1180081 (https://phabricator.wikimedia.org/T402259) (owner: Ayounsi)
[08:07:47] <wikibugs>	 (PS3) Elukey: role::maps: increase max-conns and shared buffers on Bookworm [puppet] - https://gerrit.wikimedia.org/r/1183609 (https://phabricator.wikimedia.org/T381565)
[08:09:31] <wikibugs>	 (Merged) jenkins-bot: Update HomepageVisit schema to 1.6.1 [extensions/GrowthExperiments] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1182862 (https://phabricator.wikimedia.org/T402496) (owner: Huei Tan)
[08:09:47] <logmsgbot>	 !log kartik@deploy1003 Started scap sync-world: Backport for [[gerrit:1182862|Update HomepageVisit schema to 1.6.1 (T402496 T402497)]]
[08:09:52] <stashbot>	 T402496: Tracking code for Scenarios 1 for WE2.1.1 - https://phabricator.wikimedia.org/T402496
[08:09:52] <stashbot>	 T402497: Tracking code for Scenarios 2 for WE2.1.1 - https://phabricator.wikimedia.org/T402497
[08:10:15] <jinxer-wm>	 RESOLVED: [2x] MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[08:10:29] <wikibugs>	 (CR) Elukey: [V:+1] "PCC SUCCESS (CORE_DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6805/co" [puppet] - https://gerrit.wikimedia.org/r/1183609 (https://phabricator.wikimedia.org/T381565) (owner: Elukey)
[08:13:12] <wikibugs>	 (CR) Vgutierrez: [C:+1] acme-chief: Move clean-stale-certs to file [puppet] - https://gerrit.wikimedia.org/r/1174881 (https://phabricator.wikimedia.org/T399419) (owner: BCornwall)
[08:16:07] <logmsgbot>	 !log kartik@deploy1003 hueitan, kartik: Backport for [[gerrit:1182862|Update HomepageVisit schema to 1.6.1 (T402496 T402497)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[08:16:12] <stashbot>	 T402496: Tracking code for Scenarios 1 for WE2.1.1 - https://phabricator.wikimedia.org/T402496
[08:16:12] <stashbot>	 T402497: Tracking code for Scenarios 2 for WE2.1.1 - https://phabricator.wikimedia.org/T402497
[08:16:16] <wikibugs>	 (CR) Elukey: [V:+1] role::maps: increase max-conns and shared buffers on Bookworm (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1183609 (https://phabricator.wikimedia.org/T381565) (owner: Elukey)
[08:17:04] <wikibugs>	 (CR) Vgutierrez: "makes sense but I'm wondering if initially we could manage purge value via hiera so we can test on a few hosts before proceeding with a g" [puppet] - https://gerrit.wikimedia.org/r/1178597 (https://phabricator.wikimedia.org/T401858) (owner: JHathaway)
[08:17:09] <wikibugs>	 (CR) Elukey: [V:+1] role::maps: increase max-conns and shared buffers on Bookworm (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1183609 (https://phabricator.wikimedia.org/T381565) (owner: Elukey)
[08:18:49] <wikibugs>	 (CR) KartikMistry: Cleanup: Simplify configuration for wgSpecialContributeSkinsEnabled (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/1182944 (owner: Jdlrobson)
[08:18:57] <wikibugs>	 ops-magru: Alert for device ps1-b3-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T403273#11135094 (phaultfinder)
[08:19:11] <wikibugs>	 (CR) KartikMistry: [C:+1] Cleanup: Simplify configuration for wgSpecialContributeSkinsEnabled [mediawiki-config] - https://gerrit.wikimedia.org/r/1182944 (owner: Jdlrobson)
[08:19:45] <wikibugs>	 (CR) KartikMistry: "Thanks!" [mediawiki-config] - https://gerrit.wikimedia.org/r/1152558 (https://phabricator.wikimedia.org/T380930) (owner: KartikMistry)
[08:20:01] <logmsgbot>	 !log kartik@deploy1003 hueitan, kartik: Continuing with sync
[08:20:20] <logmsgbot>	 !log fceratto@cumin1002 START - Cookbook sre.mysql.upgrade for es2049.codfw.wmnet
[08:21:14] <wikibugs>	 (PS4) Elukey: role::maps: increase max-conns and shared buffers on Bookworm [puppet] - https://gerrit.wikimedia.org/r/1183609 (https://phabricator.wikimedia.org/T381565)
[08:23:14] <wikibugs>	 (CR) Elukey: [V:+1] "PCC SUCCESS (DIFF 2 CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/" [puppet] - https://gerrit.wikimedia.org/r/1183609 (https://phabricator.wikimedia.org/T381565) (owner: Elukey)
[08:23:54] <wikibugs>	 ops-magru: Alert for device ps1-b4-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T403275#11135108 (phaultfinder)
[08:25:05] <wikibugs>	 (CR) Elukey: [V:+1] role::maps: increase max-conns and shared buffers on Bookworm (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1183609 (https://phabricator.wikimedia.org/T381565) (owner: Elukey)
[08:25:09] <logmsgbot>	 !log kartik@deploy1003 Finished scap sync-world: Backport for [[gerrit:1182862|Update HomepageVisit schema to 1.6.1 (T402496 T402497)]] (duration: 15m 21s)
[08:25:13] <stashbot>	 T402496: Tracking code for Scenarios 1 for WE2.1.1 - https://phabricator.wikimedia.org/T402496
[08:25:14] <stashbot>	 T402497: Tracking code for Scenarios 2 for WE2.1.1 - https://phabricator.wikimedia.org/T402497
[08:27:04] <wikibugs>	 SRE, SRE-Unowned, Maps, Patch-For-Review: Move maps servers to Bookworm - https://phabricator.wikimedia.org/T381565#11135114 (elukey) I am also seeing a lot of the following logs in various replicas:  ` GMT FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000...
[08:27:21] <logmsgbot>	 fceratto@cumin1002 upgrade (PID 3914791) is awaiting input
[08:28:02] <kart_>	 Hamishcz: I'm done.
[08:28:33] <Hamishcz>	 yea im still around but i'm in need of your help to deploy
[08:28:46] <dcausse>	 I can deploy
[08:29:00] <Hamishcz>	 i think i can't deploy myself due to lack of permission?
[08:29:30] <Hamishcz>	 dcausse: oh thank you
[08:29:59] <dcausse>	 next is Msz2001?
[08:30:13] <dcausse>	 jouncebot: nowandnext
[08:30:13] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 29 minute(s)
[08:30:14] <jouncebot>	 In 1 hour(s) and 29 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250901T1000)
[08:30:42] <dcausse>	 Msz2001: are you around?
[08:30:42] <Msz2001>	 I've moved the patches to the next window, but if we have time and there's somebody to deploy them for me, let's do that
[08:31:32] <dcausse>	 Msz2001: ack, can I ship both at once?
[08:31:36] <Msz2001>	 Yes
[08:31:41] <dcausse>	 ok
[08:32:50] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by dcausse@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1182692 (https://phabricator.wikimedia.org/T403148) (owner: Mszwarc)
[08:32:50] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by dcausse@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1182798 (https://phabricator.wikimedia.org/T280532) (owner: Mszwarc)
[08:33:00] <Msz2001>	 (please note that the one about logo requires a script to be run afterwards, to purge the server-side cache; it's mentioned in the Phab task)
[08:33:07] <dcausse>	 ok
[08:33:40] <wikibugs>	 (Merged) jenkins-bot: Revert "wikimaniawiki: update logo to 2025" [mediawiki-config] - https://gerrit.wikimedia.org/r/1182692 (https://phabricator.wikimedia.org/T403148) (owner: Mszwarc)
[08:33:42] <wikibugs>	 (Merged) jenkins-bot: Remove setting `wgEnablePartialActionBlocks`. [mediawiki-config] - https://gerrit.wikimedia.org/r/1182798 (https://phabricator.wikimedia.org/T280532) (owner: Mszwarc)
[08:33:58] <logmsgbot>	 !log dcausse@deploy1003 Started scap sync-world: Backport for [[gerrit:1182692|Revert "wikimaniawiki: update logo to 2025" (T403148)]], [[gerrit:1182798|Remove setting `wgEnablePartialActionBlocks`. (T280532)]]
[08:34:03] <stashbot>	 T403148: Change Wikimania wiki logo from 2025 to generic - https://phabricator.wikimedia.org/T403148
[08:34:03] <stashbot>	 T280532: Remove partial action blocks feature flag - https://phabricator.wikimedia.org/T280532
[08:36:55] <logmsgbot>	 !log fceratto@cumin1002 END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es2049.codfw.wmnet
[08:39:35] <logmsgbot>	 !log dcausse@deploy1003 mszwarc, dcausse: Backport for [[gerrit:1182692|Revert "wikimaniawiki: update logo to 2025" (T403148)]], [[gerrit:1182798|Remove setting `wgEnablePartialActionBlocks`. (T280532)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[08:39:39] <stashbot>	 T403148: Change Wikimania wiki logo from 2025 to generic - https://phabricator.wikimedia.org/T403148
[08:39:40] <stashbot>	 T280532: Remove partial action blocks feature flag - https://phabricator.wikimedia.org/T280532
[08:40:06] <Msz2001>	 Both patches work fine
[08:40:11] <dcausse>	 Msz2001: ack
[08:40:52] <logmsgbot>	 !log dcausse@deploy1003 mszwarc, dcausse: Continuing with sync
[08:41:34] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.postgresql.postgres-init
[08:42:11] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
[08:42:24] <dcausse>	 kostajh: o/ are you around for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ConfirmEdit/+/1183112 ? 
[08:42:47] <logmsgbot>	 !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
[08:42:54] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1179 (T403362)', diff saved to https://phabricator.wikimedia.org/P82283 and previous config saved to /var/cache/conftool/dbconfig/20250901-084254-ladsgroup.json
[08:42:57] <stashbot>	 T403362: Change row format of cx_corpora - https://phabricator.wikimedia.org/T403362
[08:44:29] <Amir1>	 jouncebot: nowandnext
[08:44:29] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 15 minute(s)
[08:44:29] <jouncebot>	 In 1 hour(s) and 15 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250901T1000)
[08:44:45] <wikibugs>	 (CR) Vgutierrez: [C:+1] P:puppetserver::volatile enable datacenter timer [puppet] - https://gerrit.wikimedia.org/r/1183612 (https://phabricator.wikimedia.org/T398161) (owner: Slyngshede)
[08:45:52] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
[08:45:59] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db2149 (T401906)', diff saved to https://phabricator.wikimedia.org/P82284 and previous config saved to /var/cache/conftool/dbconfig/20250901-084558-fceratto.json
[08:46:02] <stashbot>	 T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906
[08:46:03] <logmsgbot>	 !log dcausse@deploy1003 Finished scap sync-world: Backport for [[gerrit:1182692|Revert "wikimaniawiki: update logo to 2025" (T403148)]], [[gerrit:1182798|Remove setting `wgEnablePartialActionBlocks`. (T280532)]] (duration: 12m 05s)
[08:46:07] <stashbot>	 T403148: Change Wikimania wiki logo from 2025 to generic - https://phabricator.wikimedia.org/T403148
[08:46:08] <stashbot>	 T280532: Remove partial action blocks feature flag - https://phabricator.wikimedia.org/T280532
[08:46:21] <kostajh>	 dcausse: I'm around
[08:46:28] <dcausse>	 kostajh: ok
[08:46:59] <dcausse>	 Msz2001: running the maint script now
[08:47:13] <dcausse>	 purged, I can see the new logo now
[08:47:30] <Msz2001>	 Me too. Thanks for deploying!
[08:47:40] <dcausse>	 yw! :)
[08:48:33] <dcausse>	 kostajh: do you mind if I ship Hamishcz's config patch quickly while yours runs through CI?
[08:48:51] <dcausse>	 Hamishcz: are you still around?
[08:49:25] <Hamishcz>	 yes
[08:49:56] <dcausse>	 ok shipping your patch now
[08:51:00] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2049.codfw.wmnet with reason: T402859
[08:51:03] <stashbot>	 T402859: Productionize es2049-es2057 - https://phabricator.wikimedia.org/T402859
[08:51:12] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.postgresql.postgres-init
[08:51:21] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by dcausse@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1183279 (https://phabricator.wikimedia.org/T403350) (owner: Hamish)
[08:52:11] <wikibugs>	 (Merged) jenkins-bot: Lift permission for event-organizer in Chinese Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/1183279 (https://phabricator.wikimedia.org/T403350) (owner: Hamish)
[08:52:24] <logmsgbot>	 !log dcausse@deploy1003 Started scap sync-world: Backport for [[gerrit:1183279|Lift permission for event-organizer in Chinese Wikipedia (T403350)]]
[08:52:28] <stashbot>	 T403350: Lift permission for event-organizer in Chinese Wikipedia - https://phabricator.wikimedia.org/T403350
[08:52:47] <wikibugs>	 (CR) DCausse: [C:+2] hCaptcha: Provide label/help in authmanagerinfo API calls [extensions/ConfirmEdit] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183112 (https://phabricator.wikimedia.org/T403253) (owner: Kosta Harlan)
[08:56:17] <kostajh>	 dcausse: sorry for the late reply, yes, no problem
[08:56:27] <dcausse>	 np :)
[08:57:39] <wikibugs>	 (CR) Muehlenhoff: role::maps: increase max-conns and shared buffers on Bookworm (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1183609 (https://phabricator.wikimedia.org/T381565) (owner: Elukey)
[08:58:27] <logmsgbot>	 !log dcausse@deploy1003 hamishz, dcausse: Backport for [[gerrit:1183279|Lift permission for event-organizer in Chinese Wikipedia (T403350)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[08:58:29] <stashbot>	 T403350: Lift permission for event-organizer in Chinese Wikipedia - https://phabricator.wikimedia.org/T403350
[08:58:50] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11135398 (ayounsi)
[08:59:01] <dcausse>	 Hamishcz: it's on test servers, please let me know if everything's OK
[08:59:20] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2149 (T401906)', diff saved to https://phabricator.wikimedia.org/P82285 and previous config saved to /var/cache/conftool/dbconfig/20250901-085920-fceratto.json
[08:59:22] <wikibugs>	 SRE, SRE-swift-storage, Ceph, envoy, serviceops: Data-persistence envoy upgrades to 1.26.8-1 - https://phabricator.wikimedia.org/T403374 (MatthewVernon) NEW
[08:59:23] <stashbot>	 T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906
[08:59:42] <Hamishcz>	 seems still not live on testserver?
[09:00:05] <dcausse>	 h
[09:00:09] <dcausse>	 hmm.. should be
[09:00:10] <wikibugs>	 (CR) Tiziano Fogli: [C:+2] check_prometheus: add migration task param [puppet] - https://gerrit.wikimedia.org/r/1183126 (https://phabricator.wikimedia.org/T395443) (owner: Tiziano Fogli)
[09:00:22] <wikibugs>	 (CR) Tiziano Fogli: [C:+2] monitoring services: add migration task T370153 to instances [puppet] - https://gerrit.wikimedia.org/r/1183127 (https://phabricator.wikimedia.org/T395443) (owner: Tiziano Fogli)
[09:00:45] <Hamishcz>	 ah live now, checked and LGTM
[09:00:53] <wikibugs>	 (CR) Tiziano Fogli: [C:+2] monitoring services: add migration task T309012 to instances [puppet] - https://gerrit.wikimedia.org/r/1183128 (https://phabricator.wikimedia.org/T395443) (owner: Tiziano Fogli)
[09:01:04] <dcausse>	 Hamishcz: ack, shipping
[09:01:05] <wikibugs>	 (CR) Elukey: [V:+1] role::maps: increase max-conns and shared buffers on Bookworm (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1183609 (https://phabricator.wikimedia.org/T381565) (owner: Elukey)
[09:01:07] <wikibugs>	 (CR) Tiziano Fogli: [C:+2] monitoring services: add migration task T370157 to instances [puppet] - https://gerrit.wikimedia.org/r/1183129 (https://phabricator.wikimedia.org/T395443) (owner: Tiziano Fogli)
[09:01:08] <Hamishcz>	 ty
[09:01:21] <wikibugs>	 (CR) Tiziano Fogli: [C:+2] monitoring services: add migration task T315866 to instances [puppet] - https://gerrit.wikimedia.org/r/1183130 (https://phabricator.wikimedia.org/T395443) (owner: Tiziano Fogli)
[09:01:34] <logmsgbot>	 !log dcausse@deploy1003 hamishz, dcausse: Continuing with sync
[09:01:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:04:28] <wikibugs>	 (CR) Vgutierrez: [C:+2] Varnish: Fix rate limit comment to match code [puppet] - https://gerrit.wikimedia.org/r/1183245 (https://phabricator.wikimedia.org/T400119) (owner: Pppery)
[09:04:57] <Emperor>	 !upgrade envoyproxy on ms-fe T403374
[09:04:58] <stashbot>	 T403374: Data-persistence envoy upgrades to 1.26.8-1 - https://phabricator.wikimedia.org/T403374
[09:05:44] <dcausse>	 jouncebot: nowandnext
[09:05:44] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 54 minute(s)
[09:05:44] <jouncebot>	 In 0 hour(s) and 54 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250901T1000)
[09:05:57] <wikibugs>	 (Merged) jenkins-bot: hCaptcha: Provide label/help in authmanagerinfo API calls [extensions/ConfirmEdit] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183112 (https://phabricator.wikimedia.org/T403253) (owner: Kosta Harlan)
[09:06:44] <logmsgbot>	 !log dcausse@deploy1003 Finished scap sync-world: Backport for [[gerrit:1183279|Lift permission for event-organizer in Chinese Wikipedia (T403350)]] (duration: 14m 20s)
[09:06:47] <stashbot>	 T403350: Lift permission for event-organizer in Chinese Wikipedia - https://phabricator.wikimedia.org/T403350
[09:07:13] <dcausse>	 Hamishcz: should be live
[09:07:33] <dcausse>	 kostajh: shipping your patch now
[09:07:44] <kostajh>	 dcausse: thanks!
[09:08:32] <logmsgbot>	 !log dcausse@deploy1003 Started scap sync-world: Backport for [[gerrit:1183112|hCaptcha: Provide label/help in authmanagerinfo API calls (T403253)]]
[09:08:35] <stashbot>	 T403253: TypeError: MediaWiki\Api\ApiAuthManagerHelper::formatMessage(): Argument #3 ($message) must be of type MediaWiki\Message\Message, null given, called in /srv/mediawiki/php-1.45.0-wmf.16/includes/api/ApiAuthManagerHelper.php on l - https://phabricator.wikimedia.org/T403253
[09:09:13] <Hamishcz>	 dcausse: okay now thank you :)
[09:09:21] <dcausse>	 yw! :)
[09:09:40] <wikibugs>	 ops-esams, DC-Ops: ganeti3005 doesn't come back up during reimage - https://phabricator.wikimedia.org/T403375 (MoritzMuehlenhoff) NEW
[09:10:10] <wikibugs>	 ops-esams, DC-Ops: ganeti3005 doesn't come back up during reimage - https://phabricator.wikimedia.org/T403375#11135458 (MoritzMuehlenhoff)
[09:10:13] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11135459 (MoritzMuehlenhoff)
[09:14:28] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P82286 and previous config saved to /var/cache/conftool/dbconfig/20250901-091427-fceratto.json
[09:14:32] <logmsgbot>	 !log dcausse@deploy1003 kharlan, dcausse: Backport for [[gerrit:1183112|hCaptcha: Provide label/help in authmanagerinfo API calls (T403253)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[09:14:35] <stashbot>	 T403253: TypeError: MediaWiki\Api\ApiAuthManagerHelper::formatMessage(): Argument #3 ($message) must be of type MediaWiki\Message\Message, null given, called in /srv/mediawiki/php-1.45.0-wmf.16/includes/api/ApiAuthManagerHelper.php on l - https://phabricator.wikimedia.org/T403253
[09:14:53] <dcausse>	 kostajh: should be on testservers ready to test
[09:15:32] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1179 (T403362)', diff saved to https://phabricator.wikimedia.org/P82287 and previous config saved to /var/cache/conftool/dbconfig/20250901-091531-ladsgroup.json
[09:15:35] <stashbot>	 T403362: Change row format of cx_corpora - https://phabricator.wikimedia.org/T403362
[09:16:55] <kostajh>	 dcausse: looking 
[09:19:27] <kostajh>	 dcausse: lgtm
[09:19:35] <wikibugs>	 (CR) Filippo Giunchedi: [C:+2] wmcs: alert on nova agents unavailable [alerts] - https://gerrit.wikimedia.org/r/1182034 (https://phabricator.wikimedia.org/T402778) (owner: Filippo Giunchedi)
[09:19:36] <jinxer-wm>	 FIRING: NetworkDeviceAlarmActive: Alarm active on ssw1-f1-eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive
[09:19:37] <dcausse>	 kostajh: ack, shipping
[09:19:39] <logmsgbot>	 !log dcausse@deploy1003 kharlan, dcausse: Continuing with sync
[09:19:53] <wikibugs>	 ops-esams, DC-Ops: esams: document power cables in Netbox - https://phabricator.wikimedia.org/T403376 (ayounsi) NEW
[09:20:12] <wikibugs>	 (CR) Filippo Giunchedi: [C:+2] openstack: move nova-compute alerts to higher level [puppet] - https://gerrit.wikimedia.org/r/1182085 (https://phabricator.wikimedia.org/T402778) (owner: Filippo Giunchedi)
[09:21:40] <wikibugs>	 (CR) Filippo Giunchedi: [C:+1] "Ok to go ahead and see how much busywork we'll self-inflict" [alerts] - https://gerrit.wikimedia.org/r/1182900 (https://phabricator.wikimedia.org/T402932) (owner: David Caro)
[09:22:25] <wikibugs>	 (Abandoned) Filippo Giunchedi: rake: default to python3 [puppet] - https://gerrit.wikimedia.org/r/1122090 (owner: Filippo Giunchedi)
[09:22:26] <wikibugs>	 (CR) Btullis: [C:+1] postgresql-airflow-main: increase max CPU and disk space [deployment-charts] - https://gerrit.wikimedia.org/r/1183611 (owner: Brouberol)
[09:22:32] <wikibugs>	 (CR) Brouberol: [C:+2] postgresql-airflow-main: increase max CPU and disk space [deployment-charts] - https://gerrit.wikimedia.org/r/1183611 (owner: Brouberol)
[09:22:37] <wikibugs>	 (Abandoned) Filippo Giunchedi: grafana: set max_source_resolution=auto for thanos ds [puppet] - https://gerrit.wikimedia.org/r/1135948 (https://phabricator.wikimedia.org/T371102) (owner: Filippo Giunchedi)
[09:23:06] <wikibugs>	 (CR) Hnowlan: [C:+2] rest-gateway: route wikifeeds configuration endpoint. [deployment-charts] - https://gerrit.wikimedia.org/r/1182876 (https://phabricator.wikimedia.org/T403193) (owner: Dbrant)
[09:23:23] <dcausse>	 one more deploy after this one and we should be done
[09:24:23] <wikibugs>	 SRE, SRE-swift-storage, Ceph, envoy, serviceops: Data-persistence envoy upgrades to 1.26.8-1 - https://phabricator.wikimedia.org/T403374#11135496 (MatthewVernon)
[09:24:48] <logmsgbot>	 !log dcausse@deploy1003 Finished scap sync-world: Backport for [[gerrit:1183112|hCaptcha: Provide label/help in authmanagerinfo API calls (T403253)]] (duration: 16m 15s)
[09:24:48] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti3005.esams.wmnet with OS bookworm
[09:24:50] <stashbot>	 T403253: TypeError: MediaWiki\Api\ApiAuthManagerHelper::formatMessage(): Argument #3 ($message) must be of type MediaWiki\Message\Message, null given, called in /srv/mediawiki/php-1.45.0-wmf.16/includes/api/ApiAuthManagerHelper.php on l - https://phabricator.wikimedia.org/T403253
[09:25:03] <dcausse>	 kostajh: should be live
[09:25:08] <wikibugs>	 (Merged) jenkins-bot: rest-gateway: route wikifeeds configuration endpoint. [deployment-charts] - https://gerrit.wikimedia.org/r/1182876 (https://phabricator.wikimedia.org/T403193) (owner: Dbrant)
[09:25:37] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-main: apply
[09:25:41] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-main: apply
[09:25:48] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by dcausse@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1183454 (https://phabricator.wikimedia.org/T401220) (owner: DCausse)
[09:26:50] <wikibugs>	 (Merged) jenkins-bot: SECURITY: declare PoolCounter settings for cirrusbuilddoc [mediawiki-config] - https://gerrit.wikimedia.org/r/1183454 (https://phabricator.wikimedia.org/T401220) (owner: DCausse)
[09:27:06] <logmsgbot>	 !log dcausse@deploy1003 Started scap sync-world: Backport for [[gerrit:1183454|SECURITY: declare PoolCounter settings for cirrusbuilddoc (T401220)]]
[09:28:39] <wikibugs>	 (PS1) JMeybohm: Update SSH key for conniecc1 [puppet] - https://gerrit.wikimedia.org/r/1183617 (https://phabricator.wikimedia.org/T403242)
[09:28:46] <kostajh>	 dcausse: thanks for deploying!
[09:29:17] <dcausse>	 yw :)
[09:29:35] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P82288 and previous config saved to /var/cache/conftool/dbconfig/20250901-092934-fceratto.json
[09:29:36] <jinxer-wm>	 FIRING: [2x] SwitchCoreInterfaceDown: Switch core interface down - ssw1-f1-codfw:et-0/0/6 (Core: lsw1-f2-codfw:ethernet-1/55 {#130117100025}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[09:29:39] <wikibugs>	 (PS1) Slyngshede: P:cache::haproxy Allow user-agents with contact information [puppet] - https://gerrit.wikimedia.org/r/1183618 (https://phabricator.wikimedia.org/T400119)
[09:30:40] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P82289 and previous config saved to /var/cache/conftool/dbconfig/20250901-093039-ladsgroup.json
[09:32:10] <wikibugs>	 SRE, Traffic, Patch-For-Review, User-notice: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11135537 (Vgutierrez) >>! In T400119#11134249, @Don-vip wrote: > I still have [[ https://gitlab.wikimedia.org/toolforge-repos/spacemedia/-/jobs/601049 |...
[09:33:04] <wikibugs>	 (CR) Slyngshede: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6807/co" [puppet] - https://gerrit.wikimedia.org/r/1183618 (https://phabricator.wikimedia.org/T400119) (owner: Slyngshede)
[09:33:16] <logmsgbot>	 !log dcausse@deploy1003 dcausse: Backport for [[gerrit:1183454|SECURITY: declare PoolCounter settings for cirrusbuilddoc (T401220)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[09:36:24] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, September 01 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [mediawiki-config] - https://gerrit.wikimedia.org/r/1183216 (https://phabricator.wikimedia.org/T402527) (owner: D3r1ck01)
[09:37:39] <hnowlan>	 jouncebot: nownandnext
[09:37:43] <hnowlan>	 jouncebot: nowandnext
[09:37:43] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 22 minute(s)
[09:37:43] <jouncebot>	 In 0 hour(s) and 22 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250901T1000)
[09:37:58] <logmsgbot>	 !log dcausse@deploy1003 dcausse: Continuing with sync
[09:38:12] <wikibugs>	 (CR) Tiziano Fogli: [C:+2] pdb_resource_exporter: add check_prometheus tasks query [puppet] - https://gerrit.wikimedia.org/r/1183131 (https://phabricator.wikimedia.org/T395442) (owner: Tiziano Fogli)
[09:38:16] <wikibugs>	 (CR) D3r1ck01: "NOTE: Once this patch is deployed, it'll be a no-op until Ib5702b11b3ef642b6eda6e4c291c2fb670bc07f1 gets merged. This config patch only in" [mediawiki-config] - https://gerrit.wikimedia.org/r/1183216 (https://phabricator.wikimedia.org/T402527) (owner: D3r1ck01)
[09:41:27] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [staging] START helmfile.d/services/rest-gateway: apply
[09:41:36] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
[09:43:05] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [codfw] START helmfile.d/services/rest-gateway: apply
[09:43:17] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
[09:43:40] <dcausse>	 sigh I accidentally hit Ctrl-C in scap, it was at "deployment progress:  87% (ok: 1963; fail: 0; left: 276)"
[09:44:15] <dcausse>	 can I just rerun scap or is there anything I should do?
[09:44:36] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[09:44:43] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2149 (T401906)', diff saved to https://phabricator.wikimedia.org/P82290 and previous config saved to /var/cache/conftool/dbconfig/20250901-094442-fceratto.json
[09:44:49] <stashbot>	 T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906
[09:44:57] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
[09:45:05] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db2156 (T401906)', diff saved to https://phabricator.wikimedia.org/P82291 and previous config saved to /var/cache/conftool/dbconfig/20250901-094504-fceratto.json
[09:45:47] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P82292 and previous config saved to /var/cache/conftool/dbconfig/20250901-094547-ladsgroup.json
[09:47:09] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
[09:47:17] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
[09:47:28] <logmsgbot>	 !log dcausse@deploy1003 Started scap sync-world: Backport for [[gerrit:1183454|SECURITY: declare PoolCounter settings for cirrusbuilddoc (T401220)]]
[09:52:56] <logmsgbot>	 !log dcausse@deploy1003 dcausse: Backport for [[gerrit:1183454|SECURITY: declare PoolCounter settings for cirrusbuilddoc (T401220)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[09:53:15] <logmsgbot>	 !log dcausse@deploy1003 dcausse: Continuing with sync
[09:55:03] <wikibugs>	 (CR) Arendpieter: [C:+1] SUL3: Use `metawiki` as central wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1183216 (https://phabricator.wikimedia.org/T402527) (owner: D3r1ck01)
[09:55:27] <wikibugs>	 SRE, Wikimedia Enterprise: Provide auth-less access to Enterprise APIs from WMF Analytics cluster - https://phabricator.wikimedia.org/T403298#11135615 (JMeybohm) Please keep in mind that allowing the HTTP proxy IPs will ultimately allow Enterprise API access from all systems allowed to use the HTTP proxi...
[09:58:23] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2156 (T401906)', diff saved to https://phabricator.wikimedia.org/P82293 and previous config saved to /var/cache/conftool/dbconfig/20250901-095822-fceratto.json
[09:58:26] <stashbot>	 T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906
[09:58:40] <logmsgbot>	 !log dcausse@deploy1003 Finished scap sync-world: Backport for [[gerrit:1183454|SECURITY: declare PoolCounter settings for cirrusbuilddoc (T401220)]] (duration: 11m 12s)
[10:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250901T1000)
[10:00:43] <wikibugs>	 (PS1) Elukey: profile::base: Pin linux-base's version for Bookworm bpo [puppet] - https://gerrit.wikimedia.org/r/1183621 (https://phabricator.wikimedia.org/T393948)
[10:00:56] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1179 (T403362)', diff saved to https://phabricator.wikimedia.org/P82294 and previous config saved to /var/cache/conftool/dbconfig/20250901-100054-ladsgroup.json
[10:00:59] <stashbot>	 T403362: Change row format of cx_corpora - https://phabricator.wikimedia.org/T403362
[10:01:12] <logmsgbot>	 !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
[10:01:26] <wikibugs>	 (PS1) Volans: insetup: fix report recipients [puppet] - https://gerrit.wikimedia.org/r/1183622
[10:04:38] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.postgresql.postgres-init
[10:04:43] <Emperor>	 !upgrade envoyproxy on thanos-fe T403374
[10:04:44] <stashbot>	 T403374: Data-persistence envoy upgrades to 1.26.8-1 - https://phabricator.wikimedia.org/T403374
[10:05:25] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
[10:05:35] <wikibugs>	 (CR) JMeybohm: [C:+2] Update SSH key for conniecc1 [puppet] - https://gerrit.wikimedia.org/r/1183617 (https://phabricator.wikimedia.org/T403242) (owner: JMeybohm)
[10:06:26] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.postgresql.postgres-init
[10:06:54] <wikibugs>	 (CR) MVernon: [C:+2] swift: use admin to manage swift uid/gid, remove old bodges [puppet] - https://gerrit.wikimedia.org/r/1182573 (https://phabricator.wikimedia.org/T123918) (owner: MVernon)
[10:07:19] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.postgresql.postgres-init
[10:07:57] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Update SSH key for Connie Chen - https://phabricator.wikimedia.org/T403242#11135656 (JMeybohm) Open→Resolved a:JMeybohm Hi @cchen, I've updated your SSH key according to this request.
[10:09:49] <wikibugs>	 (PS2) Elukey: profile::base: Pin linux-base's version for Bookworm bpo [puppet] - https://gerrit.wikimedia.org/r/1183621 (https://phabricator.wikimedia.org/T393948)
[10:09:57] <wikibugs>	 (PS1) Btullis: Remove old hadoop workers from the exclusion list [puppet] - https://gerrit.wikimedia.org/r/1183624 (https://phabricator.wikimedia.org/T397166)
[10:10:37] <wikibugs>	 (CR) Elukey: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6809/co" [puppet] - https://gerrit.wikimedia.org/r/1183621 (https://phabricator.wikimedia.org/T393948) (owner: Elukey)
[10:11:23] <wikibugs>	 SRE, SRE-swift-storage, Patch-For-Review: 'swift' user/group IDs should be consistent across the fleet - https://phabricator.wikimedia.org/T123918#11135667 (MatthewVernon) Open→Resolved a:MatthewVernon All done!
[10:12:12] <wikibugs>	 (PS1) David Caro: test_cookbook.py: fix typo in the help to log dir [puppet] - https://gerrit.wikimedia.org/r/1183625
[10:12:43] <wikibugs>	 (CR) Volans: [C:+1] "thanks!" [puppet] - https://gerrit.wikimedia.org/r/1183625 (owner: David Caro)
[10:13:28] <wikibugs>	 (CR) David Caro: [C:+2] test_cookbook.py: fix typo in the help to log dir [puppet] - https://gerrit.wikimedia.org/r/1183625 (owner: David Caro)
[10:13:30] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P82295 and previous config saved to /var/cache/conftool/dbconfig/20250901-101330-fceratto.json
[10:13:45] <wikibugs>	 SRE, SRE-swift-storage, Ceph, envoy, serviceops: Data-persistence envoy upgrades to 1.26.8-1 - https://phabricator.wikimedia.org/T403374#11135684 (MatthewVernon)
[10:14:57] <wikibugs>	 (CR) Ladsgroup: "recheck" [extensions/CategoryTree] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183269 (https://phabricator.wikimedia.org/T299951) (owner: Ladsgroup)
[10:15:28] <wikibugs>	 (PS1) Muehlenhoff: postgresql.postgres-init: Ensure that it's run from within a screen session [cookbooks] - https://gerrit.wikimedia.org/r/1183626
[10:16:15] <wikibugs>	 (CR) Brouberol: [C:+1] Remove old hadoop workers from the exclusion list [puppet] - https://gerrit.wikimedia.org/r/1183624 (https://phabricator.wikimedia.org/T397166) (owner: Btullis)
[10:16:36] <wikibugs>	 (CR) Volans: postgresql.postgres-init: Ensure that it's run from within a screen session (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1183626 (owner: Muehlenhoff)
[10:18:05] <wikibugs>	 (PS2) Muehlenhoff: postgresql.postgres-init: Ensure that it's run from within a screen session [cookbooks] - https://gerrit.wikimedia.org/r/1183626
[10:18:09] <wikibugs>	 (PS6) Cyndywikime: [Growth] enwiki: Deploy "Add a link" to 100% of users [mediawiki-config] - https://gerrit.wikimedia.org/r/1179648 (https://phabricator.wikimedia.org/T395524)
[10:18:30] <wikibugs>	 (CR) Cyndywikime: "This patch is ready for review" [mediawiki-config] - https://gerrit.wikimedia.org/r/1179648 (https://phabricator.wikimedia.org/T395524) (owner: Cyndywikime)
[10:18:33] <wikibugs>	 (CR) Muehlenhoff: postgresql.postgres-init: Ensure that it's run from within a screen session (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1183626 (owner: Muehlenhoff)
[10:18:42] <wikibugs>	 (PS1) MVernon: swift: remove 3 drained eqiad nodes for disk controller swap [puppet] - https://gerrit.wikimedia.org/r/1183627 (https://phabricator.wikimedia.org/T400877)
[10:18:44] <wikibugs>	 (PS1) MVernon: swift: re-add 3 nodes, drain the next 3 [puppet] - https://gerrit.wikimedia.org/r/1183628 (https://phabricator.wikimedia.org/T400877)
[10:20:09] <wikibugs>	 (CR) Btullis: [C:+2] Remove old hadoop workers from the exclusion list [puppet] - https://gerrit.wikimedia.org/r/1183624 (https://phabricator.wikimedia.org/T397166) (owner: Btullis)
[10:20:11] <wikibugs>	 (CR) Volans: [C:+1] "LGTM" [cookbooks] - https://gerrit.wikimedia.org/r/1183626 (owner: Muehlenhoff)
[10:22:01] <Emperor>	 !upgrade envoyproxy on apus frontends T403374
[10:22:01] <stashbot>	 T403374: Data-persistence envoy upgrades to 1.26.8-1 - https://phabricator.wikimedia.org/T403374
[10:24:58] <wikibugs>	 (CR) Jcrespo: [C:+1] swift: re-add 3 nodes, drain the next 3 [puppet] - https://gerrit.wikimedia.org/r/1183628 (https://phabricator.wikimedia.org/T400877) (owner: MVernon)
[10:25:09] <wikibugs>	 (CR) Jcrespo: [C:+1] swift: remove 3 drained eqiad nodes for disk controller swap [puppet] - https://gerrit.wikimedia.org/r/1183627 (https://phabricator.wikimedia.org/T400877) (owner: MVernon)
[10:28:13] <wikibugs>	 (CR) Muehlenhoff: [C:+2] postgresql.postgres-init: Ensure that it's run from within a screen session [cookbooks] - https://gerrit.wikimedia.org/r/1183626 (owner: Muehlenhoff)
[10:28:38] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P82296 and previous config saved to /var/cache/conftool/dbconfig/20250901-102837-fceratto.json
[10:28:45] <wikibugs>	 (PS2) MVernon: swift: remove 3 drained eqiad nodes for disk controller swap [puppet] - https://gerrit.wikimedia.org/r/1183627 (https://phabricator.wikimedia.org/T400877)
[10:28:45] <wikibugs>	 (PS2) MVernon: swift: re-add 3 nodes, drain the next 3 [puppet] - https://gerrit.wikimedia.org/r/1183628 (https://phabricator.wikimedia.org/T400877)
[10:29:00] <wikibugs>	 (PS3) Elukey: profile::base: Pin linux-base's version for Bookworm bpo [puppet] - https://gerrit.wikimedia.org/r/1183621 (https://phabricator.wikimedia.org/T393948)
[10:29:28] <wikibugs>	 (PS1) Ladsgroup: common.yaml: Remove two more dropped tables from the list of private [puppet] - https://gerrit.wikimedia.org/r/1183629 (https://phabricator.wikimedia.org/T398945)
[10:29:49] <wikibugs>	 (CR) Elukey: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6811/co" [puppet] - https://gerrit.wikimedia.org/r/1183621 (https://phabricator.wikimedia.org/T393948) (owner: Elukey)
[10:30:17] <wikibugs>	 (PS2) Ladsgroup: common.yaml: Remove two more dropped tables from the list of private [puppet] - https://gerrit.wikimedia.org/r/1183629 (https://phabricator.wikimedia.org/T398945)
[10:30:27] <wikibugs>	 (CR) Ladsgroup: [V:+2 C:+2] "Doesn't exist in production" [puppet] - https://gerrit.wikimedia.org/r/1183629 (https://phabricator.wikimedia.org/T398945) (owner: Ladsgroup)
[10:32:54] <jinxer-wm>	 FIRING: CirrusSearchTitleSuggestIndexTooOld: Some search indices that power autocomplete have not been updated recently - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#CirrusSearch_titlesuggest_index_is_too_old - TODO - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchTitleSuggestIndexTooOld
[10:33:48] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1183622 (owner: Volans)
[10:34:05] <wikibugs>	 (PS1) Btullis: Move the fifth hadoop journalnode [puppet] - https://gerrit.wikimedia.org/r/1183630 (https://phabricator.wikimedia.org/T397166)
[10:36:25] <wikibugs>	 (CR) MVernon: [C:+2] swift: remove 3 drained eqiad nodes for disk controller swap [puppet] - https://gerrit.wikimedia.org/r/1183627 (https://phabricator.wikimedia.org/T400877) (owner: MVernon)
[10:36:54] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Novem Linguae - https://phabricator.wikimedia.org/T403336#11135721 (JMeybohm)
[10:38:36] <wikibugs>	 (CR) Elukey: [V:+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6813/console" [puppet] - https://gerrit.wikimedia.org/r/1183621 (https://phabricator.wikimedia.org/T393948) (owner: Elukey)
[10:39:20] <wikibugs>	 (CR) Zabe: "the failure is fixed by https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CategoryTree/+/1182550 and https://gerrit.wikimedia.org/r/c/" [extensions/CategoryTree] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183269 (https://phabricator.wikimedia.org/T299951) (owner: Ladsgroup)
[10:39:20] <wikibugs>	 (PS4) Elukey: profile::base: Pin linux-base's version for Bookworm bpo [puppet] - https://gerrit.wikimedia.org/r/1183621 (https://phabricator.wikimedia.org/T393948)
[10:39:38] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Novem Linguae - https://phabricator.wikimedia.org/T403336#11135739 (JMeybohm) @Dreamy_Jazz please sign off your sponsor role  @Milimetric || @Ahoelzl || @Ottomata please sign off for `analytics-privatedata-users` access
[10:39:47] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Novem Linguae - https://phabricator.wikimedia.org/T403336#11135740 (Dreamy_Jazz) I confirm that I am sponsoring this request.
[10:40:13] <wikibugs>	 (PS3) Abijeet Patro: CentralNotice banner experiment WE2.1.1 - Add missing extension config [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183610 (https://phabricator.wikimedia.org/T402496) (owner: Huei Tan)
[10:40:36] <wikibugs>	 (CR) Elukey: [C:+1] ":(" [puppet] - https://gerrit.wikimedia.org/r/1183622 (owner: Volans)
[10:41:53] <wikibugs>	 (CR) Ladsgroup: "Yeah, noticed. Now thinking whether I should band-aid it until wmf.16 rolls out." [extensions/CategoryTree] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183269 (https://phabricator.wikimedia.org/T299951) (owner: Ladsgroup)
[10:42:45] <wikibugs>	 (CR) Ladsgroup: "wmf.17" [extensions/CategoryTree] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183269 (https://phabricator.wikimedia.org/T299951) (owner: Ladsgroup)
[10:43:29] <wikibugs>	 (CR) Volans: [C:+2] insetup: fix report recipients [puppet] - https://gerrit.wikimedia.org/r/1183622 (owner: Volans)
[10:43:32] <wikibugs>	 (PS1) Ladsgroup: ParserTestRunner: Update category counts for articles [core] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183632 (https://phabricator.wikimedia.org/T365303)
[10:43:45] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2156 (T401906)', diff saved to https://phabricator.wikimedia.org/P82297 and previous config saved to /var/cache/conftool/dbconfig/20250901-104345-fceratto.json
[10:43:48] <stashbot>	 T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906
[10:44:00] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
[10:44:08] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db2177 (T401906)', diff saved to https://phabricator.wikimedia.org/P82298 and previous config saved to /var/cache/conftool/dbconfig/20250901-104407-fceratto.json
[10:44:22] <Amir1>	 jouncebot: nowandnext
[10:44:22] <jouncebot>	 For the next 0 hour(s) and 15 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250901T1000)
[10:44:22] <jouncebot>	 In 2 hour(s) and 15 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250901T1300)
[10:44:44] <wikibugs>	 (PS1) Ladsgroup: CategoryCacheTest: Update category count [extensions/CategoryTree] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183633
[10:45:06] <wikibugs>	 (PS2) Ladsgroup: Drop support for categorylinks read old [extensions/CategoryTree] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183269 (https://phabricator.wikimedia.org/T299951)
[10:45:21] <wikibugs>	 SRE, SRE-swift-storage, Ceph, envoy, serviceops: Data-persistence envoy upgrades to 1.26.8-1 - https://phabricator.wikimedia.org/T403374#11135763 (MatthewVernon)
[10:45:29] <wikibugs>	 SRE, SRE-swift-storage, Ceph, envoy, serviceops: Data-persistence envoy upgrades to 1.26.8-1 - https://phabricator.wikimedia.org/T403374#11135764 (MatthewVernon) Open→Resolved
[10:45:34] <wikibugs>	 (CR) Ladsgroup: [C:+2] ParserTestRunner: Update category counts for articles [core] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183632 (https://phabricator.wikimedia.org/T365303) (owner: Ladsgroup)
[10:45:53] <moritzm>	 !log installing luajit security updates
[10:45:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:47:06] <wikibugs>	 (CR) Ladsgroup: [C:+2] CategoryCacheTest: Update category count [extensions/CategoryTree] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183633 (owner: Ladsgroup)
[10:47:11] <wikibugs>	 (CR) Ladsgroup: [C:+2] Drop support for categorylinks read old [extensions/CategoryTree] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183269 (https://phabricator.wikimedia.org/T299951) (owner: Ladsgroup)
[10:49:04] <wikibugs>	 (CR) Abijeet Patro: [C:+1] CentralNotice banner experiment WE2.1.1 - Add missing extension config [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183610 (https://phabricator.wikimedia.org/T402496) (owner: Huei Tan)
[10:49:40] <wikibugs>	 (PS2) Btullis: Move the fifth hadoop journalnode [puppet] - https://gerrit.wikimedia.org/r/1183630 (https://phabricator.wikimedia.org/T397166)
[10:50:15] <wikibugs>	 (CR) Brouberol: [C:+1] Move the fifth hadoop journalnode [puppet] - https://gerrit.wikimedia.org/r/1183630 (https://phabricator.wikimedia.org/T397166) (owner: Btullis)
[10:50:36] <wikibugs>	 (CR) Btullis: [C:+2] Move the fifth hadoop journalnode [puppet] - https://gerrit.wikimedia.org/r/1183630 (https://phabricator.wikimedia.org/T397166) (owner: Btullis)
[10:50:48] <wikibugs>	 SRE, Traffic, Wikidata, Wikidata-Query-Service: Find a solution for SPARQL federation that is blocked by stricter user agent policy enforcement - https://phabricator.wikimedia.org/T402959#11135769 (Lydia_Pintscher) >>! In T402959#11132802, @CDanis wrote: > Hi @Lydia_Pintscher , SRE can make some...
[10:53:38] <wikibugs>	 SRE, Traffic, Patch-For-Review, User-notice: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11135773 (Don-vip) >>! In T400119#11135537, @Vgutierrez wrote: > It looks like that test for some reason is using the default UA of the HttpClient librar...
[10:57:25] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2177 (T401906)', diff saved to https://phabricator.wikimedia.org/P82299 and previous config saved to /var/cache/conftool/dbconfig/20250901-105724-fceratto.json
[10:57:28] <stashbot>	 T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906
[11:01:22] <wikibugs>	 (Merged) jenkins-bot: ParserTestRunner: Update category counts for articles [core] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183632 (https://phabricator.wikimedia.org/T365303) (owner: Ladsgroup)
[11:01:25] <wikibugs>	 (Merged) jenkins-bot: CategoryCacheTest: Update category count [extensions/CategoryTree] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183633 (owner: Ladsgroup)
[11:01:26] <wikibugs>	 (Merged) jenkins-bot: Drop support for categorylinks read old [extensions/CategoryTree] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183269 (https://phabricator.wikimedia.org/T299951) (owner: Ladsgroup)
[11:03:56] <logmsgbot>	 !log ladsgroup@deploy1003 Started scap sync-world: Backport for [[gerrit:1183632|ParserTestRunner: Update category counts for articles (T365303)]], [[gerrit:1183633|CategoryCacheTest: Update category count]], [[gerrit:1183269|Drop support for categorylinks read old (T299951 T403147 T403337)]]
[11:04:05] <stashbot>	 T365303: Move update of category members count to CategoryMembershipChangeJob - https://phabricator.wikimedia.org/T365303
[11:04:06] <stashbot>	 T299951: Normalize categorylinks table - https://phabricator.wikimedia.org/T299951
[11:04:06] <stashbot>	 T403147: Wikimedia\Rdbms\DBQueryError: Error 1054: Unknown column 'cl_to' in 'ON'Function: MediaWiki\Extension\CategoryTree\CategoryTree::renderChildrenQuery: SELECT  page_id,page_namespace,page_title,page_is_redirect,page_len,page_la - https://phabricator.wikimedia.org/T403147
[11:04:06] <stashbot>	 T403337: Wikimedia\Rdbms\DBQueryError: Inaccessible page in the Project_talk namespace on jawiki - https://phabricator.wikimedia.org/T403337
[11:04:36] <jinxer-wm>	 FIRING: OsmSynchronisationLag: Maps - OSM synchronization lag - codfw - https://wikitech.wikimedia.org/wiki/Maps/Runbook - https://grafana.wikimedia.org/d/000000305/maps-performances - https://alerts.wikimedia.org/?q=alertname%3DOsmSynchronisationLag
[11:09:48] <logmsgbot>	 !log ladsgroup@deploy1003 ladsgroup: Backport for [[gerrit:1183632|ParserTestRunner: Update category counts for articles (T365303)]], [[gerrit:1183633|CategoryCacheTest: Update category count]], [[gerrit:1183269|Drop support for categorylinks read old (T299951 T403147 T403337)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[11:09:55] <stashbot>	 T365303: Move update of category members count to CategoryMembershipChangeJob - https://phabricator.wikimedia.org/T365303
[11:09:56] <stashbot>	 T299951: Normalize categorylinks table - https://phabricator.wikimedia.org/T299951
[11:09:56] <stashbot>	 T403147: Wikimedia\Rdbms\DBQueryError: Error 1054: Unknown column 'cl_to' in 'ON'Function: MediaWiki\Extension\CategoryTree\CategoryTree::renderChildrenQuery: SELECT  page_id,page_namespace,page_title,page_is_redirect,page_len,page_la - https://phabricator.wikimedia.org/T403147
[11:09:56] <stashbot>	 T403337: Wikimedia\Rdbms\DBQueryError: Inaccessible page in the Project_talk namespace on jawiki - https://phabricator.wikimedia.org/T403337
[11:11:14] <logmsgbot>	 !log ladsgroup@deploy1003 ladsgroup: Continuing with sync
[11:12:33] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P82300 and previous config saved to /var/cache/conftool/dbconfig/20250901-111232-fceratto.json
[11:12:33] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST certificaterequests) on k8s-mlstaging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-mlstaging&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[11:16:24] <logmsgbot>	 !log ladsgroup@deploy1003 Finished scap sync-world: Backport for [[gerrit:1183632|ParserTestRunner: Update category counts for articles (T365303)]], [[gerrit:1183633|CategoryCacheTest: Update category count]], [[gerrit:1183269|Drop support for categorylinks read old (T299951 T403147 T403337)]] (duration: 12m 28s)
[11:16:32] <stashbot>	 T365303: Move update of category members count to CategoryMembershipChangeJob - https://phabricator.wikimedia.org/T365303
[11:16:32] <stashbot>	 T299951: Normalize categorylinks table - https://phabricator.wikimedia.org/T299951
[11:16:33] <stashbot>	 T403147: Wikimedia\Rdbms\DBQueryError: Error 1054: Unknown column 'cl_to' in 'ON'Function: MediaWiki\Extension\CategoryTree\CategoryTree::renderChildrenQuery: SELECT  page_id,page_namespace,page_title,page_is_redirect,page_len,page_la - https://phabricator.wikimedia.org/T403147
[11:16:33] <stashbot>	 T403337: Wikimedia\Rdbms\DBQueryError: Inaccessible page in the Project_talk namespace on jawiki - https://phabricator.wikimedia.org/T403337
[11:17:33] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST certificaterequests) on k8s-mlstaging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-mlstaging&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[11:18:37] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11135896 (MoritzMuehlenhoff) Since ganeti3005 has hardware issues which will take some time to resolve, we'll proceed with the second cluster in a similar manner...
[11:20:55] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3006.esams.wmnet
[11:21:38] <logmsgbot>	 !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
[11:21:51] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11135899 (ops-monitoring-bot) Draining ganeti3006.esams.wmnet of running VMs
[11:22:46] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3006.esams.wmnet
[11:22:55] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.dns.netbox
[11:24:20] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of install3003.wikimedia.org to plain
[11:24:55] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11135914 (ops-monitoring-bot) VM install3003.wikimedia.org switching disk type to plain
[11:25:31] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install3003.wikimedia.org to plain
[11:25:33] <logmsgbot>	 !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance
[11:26:00] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, DC-Ops, Patch-For-Review: Install new disk controllers to SM swift backends (eqiad) - https://phabricator.wikimedia.org/T400877#11135918 (MatthewVernon)
[11:26:16] <logmsgbot>	 !log ayounsi@cumin1003 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
[11:27:16] <logmsgbot>	 !log mvernon@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ms-be[1083-1085].eqiad.wmnet with reason: awaiting controller swap
[11:27:28] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, DC-Ops, Patch-For-Review: Install new disk controllers to SM swift backends (eqiad) - https://phabricator.wikimedia.org/T400877#11135919 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=5d9cb26e-171b-4940-aeef-3b79dd0f568e) set by mvernon@cu...
[11:27:40] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P82301 and previous config saved to /var/cache/conftool/dbconfig/20250901-112739-fceratto.json
[11:28:26] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, DC-Ops, Patch-For-Review: Install new disk controllers to SM swift backends (eqiad) - https://phabricator.wikimedia.org/T400877#11135921 (MatthewVernon) @VRiley-WMF three nodes - ms-be1083 ms-be1084 ms-be1085 are now ready for disk swaps, as soon as you've s...
[11:30:05] <wikibugs>	 (PS1) Ayounsi: esams: add includes for routed ganeti ranges [dns] - https://gerrit.wikimedia.org/r/1183638 (https://phabricator.wikimedia.org/T402259)
[11:30:13] <wikibugs>	 sre-alert-triage, Infrastructure-Foundations: Alert in need of triage: PuppetPendingCertificateRequest (instance puppetmaster1001:9100) - https://phabricator.wikimedia.org/T403388 (LSobanski) NEW
[11:30:49] <wikibugs>	 (CR) CI reject: [V:-1] esams: add includes for routed ganeti ranges [dns] - https://gerrit.wikimedia.org/r/1183638 (https://phabricator.wikimedia.org/T402259) (owner: Ayounsi)
[11:32:52] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.dns.netbox
[11:36:02] <logmsgbot>	 !log ayounsi@cumin1003 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
[11:36:25] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.dns.netbox
[11:37:35] <wikibugs>	 (PS1) D3r1ck01: Add caller to maintenance script SQL queries [extensions/CentralAuth] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183640 (https://phabricator.wikimedia.org/T313900)
[11:38:08] <wikibugs>	 (PS2) D3r1ck01: Add caller to maintenance script SQL queries [extensions/CentralAuth] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183640 (https://phabricator.wikimedia.org/T313900)
[11:39:19] <wikibugs>	 (PS1) Volans: data.yaml: add my ecdsa-sk SSH keys [puppet] - https://gerrit.wikimedia.org/r/1183641
[11:40:04] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: esams v4 routed ganeti IPs - ayounsi@cumin1003"
[11:40:09] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: esams v4 routed ganeti IPs - ayounsi@cumin1003"
[11:40:09] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[11:40:12] <wikibugs>	 (CR) Ayounsi: "recheck" [dns] - https://gerrit.wikimedia.org/r/1183638 (https://phabricator.wikimedia.org/T402259) (owner: Ayounsi)
[11:40:18] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): CentralNotice banner experiment WE2.1.1 - Add missing extension config (1 comment) [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183610 (https://phabricator.wikimedia.org/T402496) (owner: Huei Tan)
[11:41:07] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir3004.esams.wmnet to plain
[11:41:39] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11135956 (ops-monitoring-bot) VM ncredir3004.esams.wmnet switching disk type to plain
[11:41:53] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir3004.esams.wmnet to plain
[11:42:48] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2177 (T401906)', diff saved to https://phabricator.wikimedia.org/P82302 and previous config saved to /var/cache/conftool/dbconfig/20250901-114247-fceratto.json
[11:42:51] <stashbot>	 T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906
[11:43:03] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2190.codfw.wmnet with reason: Maintenance
[11:43:11] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db2190 (T401906)', diff saved to https://phabricator.wikimedia.org/P82303 and previous config saved to /var/cache/conftool/dbconfig/20250901-114310-fceratto.json
[11:43:21] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, September 01 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [extensions/CentralAuth] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183640 (https://phabricator.wikimedia.org/T313900) (owner: D3r1ck01)
[11:43:34] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "This matches the networks currently configured in Netbox" [dns] - https://gerrit.wikimedia.org/r/1183638 (https://phabricator.wikimedia.org/T402259) (owner: Ayounsi)
[11:43:45] <wikibugs>	 (CR) Ayounsi: [C:+2] esams: add includes for routed ganeti ranges [dns] - https://gerrit.wikimedia.org/r/1183638 (https://phabricator.wikimedia.org/T402259) (owner: Ayounsi)
[11:44:00] <logmsgbot>	 !log ayounsi@dns1004 START - running authdns-update
[11:44:07] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1183641 (owner: Volans)
[11:45:07] <wikibugs>	 (PS2) Ladsgroup: Stop writing to cl_to and cl_collation on commonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1181720 (https://phabricator.wikimedia.org/T399579) (owner: Zabe)
[11:45:16] <logmsgbot>	 !log ayounsi@dns1004 END - running authdns-update
[11:45:18] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of durum3004.esams.wmnet to plain
[11:45:51] <Amir1>	 jouncebot: nowandnext
[11:45:51] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 14 minute(s)
[11:45:51] <jouncebot>	 In 1 hour(s) and 14 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250901T1300)
[11:45:59] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11135967 (ops-monitoring-bot) VM durum3004.esams.wmnet switching disk type to plain
[11:46:09] <wikibugs>	 (CR) Ladsgroup: [C:+2] Stop writing to cl_to and cl_collation on commonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1181720 (https://phabricator.wikimedia.org/T399579) (owner: Zabe)
[11:46:14] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum3004.esams.wmnet to plain
[11:46:23] <wikibugs>	 (PS1) Btullis: Use the standby analytics_meta mariadb server temporarily [puppet] - https://gerrit.wikimedia.org/r/1183642 (https://phabricator.wikimedia.org/T394498)
[11:46:25] <wikibugs>	 (CR) Volans: [C:+2] data.yaml: add my ecdsa-sk SSH keys [puppet] - https://gerrit.wikimedia.org/r/1183641 (owner: Volans)
[11:46:29] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11135968 (ayounsi)
[11:46:38] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by ladsgroup@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1181720 (https://phabricator.wikimedia.org/T399579) (owner: Zabe)
[11:46:57] <wikibugs>	 (Merged) jenkins-bot: Stop writing to cl_to and cl_collation on commonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1181720 (https://phabricator.wikimedia.org/T399579) (owner: Zabe)
[11:47:11] <logmsgbot>	 !log ladsgroup@deploy1003 Started scap sync-world: Backport for [[gerrit:1181720|Stop writing to cl_to and cl_collation on commonswiki (T399579)]]
[11:47:18] <logmsgbot>	 !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
[11:47:19] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): "Whether this actually needs backporting depends on how soon you want to run the maintenance script for T398177 again, I guess… right now i" [extensions/CentralAuth] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183640 (https://phabricator.wikimedia.org/T313900) (owner: D3r1ck01)
[11:47:21] <stashbot>	 T399579: Stop writing to cl_to and cl_collation - https://phabricator.wikimedia.org/T399579
[11:47:26] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1224 (T403362)', diff saved to https://phabricator.wikimedia.org/P82304 and previous config saved to /var/cache/conftool/dbconfig/20250901-114725-ladsgroup.json
[11:47:27] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of doh3004.wikimedia.org to plain
[11:47:28] <stashbot>	 T403362: Change row format of cx_corpora - https://phabricator.wikimedia.org/T403362
[11:48:16] <icinga-wm>	 PROBLEM - BFD status on asw1-bw27-esams.mgmt is CRITICAL: Down: 4 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[11:48:16] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11135995 (ops-monitoring-bot) VM doh3004.wikimedia.org switching disk type to plain
[11:48:31] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh3004.wikimedia.org to plain
[11:48:38] <icinga-wm>	 PROBLEM - Bird Internet Routing Daemon on durum3004 is CRITICAL: PROCS CRITICAL: 0 processes with command name bird https://wikitech.wikimedia.org/wiki/Anycast%23Bird_daemon_not_running
[11:49:38] <icinga-wm>	 RECOVERY - Bird Internet Routing Daemon on durum3004 is OK: PROCS OK: 1 process with command name bird https://wikitech.wikimedia.org/wiki/Anycast%23Bird_daemon_not_running
[11:50:39] <wikibugs>	 (CR) D3r1ck01: "Except we don't intend to run soonish, then we can just wait until the master changes rollout before we re-run the script again." [extensions/CentralAuth] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183640 (https://phabricator.wikimedia.org/T313900) (owner: D3r1ck01)
[11:51:23] <wikibugs>	 (PS1) Btullis: Facilitate a role swap between an-mariadb1001 and an-mariadb1002 [deployment-charts] - https://gerrit.wikimedia.org/r/1183643 (https://phabricator.wikimedia.org/T394498)
[11:52:10] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of netflow3003.esams.wmnet to plain
[11:52:16] <icinga-wm>	 RECOVERY - BFD status on asw1-bw27-esams.mgmt is OK: UP: 5 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[11:52:43] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11136018 (ops-monitoring-bot) VM netflow3003.esams.wmnet switching disk type to plain
[11:52:49] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow3003.esams.wmnet to plain
[11:53:12] <logmsgbot>	 !log ladsgroup@deploy1003 ladsgroup, zabe: Backport for [[gerrit:1181720|Stop writing to cl_to and cl_collation on commonswiki (T399579)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[11:53:15] <stashbot>	 T399579: Stop writing to cl_to and cl_collation - https://phabricator.wikimedia.org/T399579
[11:54:13] <wikibugs>	 (PS4) Huei Tan: Setup tracking for CentralNotice banners experiment for WE2.1.1 [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183610 (https://phabricator.wikimedia.org/T402496)
[11:54:16] <logmsgbot>	 !log ladsgroup@deploy1003 ladsgroup, zabe: Continuing with sync
[11:54:37] <wikibugs>	 (PS5) Huei Tan: Setup tracking for CentralNotice banners experiment for WE2.1.1 [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183610 (https://phabricator.wikimedia.org/T402496)
[11:54:50] <wikibugs>	 (CR) Huei Tan: Setup tracking for CentralNotice banners experiment for WE2.1.1 (1 comment) [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183610 (https://phabricator.wikimedia.org/T402496) (owner: Huei Tan)
[11:55:21] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus3003.esams.wmnet to plain
[11:55:41] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating esams to routed Ganeti - https://phabricator.wikimedia.org/T402259#11136029 (ops-monitoring-bot) VM prometheus3003.esams.wmnet switching disk type to plain
[11:55:50] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus3003.esams.wmnet to plain
[11:56:49] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2190 (T401906)', diff saved to https://phabricator.wikimedia.org/P82305 and previous config saved to /var/cache/conftool/dbconfig/20250901-115649-fceratto.json
[11:56:52] <stashbot>	 T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906
[11:58:38] <icinga-wm>	 PROBLEM - ganeti-confd running on ganeti3006 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 114 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[11:58:54] <icinga-wm>	 PROBLEM - ganeti-noded running on ganeti3006 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[11:58:57] <wikibugs>	 (PS1) Muehlenhoff: Remove ganeti3006 from ganeti02 cluster in esams [puppet] - https://gerrit.wikimedia.org/r/1183645 (https://phabricator.wikimedia.org/T402259)
[11:59:18] <wikibugs>	 (PS2) Btullis: Facilitate a role swap between an-mariadb1001 and an-mariadb1002 [deployment-charts] - https://gerrit.wikimedia.org/r/1183643 (https://phabricator.wikimedia.org/T394498)
[11:59:26] <logmsgbot>	 !log ladsgroup@deploy1003 Finished scap sync-world: Backport for [[gerrit:1181720|Stop writing to cl_to and cl_collation on commonswiki (T399579)]] (duration: 12m 15s)
[11:59:29] <stashbot>	 T399579: Stop writing to cl_to and cl_collation - https://phabricator.wikimedia.org/T399579
[11:59:36] <jinxer-wm>	 FIRING: ProbeDown: Service ganeti3006:1811 has failed probes (tcp_ganeti_noded_ip4) - https://wikitech.wikimedia.org/wiki/Ganeti - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:59:47] <wikibugs>	 (PS2) Muehlenhoff: Remove ganeti3006 from ganeti02 cluster in esams [puppet] - https://gerrit.wikimedia.org/r/1183645 (https://phabricator.wikimedia.org/T402259)
[12:01:06] <wikibugs>	 (PS3) Btullis: Facilitate a role swap between an-mariadb1001 and an-mariadb1002 [deployment-charts] - https://gerrit.wikimedia.org/r/1183643 (https://phabricator.wikimedia.org/T394498)
[12:02:03] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1183645 (https://phabricator.wikimedia.org/T402259) (owner: Muehlenhoff)
[12:02:16] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 2.035s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[12:11:57] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P82306 and previous config saved to /var/cache/conftool/dbconfig/20250901-121156-fceratto.json
[12:12:00] <icinga-wm>	 PROBLEM - very high load average likely xfs on ms-be1091 is CRITICAL: CRITICAL - load average: 106.81, 100.94, 71.84 https://wikitech.wikimedia.org/wiki/Swift
[12:13:09] <wikibugs>	 (CR) Ayounsi: [C:+1] Remove ganeti3006 from ganeti02 cluster in esams [puppet] - https://gerrit.wikimedia.org/r/1183645 (https://phabricator.wikimedia.org/T402259) (owner: Muehlenhoff)
[12:14:14] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Remove ganeti3006 from ganeti02 cluster in esams [puppet] - https://gerrit.wikimedia.org/r/1183645 (https://phabricator.wikimedia.org/T402259) (owner: Muehlenhoff)
[12:21:01] <icinga-wm>	 RECOVERY - very high load average likely xfs on ms-be1091 is OK: OK - load average: 71.53, 79.46, 74.60 https://wikitech.wikimedia.org/wiki/Swift
[12:22:16] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.216s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[12:22:42] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti3006.esams.wmnet
[12:22:46] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.3s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[12:23:03] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ganeti3006.esams.wmnet
[12:23:07] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti3006.esams.wmnet
[12:23:52] <wikibugs>	 ops-magru: Alert for device ps1-b3-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T403273#11136156 (phaultfinder)
[12:25:12] <wikibugs>	 (CR) Brouberol: [C:+1] Facilitate a role swap between an-mariadb1001 and an-mariadb1002 (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1183643 (https://phabricator.wikimedia.org/T394498) (owner: Btullis)
[12:27:05] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P82307 and previous config saved to /var/cache/conftool/dbconfig/20250901-122704-fceratto.json
[12:27:31] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.3s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[12:27:55] <wikibugs>	 (CR) Brouberol: [C:+1] "Beautifully explained, thank you!" [puppet] - https://gerrit.wikimedia.org/r/1183621 (https://phabricator.wikimedia.org/T393948) (owner: Elukey)
[12:29:00] <wikibugs>	 ops-magru: Alert for device ps1-b4-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T403275#11136178 (phaultfinder)
[12:31:31] <wikibugs>	 (CR) Bartosz Dziewoński: "I would prefer to run it soon, so that I can finish this and focus on something else." [extensions/CentralAuth] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183640 (https://phabricator.wikimedia.org/T313900) (owner: D3r1ck01)
[12:32:16] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.133s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[12:32:49] <wikibugs>	 (PS1) Bartosz Dziewoński: FixRenameUserLocalLogs: Batch more queries to speed up the script [extensions/CentralAuth] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183653 (https://phabricator.wikimedia.org/T398177)
[12:33:06] <wikibugs>	 (PS1) Bartosz Dziewoński: FixRenameUserLocalLogs: Skip rows where the performer is 'Global rename script' [extensions/CentralAuth] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183654 (https://phabricator.wikimedia.org/T398177)
[12:33:40] <jinxer-wm>	 RESOLVED: ProbeDown: Service ganeti3006:1811 has failed probes (tcp_ganeti_noded_ip4) - https://wikitech.wikimedia.org/wiki/Ganeti - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:34:52] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, September 01 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [extensions/CentralAuth] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183640 (https://phabricator.wikimedia.org/T313900) (owner: D3r1ck01)
[12:35:07] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, September 01 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [extensions/CentralAuth] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183653 (https://phabricator.wikimedia.org/T398177) (owner: Bartosz Dziewoński)
[12:35:23] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, September 01 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [extensions/CentralAuth] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183654 (https://phabricator.wikimedia.org/T398177) (owner: Bartosz Dziewoński)
[12:38:09] <wikibugs>	 (CR) D3r1ck01: "Ack!" [extensions/CentralAuth] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183640 (https://phabricator.wikimedia.org/T313900) (owner: D3r1ck01)
[12:41:08] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, September 02 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [mediawiki-config] - https://gerrit.wikimedia.org/r/1155805 (https://phabricator.wikimedia.org/T396347) (owner: Huji)
[12:42:12] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2190 (T401906)', diff saved to https://phabricator.wikimedia.org/P82308 and previous config saved to /var/cache/conftool/dbconfig/20250901-124211-fceratto.json
[12:42:15] <stashbot>	 T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906
[12:42:16] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.587s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[12:42:17] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2194.codfw.wmnet with reason: Maintenance
[12:42:24] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db2194 (T401906)', diff saved to https://phabricator.wikimedia.org/P82309 and previous config saved to /var/cache/conftool/dbconfig/20250901-124223-fceratto.json
[12:43:12] <logmsgbot>	 jmm@cumin2002 upgrade-firmware (PID 4105560) is awaiting input
[12:46:02] <wikibugs>	 (CR) Vgutierrez: [C:+1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/1183618 (https://phabricator.wikimedia.org/T400119) (owner: Slyngshede)
[12:47:52] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1224 (T403362)', diff saved to https://phabricator.wikimedia.org/P82310 and previous config saved to /var/cache/conftool/dbconfig/20250901-124751-ladsgroup.json
[12:47:55] <stashbot>	 T403362: Change row format of cx_corpora - https://phabricator.wikimedia.org/T403362
[12:52:45] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Looks good!" [puppet] - 10https://gerrit.wikimedia.org/r/1183621 (https://phabricator.wikimedia.org/T393948) (owner: 10Elukey)
[12:52:52] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti3006.esams.wmnet
[12:56:03] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2194 (T401906)', diff saved to https://phabricator.wikimedia.org/P82311 and previous config saved to /var/cache/conftool/dbconfig/20250901-125602-fceratto.json
[12:56:08] <stashbot>	 T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906
[12:56:16] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.666s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[12:57:00] <wikibugs>	 (03CR) 10Volans: [C:03+1] "No blockers for me. I'll leave it to you." [cookbooks] - 10https://gerrit.wikimedia.org/r/1181795 (owner: 10JHathaway)
[13:00:05] <jouncebot>	 Lucas_WMDE, Urbanecm, and TheresNoTime: gettimeofday() says it's time for UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250901T1300)
[13:00:05] <jouncebot>	 huji, hueitan, xSavitar, and MatmaRex: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:42] <Lucas_WMDE>	 o/
[13:00:46] <hueitan>	 o/
[13:01:09] <xSavitar>	 o/
[13:01:14] <Lucas_WMDE>	 I can deploy
[13:01:16] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.044s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[13:01:24] <Lucas_WMDE>	 (though I have a meeting in exactly one hour so I can’t go over the window today)
[13:01:27] <Lucas_WMDE>	 let’s start with hueitan 
[13:01:27] <kart_>	 Lucas_WMDE: you can deploy huji's patch, then I can go with hueitan's patch or you can deploy that as well.
[13:01:33] <MatmaRex>	 hi
[13:01:36] <kart_>	 oh good.
[13:01:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:01:41] <Lucas_WMDE>	 oh, sorry, i meant huji actually
[13:01:46] <Lucas_WMDE>	 as that’s the config change
[13:01:48] <Lucas_WMDE>	 assuming they’re around
[13:02:04] <kart_>	 seems not here?
[13:02:10] <Lucas_WMDE>	 let’s start with xSavitar then
[13:02:15] <Lucas_WMDE>	 and let the backport gate-and-submit in the background
[13:02:29] <xSavitar>	 Ack
[13:02:37] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1183216 (https://phabricator.wikimedia.org/T402527) (owner: 10D3r1ck01)
[13:03:00] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P82312 and previous config saved to /var/cache/conftool/dbconfig/20250901-130259-ladsgroup.json
[13:03:37] <wikibugs>	 (03Merged) 10jenkins-bot: SUL3: Use `metawiki` as central wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1183216 (https://phabricator.wikimedia.org/T402527) (owner: 10D3r1ck01)
[13:03:40] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "starting gate-and-submit ahead of deployment" [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1183610 (https://phabricator.wikimedia.org/T402496) (owner: 10Huei Tan)
[13:03:50] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for [[gerrit:1183216|SUL3: Use `metawiki` as central wiki (T402527)]]
[13:03:53] <stashbot>	 T402527: Stop using loginwiki during SUL3 central login - https://phabricator.wikimedia.org/T402527
[13:03:58] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3006.esams.wmnet
[13:04:00] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti3006.esams.wmnet
[13:04:28] <xSavitar>	 Lucas_WMDE, nothing to test, so you can sync when it's ready.
[13:04:36] <Lucas_WMDE>	 yup, ok
[13:04:58] <hujihuji>	 Hi Lucas_WMDE
[13:05:21] <hujihuji>	 Here for patch 1155805 
[13:05:28] <Lucas_WMDE>	 hi!
[13:05:34] <wikibugs>	 (03Merged) 10jenkins-bot: Setup tracking for CentralNotice banners experiment for WE2.1.1 [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1183610 (https://phabricator.wikimedia.org/T402496) (owner: 10Huei Tan)
[13:05:41] <Lucas_WMDE>	 oh wow that was a fast merge
[13:05:52] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Blacklist jffs2 [puppet] - 10https://gerrit.wikimedia.org/r/1183072 (owner: 10Muehlenhoff)
[13:06:01] <Lucas_WMDE>	 ok then it’s hueitan next (once the current config change is done), then hujihuji
[13:06:02] <hujihuji>	 It's been a while since I helped with a deployment so I need you to point me to the browser extension that helps me connect to the specific node that you are deploying to
[13:06:10] <Lucas_WMDE>	 https://wikitech.wikimedia.org/wiki/WikimediaDebug :)
[13:06:19] <wikibugs>	 (03CR) 10Elukey: [C:03+2] profile::base: Pin linux-base's version for Bookworm bpo [puppet] - 10https://gerrit.wikimedia.org/r/1183621 (https://phabricator.wikimedia.org/T393948) (owner: 10Elukey)
[13:06:36] <Lucas_WMDE>	 and nowadays you don’t need to pick a specific server anymore, k8s-mwdebug should be enough
[13:06:45] <Lucas_WMDE>	 (you might remember being asked to pick mwdebug1002, or mwdebug2002, or etc.)
[13:07:29] <hujihuji>	 Yes, I am that old ;)
[13:07:42] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 d3r1ck01, lucaswerkmeister-wmde: Backport for [[gerrit:1183216|SUL3: Use `metawiki` as central wiki (T402527)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[13:07:51] <hujihuji>	 Seems like you are working with hueitan first, let me take a quick second to install that extension, brb
[13:08:09] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 d3r1ck01, lucaswerkmeister-wmde: Continuing with sync
[13:08:31] <hujihuji>	 ok, all set
[13:08:44] <MatmaRex>	 i'm away for a bit, i should be back before it's my turn
[13:08:57] <Lucas_WMDE>	 ok
[13:09:00] <wikibugs>	 06SRE, 07SRE-Unowned, 10Maps, 13Patch-For-Review: Move maps servers to Bookworm - https://phabricator.wikimedia.org/T381565#11136351 (10elukey) To keep archives happy: Moritz is re-initializing all the maps-test replicas that show the above sign of failure, and we'll likely also bump up max-conns with http...
[13:09:08] <Lucas_WMDE>	 I think we can probably deploy the changes for hueitan and hujihuji together
[13:09:11] <Lucas_WMDE>	 they both seem harmless enough
[13:09:19] <hueitan>	 no problem
[13:09:41] <hujihuji>	 Let's deploy the hu* patches then
[13:09:46] <Lucas_WMDE>	 hehe
[13:11:06] <wikibugs>	 07sre-alert-triage, 06Infrastructure-Foundations: Alert in need of triage: PuppetPendingCertificateRequest (instance puppetmaster1001:9100) - https://phabricator.wikimedia.org/T403388#11136355 (10elukey) 05Open→03Resolved a:03elukey ` elukey@puppetmaster1001:~$ sudo puppet cert destroy sretest2005.co...
[13:11:11] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P82313 and previous config saved to /var/cache/conftool/dbconfig/20250901-131110-fceratto.json
[13:11:13] <xSavitar>	 Lucas_WMDE, thank you very much for deploying my patch. 🙏🏽
[13:11:19] <Lucas_WMDE>	 np
[13:12:56] <wikibugs>	 10SRE-swift-storage, 06Commons: HTTP 404 / File not found errors for three images in one category - https://phabricator.wikimedia.org/T403314#11136361 (10TheDJ) Missing originals: Strange, these are 2004 files, Considering there were thumbnails of these before, the originals must have been present at some time...
[13:13:27] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 Finished scap sync-world: Backport for [[gerrit:1183216|SUL3: Use `metawiki` as central wiki (T402527)]] (duration: 09m 36s)
[13:13:30] <stashbot>	 T402527: Stop using loginwiki during SUL3 central login - https://phabricator.wikimedia.org/T402527
[13:13:54] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155805 (https://phabricator.wikimedia.org/T396347) (owner: 10Huji)
[13:14:39] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
[13:14:45] <wikibugs>	 (03Merged) 10jenkins-bot: Enable electionclerk user group on fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155805 (https://phabricator.wikimedia.org/T396347) (owner: 10Huji)
[13:15:03] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for [[gerrit:1183610|Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496)]], [[gerrit:1155805|Enable electionclerk user group on fawiki (T396347)]]
[13:15:08] <stashbot>	 T402496: Tracking code for Scenarios 1 for WE2.1.1 - https://phabricator.wikimedia.org/T402496
[13:15:08] <stashbot>	 T396347: Enable SecurePoll extension and electionclerk user group on fawiki - https://phabricator.wikimedia.org/T396347
[13:15:29] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "starting gate-and-submit ahead of deployment" [extensions/CentralAuth] (wmf/1.45.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1183640 (https://phabricator.wikimedia.org/T313900) (owner: 10D3r1ck01)
[13:15:36] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "starting gate-and-submit ahead of deployment" [extensions/CentralAuth] (wmf/1.45.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1183653 (https://phabricator.wikimedia.org/T398177) (owner: 10Bartosz Dziewoński)
[13:15:40] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "starting gate-and-submit ahead of deployment" [extensions/CentralAuth] (wmf/1.45.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1183654 (https://phabricator.wikimedia.org/T398177) (owner: 10Bartosz Dziewoński)
[13:18:08] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P82314 and previous config saved to /var/cache/conftool/dbconfig/20250901-131807-ladsgroup.json
[13:18:26] <hujihuji>	 Lucas_WMDE: it appears to be working
[13:18:34] <hujihuji>	 Thanks for merging my change
[13:18:47] <Lucas_WMDE>	 that’s very early, it says 0% deployment progress even on the test servers :P
[13:18:54] <Lucas_WMDE>	 ok now it jumped to 75%
[13:19:07] <Lucas_WMDE>	 so I guess you got lucky and hit a server that already had the config change
[13:19:36] <jinxer-wm>	 FIRING: NetworkDeviceAlarmActive: Alarm active on ssw1-f1-eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#Juniper_alarm - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DNetworkDeviceAlarmActive
[13:20:33] <hujihuji>	 I wish I was always this lucky ;)
[13:20:41] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "Nice, thanks." [puppet] - 10https://gerrit.wikimedia.org/r/1170543 (owner: 10Cathal Mooney)
[13:20:56] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 huji, hueitan, lucaswerkmeister-wmde: Backport for [[gerrit:1183610|Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496)]], [[gerrit:1155805|Enable electionclerk user group on fawiki (T396347)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[13:21:00] <stashbot>	 T402496: Tracking code for Scenarios 1 for WE2.1.1 - https://phabricator.wikimedia.org/T402496
[13:21:01] <stashbot>	 T396347: Enable SecurePoll extension and electionclerk user group on fawiki - https://phabricator.wikimedia.org/T396347
[13:21:13] <Lucas_WMDE>	 aw, they didn’t even get to see the message
[13:21:24] <Lucas_WMDE>	 would’ve been useful to know for their next deployment
[13:21:28] <Lucas_WMDE>	 anyway, hueitan, please test :)
[13:22:17] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host ganeti3006.esams.wmnet with OS bookworm
[13:23:32] <kart_>	 hueitan: ^^
[13:24:27] <kart_>	 Lucas_WMDE: Things are fine. You can go ahead.
[13:24:31] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 huji, hueitan, lucaswerkmeister-wmde: Continuing with sync
[13:24:34] <Lucas_WMDE>	 ok, thanks
[13:25:07] <wikibugs>	 (03Merged) 10jenkins-bot: Add caller to maintenance script SQL queries [extensions/CentralAuth] (wmf/1.45.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1183640 (https://phabricator.wikimedia.org/T313900) (owner: 10D3r1ck01)
[13:25:26] <wikibugs>	 (03PS1) 10Elukey: profile::amd_gpu: add a flag to deploy firmwares from Bookworm BPO [puppet] - 10https://gerrit.wikimedia.org/r/1183678 (https://phabricator.wikimedia.org/T393948)
[13:25:41] <hueitan>	 Lucas_WMDE my patch is fine, can go ahead
[13:26:18] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P82315 and previous config saved to /var/cache/conftool/dbconfig/20250901-132617-fceratto.json
[13:26:28] <wikibugs>	 (03PS1) 10Slyngshede: P:cache::haproxy disallow Wikidata Query Service as UA [puppet] - 10https://gerrit.wikimedia.org/r/1183679
[13:26:31] <wikibugs>	 (03Merged) 10jenkins-bot: FixRenameUserLocalLogs: Batch more queries to speed up the script [extensions/CentralAuth] (wmf/1.45.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1183653 (https://phabricator.wikimedia.org/T398177) (owner: 10Bartosz Dziewoński)
[13:26:32] <wikibugs>	 (03Merged) 10jenkins-bot: FixRenameUserLocalLogs: Skip rows where the performer is 'Global rename script' [extensions/CentralAuth] (wmf/1.45.0-wmf.16) - 10https://gerrit.wikimedia.org/r/1183654 (https://phabricator.wikimedia.org/T398177) (owner: 10Bartosz Dziewoński)
[13:26:48] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations: Update Ganeti servers in esams to Bookworm - https://phabricator.wikimedia.org/T382509#11136405 (10MoritzMuehlenhoff) This update is piggybacked on https://phabricator.wikimedia.org/T402259
[13:27:02] <tappof>	 !log Add 15G to prometheus-k8s-dse lv
[13:27:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:28:17] <wikibugs>	 (03PS1) 10Elukey: Delete profile::python38 [puppet] - 10https://gerrit.wikimedia.org/r/1183680
[13:28:36] <wikibugs>	 (03CR) 10Muehlenhoff: profile::amd_gpu: add a flag to deploy firmwares from Bookworm BPO (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1183678 (https://phabricator.wikimedia.org/T393948) (owner: 10Elukey)
[13:29:22] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1183680 (owner: 10Elukey)
[13:29:36] <jinxer-wm>	 FIRING: [2x] SwitchCoreInterfaceDown: Switch core interface down - ssw1-f1-codfw:et-0/0/6 (Core: lsw1-f2-codfw:ethernet-1/55 {#130117100025}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[13:29:57] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 Finished scap sync-world: Backport for [[gerrit:1183610|Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496)]], [[gerrit:1155805|Enable electionclerk user group on fawiki (T396347)]] (duration: 14m 53s)
[13:30:01] <stashbot>	 T402496: Tracking code for Scenarios 1 for WE2.1.1 - https://phabricator.wikimedia.org/T402496
[13:30:01] <stashbot>	 T396347: Enable SecurePoll extension and electionclerk user group on fawiki - https://phabricator.wikimedia.org/T396347
[13:30:22] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations: Update Ganeti servers in esams to Bookworm - https://phabricator.wikimedia.org/T382509#11136426 (10MoritzMuehlenhoff)
[13:30:31] <Lucas_WMDE>	 MatmaRex: ^ fyi
[13:30:39] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for [[gerrit:1183640|Add caller to maintenance script SQL queries (T313900 T398177 T403387)]], [[gerrit:1183653|FixRenameUserLocalLogs: Batch more queries to speed up the script (T398177)]], [[gerrit:1183654|FixRenameUserLocalLogs: Skip rows where the performer is 'Global rename script' (T398177)]]
[13:30:45] <stashbot>	 T313900: Renaming a user doubles their edit count according to CentralAuthUser::getGlobalEditCount() / global_edit_count.gec_count field - https://phabricator.wikimedia.org/T313900
[13:30:45] <stashbot>	 T398177: 'renameuser' logs for a global rename use actor ID from metawiki instead of the local one when created by the fixStuckGlobalRename.php script - https://phabricator.wikimedia.org/T398177
[13:30:46] <stashbot>	 T403387: SQL query did not specify the caller (guessed caller: {caller}): {sql} - https://phabricator.wikimedia.org/T403387
[13:31:34] <huji_wmf>	 Lucas_WMDE: my device crashed. So much for being lucky ...
[13:31:43] <huji_wmf>	 Thanks again for your help! Anything else needed from me?
[13:32:11] <Lucas_WMDE>	 nope!
[13:32:11] <wikibugs>	 (03CR) 10Slyngshede: [V:03+1 C:03+2] P:cache::haproxy Allow user-agents with contact information [puppet] - 10https://gerrit.wikimedia.org/r/1183618 (https://phabricator.wikimedia.org/T400119) (owner: 10Slyngshede)
[13:32:24] <Lucas_WMDE>	 huji_wmf: you just barely missed the message that would’ve told you that *now* the change was ready for testing :D
[13:32:33] <Lucas_WMDE>	 just so you know that’s a thing next time ^^
[13:32:47] <wikibugs>	 (03PS1) 10Elukey: Add a new insetup role for ml-k8s hosts to test their GPU [puppet] - 10https://gerrit.wikimedia.org/r/1183681 (https://phabricator.wikimedia.org/T393948)
[13:33:10] <logmsgbot>	 !log fceratto@cumin1002 START - Cookbook sre.mysql.clone_es of es2026.codfw.wmnet onto es2049.codfw.wmnet
[13:33:14] <logmsgbot>	 !log fceratto@cumin1002 START - Cookbook sre.mysql.depool es2026 - Depool es2026.codfw.wmnet to then clone it to es2049.codfw.wmnet - fceratto@cumin1002
[13:33:15] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1224 (T403362)', diff saved to https://phabricator.wikimedia.org/P82317 and previous config saved to /var/cache/conftool/dbconfig/20250901-133314-ladsgroup.json
[13:33:20] <stashbot>	 T403362: Change row format of cx_corpora - https://phabricator.wikimedia.org/T403362
[13:33:21] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Add a new insetup role for ml-k8s hosts to test their GPU [puppet] - 10https://gerrit.wikimedia.org/r/1183681 (https://phabricator.wikimedia.org/T393948) (owner: 10Elukey)
[13:33:30] <logmsgbot>	 !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
[13:33:33] <logmsgbot>	 !log fceratto@cumin1002 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2026 - Depool es2026.codfw.wmnet to then clone it to es2049.codfw.wmnet - fceratto@cumin1002
[13:33:54] <wikibugs>	 (03CR) 10Elukey: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6816/co" [puppet] - 10https://gerrit.wikimedia.org/r/1183681 (https://phabricator.wikimedia.org/T393948) (owner: 10Elukey)
[13:35:32] <MatmaRex>	 i'm back
[13:35:36] <wikibugs>	 (03PS1) 10Brouberol: mediawiki-dumps-legacy: only keep 3 dumps directories for each wiki [deployment-charts] - 10https://gerrit.wikimedia.org/r/1183683 (https://phabricator.wikimedia.org/T403401)
[13:35:44] <MatmaRex>	 Lucas_WMDE: thanks
[13:35:52] <wikibugs>	 10SRE-swift-storage, 06Commons: HTTP 404 / File not found errors for three images in one category - https://phabricator.wikimedia.org/T403314#11136464 (10Pigsonthewing) >>! In T403314#11136361, @TheDJ wrote:  > Missing originals  I tried to use "Upload a new version of this file" for one of them, with the orig...
[13:36:11] <wikibugs>	 (03PS2) 10Elukey: profile::amd_gpu: add a flag to deploy firmwares from Bookworm BPO [puppet] - 10https://gerrit.wikimedia.org/r/1183678 (https://phabricator.wikimedia.org/T393948)
[13:36:11] <wikibugs>	 (03PS2) 10Elukey: Delete profile::python38 [puppet] - 10https://gerrit.wikimedia.org/r/1183680
[13:36:11] <wikibugs>	 (03PS2) 10Elukey: Add a new insetup role for ml-k8s hosts to test their GPU [puppet] - 10https://gerrit.wikimedia.org/r/1183681 (https://phabricator.wikimedia.org/T393948)
[13:36:24] <wikibugs>	 (03CR) 10Elukey: profile::amd_gpu: add a flag to deploy firmwares from Bookworm BPO (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1183678 (https://phabricator.wikimedia.org/T393948) (owner: 10Elukey)
[13:36:33] <logmsgbot>	 fceratto@cumin1002 clone_es (PID 238179) is awaiting input
[13:36:41] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, matmarex, d3r1ck01: Backport for [[gerrit:1183640|Add caller to maintenance script SQL queries (T313900 T398177 T403387)]], [[gerrit:1183653|FixRenameUserLocalLogs: Batch more queries to speed up the script (T398177)]], [[gerrit:1183654|FixRenameUserLocalLogs: Skip rows where the performer is 'Global rename script' (T398177)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[13:36:45] <Lucas_WMDE>	 there should be nothing to test for these backports (they’re all only in maintenance/), I’ll just do a very quick sanity check that being logged in and logging in still works
[13:36:49] <stashbot>	 T313900: Renaming a user doubles their edit count according to CentralAuthUser::getGlobalEditCount() / global_edit_count.gec_count field - https://phabricator.wikimedia.org/T313900
[13:36:49] <stashbot>	 T398177: 'renameuser' logs for a global rename use actor ID from metawiki instead of the local one when created by the fixStuckGlobalRename.php script - https://phabricator.wikimedia.org/T398177
[13:36:50] <stashbot>	 T403387: SQL query did not specify the caller (guessed caller: {caller}): {sql} - https://phabricator.wikimedia.org/T403387
[13:37:22] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, matmarex, d3r1ck01: Continuing with sync
[13:40:33] <wikibugs>	 (03PS3) 10Elukey: Add a new insetup role for ml-k8s hosts to test their GPU [puppet] - 10https://gerrit.wikimedia.org/r/1183681 (https://phabricator.wikimedia.org/T393948)
[13:41:05] <Lucas_WMDE>	 MatmaRex: same four batches as before for FixRenamedUserGlobalEditCount --fix, right?
[13:41:19] <wikibugs>	 (03CR) 10Elukey: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6817/co" [puppet] - 10https://gerrit.wikimedia.org/r/1183681 (https://phabricator.wikimedia.org/T393948) (owner: 10Elukey)
[13:41:26] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2194 (T401906)', diff saved to https://phabricator.wikimedia.org/P82319 and previous config saved to /var/cache/conftool/dbconfig/20250901-134125-fceratto.json
[13:41:27] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1183678 (https://phabricator.wikimedia.org/T393948) (owner: 10Elukey)
[13:41:29] <stashbot>	 T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906
[13:41:32] <MatmaRex>	 Lucas_WMDE: yep. thank you
[13:41:40] <huji_wmf>	 Lucas_WMDE: I have reconfirmed that things are working. I am going to close the Phab task. Have a great rest of the day
[13:41:41] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2209.codfw.wmnet with reason: Maintenance
[13:41:48] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db2209 (T401906)', diff saved to https://phabricator.wikimedia.org/P82320 and previous config saved to /var/cache/conftool/dbconfig/20250901-134148-fceratto.json
[13:41:49] * huji_wmf says bye
[13:41:54] <Lucas_WMDE>	 bye huji_wmf!
[13:42:20] <wikibugs>	 (03PS4) 10Elukey: Add a new insetup role for ml-k8s hosts to test their GPU [puppet] - 10https://gerrit.wikimedia.org/r/1183681 (https://phabricator.wikimedia.org/T393948)
[13:42:27] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 Finished scap sync-world: Backport for [[gerrit:1183640|Add caller to maintenance script SQL queries (T313900 T398177 T403387)]], [[gerrit:1183653|FixRenameUserLocalLogs: Batch more queries to speed up the script (T398177)]], [[gerrit:1183654|FixRenameUserLocalLogs: Skip rows where the performer is 'Global rename script' (T398177)]] (duration: 11m 48s)
[13:42:32] <stashbot>	 T313900: Renaming a user doubles their edit count according to CentralAuthUser::getGlobalEditCount() / global_edit_count.gec_count field - https://phabricator.wikimedia.org/T313900
[13:42:33] <stashbot>	 T398177: 'renameuser' logs for a global rename use actor ID from metawiki instead of the local one when created by the fixStuckGlobalRename.php script - https://phabricator.wikimedia.org/T398177
[13:42:33] <stashbot>	 T403387: SQL query did not specify the caller (guessed caller: {caller}): {sql} - https://phabricator.wikimedia.org/T403387
[13:42:47] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 mwscript-k8s job started: foreachwikiindblist sul CentralAuth:FixRenameUserLocalLogs --logwiki=metawiki  # T398177 (dry run)
[13:43:05] <wikibugs>	 (03CR) 10Elukey: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6818/co" [puppet] - 10https://gerrit.wikimedia.org/r/1183681 (https://phabricator.wikimedia.org/T393948) (owner: 10Elukey)
[13:43:41] <jinxer-wm>	 FIRING: ConfdResourceFailed: confd resource _etc_haproxy_conf.d_tls.cfg.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[13:43:54] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.postgresql.postgres-init
[13:44:13] <wikibugs>	 (03PS5) 10Elukey: Add a new insetup role for ml-k8s hosts to test their GPU [puppet] - 10https://gerrit.wikimedia.org/r/1183681 (https://phabricator.wikimedia.org/T393948)
[13:44:15] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti3006.esams.wmnet with reason: host reimage
[13:44:36] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[13:44:41] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
[13:44:49] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.postgresql.postgres-init
[13:44:58] <wikibugs>	 (03CR) 10Elukey: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6819/co" [puppet] - 10https://gerrit.wikimedia.org/r/1183681 (https://phabricator.wikimedia.org/T393948) (owner: 10Elukey)
[13:45:06] <wikibugs>	 (03CR) 10Btullis: [C:03+1] mediawiki-dumps-legacy: only keep 3 dumps directories for each wiki [deployment-charts] - 10https://gerrit.wikimedia.org/r/1183683 (https://phabricator.wikimedia.org/T403401) (owner: 10Brouberol)
[13:45:07] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 mwscript-k8s job started: CentralAuth:FixRenamedUserGlobalEditCount metawiki --fix --since=20220310000000 --until=20230101000000  # T313900
[13:45:22] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] mediawiki-dumps-legacy: only keep 3 dumps directories for each wiki [deployment-charts] - 10https://gerrit.wikimedia.org/r/1183683 (https://phabricator.wikimedia.org/T403401) (owner: 10Brouberol)
[13:45:27] <Lucas_WMDE>	 MatmaRex: FixRenameUserLocalLogs already made it through aawiki so it looks like some speedup is happening
[13:45:45] <wikibugs>	 (03CR) 10Elukey: [V:03+1] "This is a proposal for a new role that is halfway between insetup and ml-k8s, lemmek now!" [puppet] - 10https://gerrit.wikimedia.org/r/1183681 (https://phabricator.wikimedia.org/T393948) (owner: 10Elukey)
[13:46:13] <MatmaRex>	 nice
[13:47:22] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.postgresql.postgres-init
[13:49:00] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti3006.esams.wmnet with reason: host reimage
[13:50:53] <wikibugs>	 (03PS1) 10Slyngshede: Revert "P:cache::haproxy Allow user-agents with contact information" [puppet] - 10https://gerrit.wikimedia.org/r/1183687
[13:50:55] <Lucas_WMDE>	 > Corrected edit count for 'AndrewGarfieldIsTheBestSpiderMan': from 887 to 756 (-131; 0.85x)
[13:51:05] <Lucas_WMDE>	 I’m sorry, we’ll have to revert the whole maintenance script run. this is clearly incorrect information /s
[13:51:09] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
[13:51:40] <wikibugs>	 10SRE-swift-storage, 06Commons: HTTP 404 / File not found errors for three images in one category - https://phabricator.wikimedia.org/T403314#11136523 (10Pigsonthewing) I now see that the Marischal College image is showing again. I will try the same steps with the other two.
[13:52:59] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
[13:53:09] <MatmaRex>	 :D
[13:56:01] * Lucas_WMDE in a meeting now
[13:56:06] <Lucas_WMDE>	 so I might not start batch 2 immediately
[13:56:12] <Lucas_WMDE>	 (but right now batch 1 isn’t done yet)
[13:56:27] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2209 (T401906)', diff saved to https://phabricator.wikimedia.org/P82321 and previous config saved to /var/cache/conftool/dbconfig/20250901-135626-fceratto.json
[13:56:30] <stashbot>	 T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906
[13:57:29] <wikibugs>	 (03CR) 10Slyngshede: [C:03+2] Revert "P:cache::haproxy Allow user-agents with contact information" [puppet] - 10https://gerrit.wikimedia.org/r/1183687 (owner: 10Slyngshede)
[13:59:05] <wikibugs>	 (03PS1) 10Fabfur: team-traffic: raise haproxykafka alert thresholds [alerts] - 10https://gerrit.wikimedia.org/r/1183689 (https://phabricator.wikimedia.org/T370668)
[13:59:20] <wikibugs>	 (03PS27) 10Arnaudb: gerrit: mod qos configuration [puppet] - 10https://gerrit.wikimedia.org/r/1183117 (https://phabricator.wikimedia.org/T402611)
[13:59:20] <wikibugs>	 (03CR) 10Arnaudb: [C:03+2] "+2 to test on gerrit2003" [puppet] - 10https://gerrit.wikimedia.org/r/1183117 (https://phabricator.wikimedia.org/T402611) (owner: 10Arnaudb)
[13:59:42] <wikibugs>	 (03PS1) 10Arnaudb: Revert "gerrit: mod qos configuration" [puppet] - 10https://gerrit.wikimedia.org/r/1183690
[14:00:24] <wikibugs>	 (03PS1) 10Stevemunene: dse-k8s: disable cluster_dns to allow core-dns deploy. [puppet] - 10https://gerrit.wikimedia.org/r/1183691 (https://phabricator.wikimedia.org/T397298)
[14:00:40] <wikibugs>	 (03CR) 10CI reject: [V:04-1] team-traffic: raise haproxykafka alert thresholds [alerts] - 10https://gerrit.wikimedia.org/r/1183689 (https://phabricator.wikimedia.org/T370668) (owner: 10Fabfur)
[14:02:06] <Lucas_WMDE>	 “Done, corrected 8757 edit counts”
[14:02:14] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 mwscript-k8s job started: CentralAuth:FixRenamedUserGlobalEditCount metawiki --fix --since=20230101000000 --until=20240101000000  # T313900
[14:02:17] <stashbot>	 T313900: Renaming a user doubles their edit count according to CentralAuthUser::getGlobalEditCount() / global_edit_count.gec_count field - https://phabricator.wikimedia.org/T313900
[14:03:45] <wikibugs>	 (PS1) Huei Tan: Setup tracking for CentralNotice banners experiment for WE2.1.1 [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183692 (https://phabricator.wikimedia.org/T402496)
[14:03:49] <wikibugs>	 (PS1) Vgutierrez: Revert^2 "P:cache::haproxy Allow user-agents with contact information" [puppet] - https://gerrit.wikimedia.org/r/1183693
[14:04:57] <wikibugs>	 (PS2) Vgutierrez: Revert^2 "P:cache::haproxy Allow user-agents with contact information" [puppet] - https://gerrit.wikimedia.org/r/1183693 (https://phabricator.wikimedia.org/T400119)
[14:05:06] <wikibugs>	 (CR) Vgutierrez: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1183693 (https://phabricator.wikimedia.org/T400119) (owner: Vgutierrez)
[14:07:37] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, September 02 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183692 (https://phabricator.wikimedia.org/T402496) (owner: Huei Tan)
[14:08:00] <wikibugs>	 (CR) Slyngshede: Revert^2 "P:cache::haproxy Allow user-agents with contact information" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1183693 (https://phabricator.wikimedia.org/T400119) (owner: Vgutierrez)
[14:08:35] <wikibugs>	 (CR) Fabfur: Revert^2 "P:cache::haproxy Allow user-agents with contact information" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1183693 (https://phabricator.wikimedia.org/T400119) (owner: Vgutierrez)
[14:08:48] <wikibugs>	 (PS3) Vgutierrez: Revert^2 "P:cache::haproxy Allow user-agents with contact information" [puppet] - https://gerrit.wikimedia.org/r/1183693 (https://phabricator.wikimedia.org/T400119)
[14:09:02] <wikibugs>	 (CR) Vgutierrez: Revert^2 "P:cache::haproxy Allow user-agents with contact information" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1183693 (https://phabricator.wikimedia.org/T400119) (owner: Vgutierrez)
[14:10:48] <jinxer-wm>	 FIRING: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[14:10:54] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti3006.esams.wmnet with OS bookworm
[14:11:34] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P82322 and previous config saved to /var/cache/conftool/dbconfig/20250901-141133-fceratto.json
[14:12:04] <Dreamy_Jazz>	 jouncebot: nowandnext
[14:12:04] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 17 minute(s)
[14:12:04] <jouncebot>	 In 0 hour(s) and 17 minute(s): xLab Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250901T1430)
[14:12:30] <wikibugs>	 (CR) Arnaudb: [C:+2] "tests done" [puppet] - https://gerrit.wikimedia.org/r/1183690 (owner: Arnaudb)
[14:12:56] <wikibugs>	 (CR) CI reject: [V:-1] Setup tracking for CentralNotice banners experiment for WE2.1.1 [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183692 (https://phabricator.wikimedia.org/T402496) (owner: Huei Tan)
[14:13:35] <wikibugs>	 (PS1) Arnaudb: Revert^2 "gerrit: mod qos configuration" [puppet] - https://gerrit.wikimedia.org/r/1183698
[14:13:38] <wikibugs>	 (CR) Slyngshede: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1183693 (https://phabricator.wikimedia.org/T400119) (owner: Vgutierrez)
[14:13:55] <wikibugs>	 (CR) Fabfur: [C:+1] Revert^2 "P:cache::haproxy Allow user-agents with contact information" [puppet] - https://gerrit.wikimedia.org/r/1183693 (https://phabricator.wikimedia.org/T400119) (owner: Vgutierrez)
[14:15:09] <wikibugs>	 (PS2) Fabfur: team-traffic: raise haproxykafka alert thresholds [alerts] - https://gerrit.wikimedia.org/r/1183689 (https://phabricator.wikimedia.org/T370668)
[14:17:01] <wikibugs>	 (CR) CI reject: [V:-1] team-traffic: raise haproxykafka alert thresholds [alerts] - https://gerrit.wikimedia.org/r/1183689 (https://phabricator.wikimedia.org/T370668) (owner: Fabfur)
[14:17:16] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[14:17:31] <wikibugs>	 (CR) Vgutierrez: [C:+2] Revert^2 "P:cache::haproxy Allow user-agents with contact information" [puppet] - https://gerrit.wikimedia.org/r/1183693 (https://phabricator.wikimedia.org/T400119) (owner: Vgutierrez)
[14:17:56] <icinga-wm>	 PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[14:17:58] <wikibugs>	 (CR) Huei Tan: "recheck" [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183692 (https://phabricator.wikimedia.org/T402496) (owner: Huei Tan)
[14:18:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr2-esams and 185.15.59.144 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-esams:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[14:20:22] <Lucas_WMDE>	 “Done, corrected 10639 edit counts”
[14:20:49] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 mwscript-k8s job started: CentralAuth:FixRenamedUserGlobalEditCount metawiki --fix --since=20240101000000 --until=20250101000000  # T313900
[14:20:52] <stashbot>	 T313900: Renaming a user doubles their edit count according to CentralAuthUser::getGlobalEditCount() / global_edit_count.gec_count field - https://phabricator.wikimedia.org/T313900
[14:22:59] <wikibugs>	 (PS3) Fabfur: team-traffic: raise haproxykafka alert thresholds [alerts] - https://gerrit.wikimedia.org/r/1183689 (https://phabricator.wikimedia.org/T370668)
[14:23:10] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-eqiad and 185.15.59.145 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[14:24:29] <wikibugs>	 (CR) CI reject: [V:-1] team-traffic: raise haproxykafka alert thresholds [alerts] - https://gerrit.wikimedia.org/r/1183689 (https://phabricator.wikimedia.org/T370668) (owner: Fabfur)
[14:25:13] <wikibugs>	 (CR) Huei Tan: "recheck" [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183692 (https://phabricator.wikimedia.org/T402496) (owner: Huei Tan)
[14:25:48] <jinxer-wm>	 RESOLVED: PuppetZeroResources: Puppet has failed generate resources on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[14:26:42] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P82323 and previous config saved to /var/cache/conftool/dbconfig/20250901-142641-fceratto.json
[14:28:40] <wikibugs>	 (CR) Brouberol: "Could you hadd a `Hosts` header, so we could see the PCC diff?" [puppet] - https://gerrit.wikimedia.org/r/1183691 (https://phabricator.wikimedia.org/T397298) (owner: Stevemunene)
[14:29:39] <jinxer-wm>	 FIRING: [2x] CoreBGPDown: Core BGP session down between cr1-eqiad and cr2-esams (185.15.59.145) - group Confed_esams - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[14:30:04] <jouncebot>	 Deploy window xLab Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250901T1430)
[14:32:50] <logmsgbot>	 !log dreamyjazz Deployed security patch for T403289
[14:32:54] <jinxer-wm>	 FIRING: CirrusSearchTitleSuggestIndexTooOld: Some search indices that power autocomplete have not been updated recently - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#CirrusSearch_titlesuggest_index_is_too_old - TODO - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchTitleSuggestIndexTooOld
[14:36:56] <icinga-wm>	 RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[14:37:16] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[14:38:06] <Lucas_WMDE>	 “Done, corrected 13028 edit counts”
[14:38:10] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr1-eqiad and 185.15.59.145 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[14:38:27] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 mwscript-k8s job started: CentralAuth:FixRenamedUserGlobalEditCount metawiki --fix --since=20250101000000  # T313900
[14:38:30] <stashbot>	 T313900: Renaming a user doubles their edit count according to CentralAuthUser::getGlobalEditCount() / global_edit_count.gec_count field - https://phabricator.wikimedia.org/T313900
[14:39:39] <jinxer-wm>	 RESOLVED: [2x] CoreBGPDown: Core BGP session down between cr1-eqiad and cr2-esams (185.15.59.145) - group Confed_esams - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[14:40:38] <wikibugs>	 (PS2) Stevemunene: dse-k8s: disable cluster_dns to allow core-dns deploy. [puppet] - https://gerrit.wikimedia.org/r/1183691 (https://phabricator.wikimedia.org/T397298)
[14:40:45] <wikibugs>	 (PS4) Fabfur: team-traffic: raise haproxykafka alert thresholds [alerts] - https://gerrit.wikimedia.org/r/1183689 (https://phabricator.wikimedia.org/T370668)
[14:40:45] <wikibugs>	 SRE, DC-Ops, Infrastructure-Foundations, netops: ssw1-f1-eqiad: Fan Spinning Upgraded - https://phabricator.wikimedia.org/T400783#11136677 (ayounsi) p:Triage→Low
[14:41:02] <wikibugs>	 (PS1) Krinkle: Disable wmgUseMdotRouting on testwiki in prod [mediawiki-config] - https://gerrit.wikimedia.org/r/1183700 (https://phabricator.wikimedia.org/T401595)
[14:41:49] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2209 (T401906)', diff saved to https://phabricator.wikimedia.org/P82324 and previous config saved to /var/cache/conftool/dbconfig/20250901-144148-fceratto.json
[14:41:52] <stashbot>	 T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906
[14:42:04] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2227.codfw.wmnet with reason: Maintenance
[14:42:12] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db2227 (T401906)', diff saved to https://phabricator.wikimedia.org/P82325 and previous config saved to /var/cache/conftool/dbconfig/20250901-144211-fceratto.json
[14:42:54] <wikibugs>	 (CR) Stevemunene: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1183691 (https://phabricator.wikimedia.org/T397298) (owner: Stevemunene)
[14:43:10] <jinxer-wm>	 RESOLVED: [2x] BFDdown: BFD session down between cr1-eqiad and 185.15.59.145 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status  - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[14:43:41] <jinxer-wm>	 RESOLVED: ConfdResourceFailed: confd resource _etc_haproxy_conf.d_tls.cfg.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[14:50:10] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations: Automate PDU Deployment Process - https://phabricator.wikimedia.org/T403173#11136703 (LSobanski) p:Triage→Medium @Jclark-ctr please let I/F know when a new PDU arrives so that the traffic can be analyzed.
[14:50:59] <Lucas_WMDE>	 “Done, corrected 10497 edit counts”
[14:55:49] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2227 (T401906)', diff saved to https://phabricator.wikimedia.org/P82326 and previous config saved to /var/cache/conftool/dbconfig/20250901-145548-fceratto.json
[14:55:52] <stashbot>	 T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906
[14:56:46] <wikibugs>	 SRE-tools, homer, Infrastructure-Foundations: Homer: add parallelization support - https://phabricator.wikimedia.org/T250415#11136718 (ayounsi) p:High→Medium Lowering the priority as this has been working fine.
[15:04:36] <jinxer-wm>	 FIRING: OsmSynchronisationLag: Maps - OSM synchronization lag - codfw - https://wikitech.wikimedia.org/wiki/Maps/Runbook - https://grafana.wikimedia.org/d/000000305/maps-performances - https://alerts.wikimedia.org/?q=alertname%3DOsmSynchronisationLag
[15:05:05] <wikibugs>	 (PS3) Stevemunene: dse-k8s: disable cluster_dns to allow core-dns deploy. [puppet] - https://gerrit.wikimedia.org/r/1183691 (https://phabricator.wikimedia.org/T397298)
[15:07:30] <wikibugs>	 (PS1) Nik Gkountas: ContentTranslation: Add cxserver host for server-side requests [mediawiki-config] - https://gerrit.wikimedia.org/r/1183703 (https://phabricator.wikimedia.org/T386131)
[15:08:40] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:09:09] <wikibugs>	 (CR) Stevemunene: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1183691 (https://phabricator.wikimedia.org/T397298) (owner: Stevemunene)
[15:10:56] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P82327 and previous config saved to /var/cache/conftool/dbconfig/20250901-151056-fceratto.json
[15:17:48] <wikibugs>	 (PS1) Muehlenhoff: Assign ganeti_routed role to ganeti3006 and configure cluster in esams [puppet] - https://gerrit.wikimedia.org/r/1183704
[15:17:50] <logmsgbot>	 !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1237.eqiad.wmnet with reason: Maintenance
[15:17:58] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1237 (T403362)', diff saved to https://phabricator.wikimedia.org/P82332 and previous config saved to /var/cache/conftool/dbconfig/20250901-151757-ladsgroup.json
[15:18:01] <stashbot>	 T403362: Change row format of cx_corpora - https://phabricator.wikimedia.org/T403362
[15:19:23] <moritzm>	 !log installing luajit security updates
[15:19:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:21:40] <wikibugs>	 SRE, Traffic, Patch-For-Review, User-notice: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11136788 (DavidBrooks) Re the comment: "Allow user-agents with contact information" - implies blocking UAs with no contact information. Is this referring...
[15:25:27] <wikibugs>	 SRE, Traffic, Patch-For-Review, User-notice: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11136791 (Vgutierrez) >>! In T400119#11136788, @DavidBrooks wrote: > Re the comment: "Allow user-agents with contact information" - implies blocking UAs...
[15:26:01] <wikibugs>	 (PS2) Ayounsi: Assign ganeti_routed role to ganeti3006 and configure cluster in esams [puppet] - https://gerrit.wikimedia.org/r/1183704 (owner: Muehlenhoff)
[15:26:04] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P82333 and previous config saved to /var/cache/conftool/dbconfig/20250901-152603-fceratto.json
[15:28:15] <wikibugs>	 (PS3) Ayounsi: Assign ganeti_routed role to ganeti3006 and configure cluster in esams [puppet] - https://gerrit.wikimedia.org/r/1183704 (owner: Muehlenhoff)
[15:28:23] <wikibugs>	 (CR) Ayounsi: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1183704 (owner: Muehlenhoff)
[15:28:40] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:29:36] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:30:05] <jouncebot>	 jan_drewniak: Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250901T1530). Please do the needful.
[15:37:15] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1183681 (https://phabricator.wikimedia.org/T393948) (owner: Elukey)
[15:40:29] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1237 (T403362)', diff saved to https://phabricator.wikimedia.org/P82334 and previous config saved to /var/cache/conftool/dbconfig/20250901-154028-ladsgroup.json
[15:40:32] <stashbot>	 T403362: Change row format of cx_corpora - https://phabricator.wikimedia.org/T403362
[15:41:12] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2227 (T401906)', diff saved to https://phabricator.wikimedia.org/P82335 and previous config saved to /var/cache/conftool/dbconfig/20250901-154111-fceratto.json
[15:41:14] <stashbot>	 T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906
[15:41:27] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2239.codfw.wmnet with reason: Maintenance
[15:48:48] <wikibugs>	 SRE, Wikimedia Enterprise: Provide auth-less access to Enterprise APIs from WMF Analytics cluster - https://phabricator.wikimedia.org/T403298#11136889 (Urbanecm) >>! In T403298#11135615, @JMeybohm wrote: > Please keep in mind that allowing the HTTP proxy IPs will ultimately allow Enterprise API access fr...
[15:55:36] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1237', diff saved to https://phabricator.wikimedia.org/P82336 and previous config saved to /var/cache/conftool/dbconfig/20250901-155535-ladsgroup.json
[16:10:45] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1237', diff saved to https://phabricator.wikimedia.org/P82337 and previous config saved to /var/cache/conftool/dbconfig/20250901-161043-ladsgroup.json
[16:18:38] <wikibugs>	 (CR) Abijeet Patro: Setup tracking for CentralNotice banners experiment for WE2.1.1 (1 comment) [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183692 (https://phabricator.wikimedia.org/T402496) (owner: Huei Tan)
[16:20:01] <wikibugs>	 (PS1) Filippo Giunchedi: java: add support for Trixie / Java 21 [puppet] - https://gerrit.wikimedia.org/r/1183707 (https://phabricator.wikimedia.org/T403154)
[16:23:06] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2205.codfw.wmnet with reason: Maintenance
[16:25:53] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1237 (T403362)', diff saved to https://phabricator.wikimedia.org/P82338 and previous config saved to /var/cache/conftool/dbconfig/20250901-162552-ladsgroup.json
[16:25:56] <stashbot>	 T403362: Change row format of cx_corpora - https://phabricator.wikimedia.org/T403362
[16:26:08] <logmsgbot>	 !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
[16:28:54] <wikibugs>	 ops-magru: Alert for device ps1-b3-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T403273#11137032 (phaultfinder)
[16:33:57] <wikibugs>	 ops-magru: Alert for device ps1-b4-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T403275#11137048 (phaultfinder)
[17:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250901T1700)
[17:00:05] <jouncebot>	 ryankemper: Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250901T1700). Please do the needful.
[17:01:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:17:55] <wikibugs>	 (CR) KartikMistry: [C:+1] ContentTranslation: Add cxserver host for server-side requests [mediawiki-config] - https://gerrit.wikimedia.org/r/1183703 (https://phabricator.wikimedia.org/T386131) (owner: Nik Gkountas)
[17:18:16] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.206s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[17:29:36] <jinxer-wm>	 FIRING: [2x] SwitchCoreInterfaceDown: Switch core interface down - ssw1-f1-codfw:et-0/0/6 (Core: lsw1-f2-codfw:ethernet-1/55 {#130117100025}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[17:43:15] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.3s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[17:44:36] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[17:59:45] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
[17:59:48] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
[18:09:51] <logmsgbot>	 !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
[18:09:58] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db2186 (T403362)', diff saved to https://phabricator.wikimedia.org/P82339 and previous config saved to /var/cache/conftool/dbconfig/20250901-180958-ladsgroup.json
[18:10:01] <stashbot>	 T403362: Change row format of cx_corpora - https://phabricator.wikimedia.org/T403362
[18:11:48] <jinxer-wm>	 FIRING: PuppetFailure: Puppet has failed on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:21:49] <jinxer-wm>	 RESOLVED: PuppetFailure: Puppet has failed on wdqs2025:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:24:53] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, September 01 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - https://gerrit.wikimedia.org/r/1175222 (https://phabricator.wikimedia.org/T400428) (owner: NMW03)
[18:30:44] <wikibugs>	 (PS3) NMW03: Add rights to bypass spam blacklists for azwiki sysops and interface-admins [mediawiki-config] - https://gerrit.wikimedia.org/r/1175222 (https://phabricator.wikimedia.org/T400428)
[18:32:54] <jinxer-wm>	 FIRING: CirrusSearchTitleSuggestIndexTooOld: Some search indices that power autocomplete have not been updated recently - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#CirrusSearch_titlesuggest_index_is_too_old - TODO - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchTitleSuggestIndexTooOld
[18:34:27] <wikibugs>	 SRE-swift-storage, Commons: HTTP 404 / File not found errors for three images in one category - https://phabricator.wikimedia.org/T403314#11137316 (MatthewVernon) Open→Resolved a:Pigsonthewing Thanks!
[18:43:51] <wikibugs>	 (PS2) Huei Tan: Setup tracking for CentralNotice banners experiment for WE2.1.1 [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183692 (https://phabricator.wikimedia.org/T402496)
[18:43:57] <wikibugs>	 (CR) Huei Tan: Setup tracking for CentralNotice banners experiment for WE2.1.1 (1 comment) [extensions/WikimediaCampaignEvents] (wmf/1.45.0-wmf.16) - https://gerrit.wikimedia.org/r/1183692 (https://phabricator.wikimedia.org/T402496) (owner: Huei Tan)
[19:04:23] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, DC-Ops, Patch-For-Review: Install new disk controllers to SM swift backends (eqiad) - https://phabricator.wikimedia.org/T400877#11137337 (VRiley-WMF) Starting on ms-be1083
[19:04:34] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, DC-Ops, Patch-For-Review: Install new disk controllers to SM swift backends (eqiad) - https://phabricator.wikimedia.org/T400877#11137338 (VRiley-WMF) Open→In progress
[19:04:36] <jinxer-wm>	 FIRING: OsmSynchronisationLag: Maps - OSM synchronization lag - codfw - https://wikitech.wikimedia.org/wiki/Maps/Runbook - https://grafana.wikimedia.org/d/000000305/maps-performances - https://alerts.wikimedia.org/?q=alertname%3DOsmSynchronisationLag
[19:10:00] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2186 (T403362)', diff saved to https://phabricator.wikimedia.org/P82340 and previous config saved to /var/cache/conftool/dbconfig/20250901-190959-ladsgroup.json
[19:10:03] <stashbot>	 T403362: Change row format of cx_corpora - https://phabricator.wikimedia.org/T403362
[19:14:03] <wikibugs>	 (CR) Urbanecm: [C:+1] "functionally, LGTM. let's wait for the sync on Tuesday to double check we want to go ahead." [mediawiki-config] - https://gerrit.wikimedia.org/r/1179648 (https://phabricator.wikimedia.org/T395524) (owner: Cyndywikime)
[19:23:44] <XioNoX>	 !log cr1-esams> request chassis fpc slot 1 offline - T403360
[19:23:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:23:51] <stashbot>	 T403360: FPC1 Failure on cr1-esams - take 2 - https://phabricator.wikimedia.org/T403360
[19:25:07] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2186', diff saved to https://phabricator.wikimedia.org/P82341 and previous config saved to /var/cache/conftool/dbconfig/20250901-192507-ladsgroup.json
[19:40:15] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2186', diff saved to https://phabricator.wikimedia.org/P82342 and previous config saved to /var/cache/conftool/dbconfig/20250901-194014-ladsgroup.json
[19:55:23] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2186 (T403362)', diff saved to https://phabricator.wikimedia.org/P82345 and previous config saved to /var/cache/conftool/dbconfig/20250901-195522-ladsgroup.json
[19:55:26] <stashbot>	 T403362: Change row format of cx_corpora - https://phabricator.wikimedia.org/T403362
[19:55:38] <logmsgbot>	 !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2191.codfw.wmnet with reason: Maintenance
[19:55:45] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db2191 (T403362)', diff saved to https://phabricator.wikimedia.org/P82346 and previous config saved to /var/cache/conftool/dbconfig/20250901-195545-ladsgroup.json
[20:00:05] <jouncebot>	 RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: That opportune time for a UTC late backport window deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250901T2000).
[20:00:05] <jouncebot>	 Nemoralis: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:19] <Nemoralis>	 o/
[20:12:29] <Nemoralis>	 anyone?
[20:19:44] <perryprog>	 how many deployers does it take to change a config... (more than five apparently)
[20:24:33] <wikibugs>	 (PS1) Hokwelum: Set $wgPHPSessionHandling to 'disable' on group1 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1183741 (https://phabricator.wikimedia.org/T362324)
[20:31:28] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
[20:32:27] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
[20:33:59] <wikibugs>	 ops-magru: Alert for device ps1-b3-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T403273#11137429 (phaultfinder)
[20:34:29] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, DC-Ops, Patch-For-Review: Install new disk controllers to SM swift backends (eqiad) - https://phabricator.wikimedia.org/T400877#11137431 (VRiley-WMF)
[20:37:16] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, DC-Ops, Patch-For-Review: Install new disk controllers to SM swift backends (eqiad) - https://phabricator.wikimedia.org/T400877#11137433 (VRiley-WMF) ms-be1083 has been completed. moving onto ms-be1084
[20:38:59] <wikibugs>	 ops-magru: Alert for device ps1-b4-magru.mgmt.magru.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T403275#11137435 (phaultfinder)
[20:39:01] <wikibugs>	 (CR) D3r1ck01: [C:+1] Set $wgPHPSessionHandling to 'disable' on group1 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1183741 (https://phabricator.wikimedia.org/T362324) (owner: Hokwelum)
[20:46:35] <wikibugs>	 SRE, DNS, Traffic, WikiLearn: DNS records for WikiLearn - https://phabricator.wikimedia.org/T365435#11137439 (Ijon) Open→Declined Thanks for the ping.  We are indeed resolving it by using an address in learn.wiki. This ticket can be closed.
[20:55:12] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2191 (T403362)', diff saved to https://phabricator.wikimedia.org/P82347 and previous config saved to /var/cache/conftool/dbconfig/20250901-205511-ladsgroup.json
[20:55:15] <stashbot>	 T403362: Change row format of cx_corpora - https://phabricator.wikimedia.org/T403362
[21:00:05] <jouncebot>	 Reedy, sbassett, Maryum, and manfredi: #bothumor My software never has bugs. It just develops random features. Rise for Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250901T2100).
[21:01:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:05:19] <wikibugs>	 (CR) Bartosz Dziewoński: [C:+1] Set $wgPHPSessionHandling to 'disable' on group1 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1183741 (https://phabricator.wikimedia.org/T362324) (owner: Hokwelum)
[21:07:54] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, September 02 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [mediawiki-config] - https://gerrit.wikimedia.org/r/1183741 (https://phabricator.wikimedia.org/T362324) (owner: Hokwelum)
[21:10:20] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2191', diff saved to https://phabricator.wikimedia.org/P82348 and previous config saved to /var/cache/conftool/dbconfig/20250901-211019-ladsgroup.json
[21:14:59] <wikibugs>	 ops-eqiad, DC-Ops: Alert for device ps1-a4-eqiad.mgmt.eqiad.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T403431 (phaultfinder) NEW
[21:25:27] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2191', diff saved to https://phabricator.wikimedia.org/P82350 and previous config saved to /var/cache/conftool/dbconfig/20250901-212526-ladsgroup.json
[21:29:36] <jinxer-wm>	 FIRING: [2x] SwitchCoreInterfaceDown: Switch core interface down - ssw1-f1-codfw:et-0/0/6 (Core: lsw1-f2-codfw:ethernet-1/55 {#130117100025}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[21:40:35] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2191 (T403362)', diff saved to https://phabricator.wikimedia.org/P82351 and previous config saved to /var/cache/conftool/dbconfig/20250901-214034-ladsgroup.json
[21:40:38] <stashbot>	 T403362: Change row format of cx_corpora - https://phabricator.wikimedia.org/T403362
[21:40:50] <logmsgbot>	 !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2196.codfw.wmnet with reason: Maintenance
[21:40:58] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db2196 (T403362)', diff saved to https://phabricator.wikimedia.org/P82352 and previous config saved to /var/cache/conftool/dbconfig/20250901-214057-ladsgroup.json
[21:44:36] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[21:56:52] <icinga-wm>	 PROBLEM - mysqld processes on es2026 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[21:57:14] <icinga-wm>	 PROBLEM - MariaDB read only es2 on es2026 is CRITICAL: Could not connect to localhost:3306 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[21:58:14] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, DC-Ops, Patch-For-Review: Install new disk controllers to SM swift backends (eqiad) - https://phabricator.wikimedia.org/T400877#11137490 (VRiley-WMF)
[21:58:36] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, DC-Ops, Patch-For-Review: Install new disk controllers to SM swift backends (eqiad) - https://phabricator.wikimedia.org/T400877#11137491 (VRiley-WMF) ms-be1084 completed. Moving onto ms-be1085
[22:32:54] <jinxer-wm>	 FIRING: CirrusSearchTitleSuggestIndexTooOld: Some search indices that power autocomplete have not been updated recently - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#CirrusSearch_titlesuggest_index_is_too_old - TODO - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchTitleSuggestIndexTooOld
[22:38:08] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2196 (T403362)', diff saved to https://phabricator.wikimedia.org/P82353 and previous config saved to /var/cache/conftool/dbconfig/20250901-223807-ladsgroup.json
[22:38:11] <stashbot>	 T403362: Change row format of cx_corpora - https://phabricator.wikimedia.org/T403362
[22:41:01] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, DC-Ops, Patch-For-Review: Install new disk controllers to SM swift backends (eqiad) - https://phabricator.wikimedia.org/T400877#11137536 (VRiley-WMF) In progress→Open
[22:41:14] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, DC-Ops, Patch-For-Review: Install new disk controllers to SM swift backends (eqiad) - https://phabricator.wikimedia.org/T400877#11137539 (VRiley-WMF) ms-be1085 is completed
[22:53:15] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2196', diff saved to https://phabricator.wikimedia.org/P82354 and previous config saved to /var/cache/conftool/dbconfig/20250901-225314-ladsgroup.json
[23:00:05] <jouncebot>	 Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250901T2300)
[23:04:36] <jinxer-wm>	 FIRING: OsmSynchronisationLag: Maps - OSM synchronization lag - codfw - https://wikitech.wikimedia.org/wiki/Maps/Runbook - https://grafana.wikimedia.org/d/000000305/maps-performances - https://alerts.wikimedia.org/?q=alertname%3DOsmSynchronisationLag
[23:08:23] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2196', diff saved to https://phabricator.wikimedia.org/P82355 and previous config saved to /var/cache/conftool/dbconfig/20250901-230822-ladsgroup.json
[23:23:31] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2196 (T403362)', diff saved to https://phabricator.wikimedia.org/P82356 and previous config saved to /var/cache/conftool/dbconfig/20250901-232330-ladsgroup.json
[23:23:34] <stashbot>	 T403362: Change row format of cx_corpora - https://phabricator.wikimedia.org/T403362
[23:23:46] <logmsgbot>	 !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
[23:38:53] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1183749
[23:38:53] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1183749 (owner: TrainBranchBot)
[23:52:50] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1183749 (owner: TrainBranchBot)