[00:02:19] <wikibugs>	 (PS1) BryanDavis: wikitech: Update Gerrit blocking logic [mediawiki-config] - https://gerrit.wikimedia.org/r/995120 (https://phabricator.wikimedia.org/T307558)
[00:02:59] <wikibugs>	 (CR) CI reject: [V: -1] wikitech: Update Gerrit blocking logic [mediawiki-config] - https://gerrit.wikimedia.org/r/995120 (https://phabricator.wikimedia.org/T307558) (owner: BryanDavis)
[00:03:59] <bd808>	 oh code sniff. :P
[00:05:04] <wikibugs>	 (PS2) BryanDavis: wikitech: Update Gerrit blocking logic [mediawiki-config] - https://gerrit.wikimedia.org/r/995120 (https://phabricator.wikimedia.org/T307558)
[00:13:03] * thcipriani stops investigating errors now and backports
[00:13:26] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by thcipriani@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/995120 (https://phabricator.wikimedia.org/T307558) (owner: BryanDavis)
[00:14:09] <wikibugs>	 (Merged) jenkins-bot: wikitech: Update Gerrit blocking logic [mediawiki-config] - https://gerrit.wikimedia.org/r/995120 (https://phabricator.wikimedia.org/T307558) (owner: BryanDavis)
[00:14:23] <logmsgbot>	 !log thcipriani@deploy2002 Started scap: Backport for [[gerrit:995120|wikitech: Update Gerrit blocking logic (T307558)]]
[00:15:48] <logmsgbot>	 !log thcipriani@deploy2002 thcipriani and bd808: Backport for [[gerrit:995120|wikitech: Update Gerrit blocking logic (T307558)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[00:16:49] <logmsgbot>	 !log thcipriani@deploy2002 thcipriani and bd808: Continuing with sync
[00:23:30] <logmsgbot>	 !log thcipriani@deploy2002 Finished scap: Backport for [[gerrit:995120|wikitech: Update Gerrit blocking logic (T307558)]] (duration: 09m 06s)
[00:23:58] <thcipriani>	 ^ bd808 all done
[00:25:21] <bd808>	 thcipriani: <3
[00:26:11] <thcipriani>	 thanks for the fix!
[00:27:30] <wikibugs>	 (CR) Cwhite: [C: +1] "I haven't verified how loki will behave, but patch LGTM!" [puppet] - https://gerrit.wikimedia.org/r/994786 (https://phabricator.wikimedia.org/T352665) (owner: Andrea Denisse)
[00:28:54] <wikibugs>	 (CR) Cwhite: [C: +1] grafana: Enable stunnel for Loki data transfer [puppet] - https://gerrit.wikimedia.org/r/994999 (https://phabricator.wikimedia.org/T352665) (owner: Andrea Denisse)
[00:38:44] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/995018
[00:38:50] <wikibugs>	 (CR) TrainBranchBot: [C: +2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/995018 (owner: TrainBranchBot)
[00:43:02] <icinga-wm>	 RECOVERY - Check systemd state on logstash1026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:00:38] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/995018 (owner: TrainBranchBot)
[02:39:29] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:10:38] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:44:28] <icinga-wm>	 PROBLEM - CirrusSearch comp_suggest codfw 95th percentile latency on graphite1005 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [250.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?orgId=1&var-cirrus_group=codfw&var-cluster=elasticsearch&var-exported_cluster=production-search&var-smoothing=1&viewPanel=50
[04:50:34] <icinga-wm>	 RECOVERY - CirrusSearch comp_suggest codfw 95th percentile latency on graphite1005 is OK: OK: Less than 20.00% above the threshold [100.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?orgId=1&var-cirrus_group=codfw&var-cluster=elasticsearch&var-exported_cluster=production-search&var-smoothing=1&viewPanel=50
[05:01:24] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:04:14] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 137, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:04:22] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 225, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:08:54] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 138, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:08:56] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:32:46] <jinxer-wm>	 (Traffic bill over quota) firing: Alert for device cr2-eqsin.wikimedia.org - Traffic bill over quota   - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
[05:52:46] <jinxer-wm>	 (Traffic bill over quota) resolved: Alert for device cr2-eqsin.wikimedia.org - Traffic bill over quota   - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
[05:54:25] <jinxer-wm>	 (ProbeDown) firing: (2) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:04:44] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2147.codfw.wmnet with reason: Maintenance
[06:04:58] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2147.codfw.wmnet with reason: Maintenance
[06:05:05] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2147 (T355609)', diff saved to https://phabricator.wikimedia.org/P56090 and previous config saved to /var/cache/conftool/dbconfig/20240202-060504-marostegui.json
[06:05:09] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[06:06:18] <wikibugs>	 (PS1) Marostegui: mariadb: Remove db1106 [puppet] - https://gerrit.wikimedia.org/r/995139 (https://phabricator.wikimedia.org/T327616)
[06:06:36] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.decommission for hosts db1106.eqiad.wmnet
[06:11:16] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Remove db1106 [puppet] - https://gerrit.wikimedia.org/r/995139 (https://phabricator.wikimedia.org/T327616) (owner: Marostegui)
[06:11:49] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.dns.netbox
[06:13:50] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1106.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
[06:14:54] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1106.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
[06:14:54] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[06:14:55] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1106.eqiad.wmnet
[06:15:16] <jinxer-wm>	 (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki mw-jobrunner at codfw: 3.241% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-jobrunner&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[06:16:24] <wikibugs>	 ops-eqiad, DBA, DC-Ops, decommission-hardware: decommission db1106.eqiad.wmnet - https://phabricator.wikimedia.org/T327616 (Marostegui) This is ready for DC-Ops
[06:16:57] <wikibugs>	 ops-eqiad, DBA, DC-Ops, decommission-hardware: decommission db1106.eqiad.wmnet - https://phabricator.wikimedia.org/T327616 (Marostegui) a:Marostegui→None
[06:20:15] <jinxer-wm>	 (PHPFPMTooBusy) resolved: Not enough idle PHP-FPM workers for Mediawiki mw-jobrunner at codfw: 7.787% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-jobrunner&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[06:38:45] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Change db1163 weight', diff saved to https://phabricator.wikimedia.org/P56092 and previous config saved to /var/cache/conftool/dbconfig/20240202-063844-marostegui.json
[06:38:58] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Change db1163 weight', diff saved to https://phabricator.wikimedia.org/P56093 and previous config saved to /var/cache/conftool/dbconfig/20240202-063858-marostegui.json
[06:45:03] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P56094 and previous config saved to /var/cache/conftool/dbconfig/20240202-064502-marostegui.json
[06:53:58] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host bast6003.wikimedia.org
[06:58:05] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6003.wikimedia.org
[07:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240202T0700)
[07:00:10] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P56095 and previous config saved to /var/cache/conftool/dbconfig/20240202-070009-marostegui.json
[07:12:05] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host bast3007.wikimedia.org
[07:15:16] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2147 (T355609)', diff saved to https://phabricator.wikimedia.org/P56096 and previous config saved to /var/cache/conftool/dbconfig/20240202-071516-marostegui.json
[07:15:19] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2155.codfw.wmnet with reason: Maintenance
[07:15:20] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[07:15:33] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2155.codfw.wmnet with reason: Maintenance
[07:15:35] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
[07:15:49] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
[07:15:55] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2155 (T355609)', diff saved to https://phabricator.wikimedia.org/P56097 and previous config saved to /var/cache/conftool/dbconfig/20240202-071555-marostegui.json
[07:18:16] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3007.wikimedia.org
[07:52:37] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: -1] "You should also add the variable to the ENV declaration in the dockerfile.template" [docker-images/production-images] - https://gerrit.wikimedia.org/r/994764 (https://phabricator.wikimedia.org/T346690) (owner: Effie Mouzeli)
[07:57:39] <wikibugs>	 (CR) Giuseppe Lavagetto: "The change will work as-is, but I suggest an improvement. Feel free to adopt that in a second patch though." [deployment-charts] - https://gerrit.wikimedia.org/r/994789 (https://phabricator.wikimedia.org/T346690) (owner: Effie Mouzeli)
[08:00:04] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240202T0800)
[08:02:03] <wikibugs>	 (CR) Giuseppe Lavagetto: "There are several things that need to be improved here:" [deployment-charts] - https://gerrit.wikimedia.org/r/991369 (owner: Alexandros Kosiaris)
[08:05:30] <wikibugs>	 (PS1) KartikMistry: Update MinT to 2024-01-30-080508-production [deployment-charts] - https://gerrit.wikimedia.org/r/995170 (https://phabricator.wikimedia.org/T354666)
[08:13:12] <wikibugs>	 (PS1) Muehlenhoff: Readd Arturo's key [puppet] - https://gerrit.wikimedia.org/r/995171 (https://phabricator.wikimedia.org/T356403)
[08:16:10] <wikibugs>	 (PS3) Effie Mouzeli: php: add env[MCROUTER_SERVER] variable [docker-images/production-images] - https://gerrit.wikimedia.org/r/994764 (https://phabricator.wikimedia.org/T346690)
[08:16:27] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2155 (T355609)', diff saved to https://phabricator.wikimedia.org/P56098 and previous config saved to /var/cache/conftool/dbconfig/20240202-081626-marostegui.json
[08:16:42] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[08:16:52] <wikibugs>	 (PS2) MusikAnimal: CommonSettings: enable UrlShortenerEnableQrCode everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/967900 (https://phabricator.wikimedia.org/T348487) (owner: Samtar)
[08:20:24] <wikibugs>	 (CR) Tim Starling: [C: +2] CommonSettings: enable UrlShortenerEnableQrCode everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/967900 (https://phabricator.wikimedia.org/T348487) (owner: Samtar)
[08:20:43] <wikibugs>	 (PS2) Effie Mouzeli: mw-debug: set MCROUTER_SERVER variable [deployment-charts] - https://gerrit.wikimedia.org/r/994789 (https://phabricator.wikimedia.org/T346690)
[08:21:05] <wikibugs>	 (Merged) jenkins-bot: CommonSettings: enable UrlShortenerEnableQrCode everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/967900 (https://phabricator.wikimedia.org/T348487) (owner: Samtar)
[08:29:04] <wikibugs>	 (PS1) Muehlenhoff: Readd Arturo to Icinga authorised commands [puppet] - https://gerrit.wikimedia.org/r/995173 (https://phabricator.wikimedia.org/T356403)
[08:30:19] <logmsgbot>	 !log tstarling@deploy2002 Synchronized wmf-config/CommonSettings.php: Enable UrlShortener QR code everywhere (T348487) (duration: 07m 23s)
[08:30:30] <stashbot>	 T348487: Share QR codes (deployment tracking) - https://phabricator.wikimedia.org/T348487
[08:31:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P56099 and previous config saved to /var/cache/conftool/dbconfig/20240202-083133-marostegui.json
[08:33:33] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Readd Arturo to Icinga authorised commands [puppet] - https://gerrit.wikimedia.org/r/995173 (https://phabricator.wikimedia.org/T356403) (owner: Muehlenhoff)
[08:43:18] <wikibugs>	 (CR) Slyngshede: [C: +2] C:samplicator Icinga monitoring is not required. [puppet] - https://gerrit.wikimedia.org/r/994698 (https://phabricator.wikimedia.org/T350694) (owner: Slyngshede)
[08:43:25] <wikibugs>	 (PS1) Brouberol: Add superset/superset-next.svc.eqiad.wmnet records [dns] - https://gerrit.wikimedia.org/r/995174 (https://phabricator.wikimedia.org/T356481)
[08:44:23] <slyngs>	 @moritzm Ok to merge your patch?
[08:45:12] <moritzm>	 sorry, got distracted. yes, please
[08:46:11] <slyngs>	 Done
[08:46:31] <wikibugs>	 (CR) Filippo Giunchedi: logging::collector: add mw accesslog sampling by benthos (1 comment) [puppet] - https://gerrit.wikimedia.org/r/993476 (https://phabricator.wikimedia.org/T355836) (owner: Cwhite)
[08:46:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P56100 and previous config saved to /var/cache/conftool/dbconfig/20240202-084640-marostegui.json
[08:50:18] <wikibugs>	 (PS1) KartikMistry: WIP: Enable Section Translation on newly created Wikipedias by default [mediawiki-config] - https://gerrit.wikimedia.org/r/995176 (https://phabricator.wikimedia.org/T298235)
[08:57:03] <wikibugs>	 (CR) Slyngshede: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/995040 (https://phabricator.wikimedia.org/T241049) (owner: Muehlenhoff)
[08:57:10] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on P{ms-fe1011.eqiad.wmnet} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
[08:57:43] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on P{ms-fe1011.eqiad.wmnet} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
[08:58:32] <wikibugs>	 (CR) DCausse: [C: +1] dse-k8s: remove rdf-streaming-updater service [deployment-charts] - https://gerrit.wikimedia.org/r/966902 (https://phabricator.wikimedia.org/T349095) (owner: Bking)
[09:01:47] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2155 (T355609)', diff saved to https://phabricator.wikimedia.org/P56101 and previous config saved to /var/cache/conftool/dbconfig/20240202-090146-marostegui.json
[09:01:49] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
[09:01:55] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[09:02:03] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
[09:02:09] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2137:3314 (T355609)', diff saved to https://phabricator.wikimedia.org/P56102 and previous config saved to /var/cache/conftool/dbconfig/20240202-090209-marostegui.json
[09:03:56] <wikibugs>	 (CR) Muehlenhoff: [C: +2] debmonitor: Remove support for old deployment method [puppet] - https://gerrit.wikimedia.org/r/995040 (https://phabricator.wikimedia.org/T241049) (owner: Muehlenhoff)
[09:09:36] <wikibugs>	 (PS2) KartikMistry: WIP: Enable Section Translation on newly created Wikipedias by default [mediawiki-config] - https://gerrit.wikimedia.org/r/995176 (https://phabricator.wikimedia.org/T298235)
[09:09:57] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe
[09:11:35] <wikibugs>	 (PS1) Slyngshede: P:hue absent Icinga monitoring. [puppet] - https://gerrit.wikimedia.org/r/995180 (https://phabricator.wikimedia.org/T350694)
[09:17:59] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:18:02] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe
[09:18:57] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:19:33] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:20:05] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 51451 bytes in 0.098 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:20:27] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.267 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:20:45] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Thu 15 Feb 2024 02:11:55 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:26:23] <wikibugs>	 (PS1) Slyngshede: P:kerberos::kadminserver absent Icinga check [puppet] - https://gerrit.wikimedia.org/r/995181 (https://phabricator.wikimedia.org/T350694)
[09:27:52] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T355609)', diff saved to https://phabricator.wikimedia.org/P56103 and previous config saved to /var/cache/conftool/dbconfig/20240202-092752-marostegui.json
[09:27:57] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[09:30:29] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +2] "Last 30 days of data in grafana points out semi-frequent restarts due to memory exhaustion. kubectl says 289 restarts in 49d. Taking out w" [deployment-charts] - https://gerrit.wikimedia.org/r/995063 (owner: Clément Goubert)
[09:33:29] <wikibugs>	 (Merged) jenkins-bot: calico: Bump wikikube kube-controllers memory [deployment-charts] - https://gerrit.wikimedia.org/r/995063 (owner: Clément Goubert)
[09:35:29] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[09:35:49] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[09:36:41] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [codfw] START helmfile.d/admin 'apply'.
[09:36:54] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[09:39:56] <wikibugs>	 (PS1) Muehlenhoff: debmonitor: Remove legacy cert handling [puppet] - https://gerrit.wikimedia.org/r/995183
[09:40:51] <wikibugs>	 (CR) Filippo Giunchedi: "+1 on the idea" [puppet] - https://gerrit.wikimedia.org/r/995181 (https://phabricator.wikimedia.org/T350694) (owner: Slyngshede)
[09:40:57] <wikibugs>	 (CR) Filippo Giunchedi: "+1 on the idea" [puppet] - https://gerrit.wikimedia.org/r/995180 (https://phabricator.wikimedia.org/T350694) (owner: Slyngshede)
[09:41:15] <wikibugs>	 (CR) Slyngshede: "This is not allowed to be merged until we adjust the severity of SystemdUnitFailed" [puppet] - https://gerrit.wikimedia.org/r/995181 (https://phabricator.wikimedia.org/T350694) (owner: Slyngshede)
[09:42:59] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P56104 and previous config saved to /var/cache/conftool/dbconfig/20240202-094258-marostegui.json
[09:45:57] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.reimage for host an-airflow1004.eqiad.wmnet with OS bullseye
[09:48:21] <wikibugs>	 (CR) Volans: [C: -1] "Unless I'm missing some context I think there is a simpler solution, see inline" [puppet] - https://gerrit.wikimedia.org/r/995108 (https://phabricator.wikimedia.org/T356054) (owner: Scott French)
[09:53:38] <wikibugs>	 (CR) Btullis: [C: +2] Update default airflow_version and remove overrides [puppet] - https://gerrit.wikimedia.org/r/977638 (https://phabricator.wikimedia.org/T351621) (owner: Btullis)
[09:55:37] <wikibugs>	 (CR) Hashar: "> I don't know too much about it other than it renames the repo :/" [software/gerrit] (wmf/stable-3.7) - https://gerrit.wikimedia.org/r/995035 (https://phabricator.wikimedia.org/T201953) (owner: Hashar)
[09:56:44] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on an-airflow1004.eqiad.wmnet with reason: host reimage
[09:58:05] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P56106 and previous config saved to /var/cache/conftool/dbconfig/20240202-095805-marostegui.json
[10:00:07] <wikibugs>	 (CR) Volans: "No, the problem is that I have etcd 3.5.11 and by default the v2 API are disabled, hence I quickly tried to force --proxy 'on' to enable t" [software/conftool] - https://gerrit.wikimedia.org/r/995053 (https://phabricator.wikimedia.org/T356423) (owner: Volans)
[10:01:28] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-airflow1004.eqiad.wmnet with reason: host reimage
[10:13:12] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T355609)', diff saved to https://phabricator.wikimedia.org/P56107 and previous config saved to /var/cache/conftool/dbconfig/20240202-101311-marostegui.json
[10:13:42] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[10:14:44] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +1] Readd Arturo's key [puppet] - https://gerrit.wikimedia.org/r/995171 (https://phabricator.wikimedia.org/T356403) (owner: Muehlenhoff)
[10:16:23] <wikibugs>	 (PS1) Btullis: Upgrade the platform_eng instance of airflow to puppet 7 [puppet] - https://gerrit.wikimedia.org/r/995187 (https://phabricator.wikimedia.org/T347710)
[10:20:44] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-airflow1004.eqiad.wmnet with OS bullseye
[10:22:19] <wikibugs>	 (CR) Muehlenhoff: "Key is the same as three months ago and was also verified out-of-band" [puppet] - https://gerrit.wikimedia.org/r/995171 (https://phabricator.wikimedia.org/T356403) (owner: Muehlenhoff)
[10:22:23] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Readd Arturo's key [puppet] - https://gerrit.wikimedia.org/r/995171 (https://phabricator.wikimedia.org/T356403) (owner: Muehlenhoff)
[10:26:41] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review, User-aborrero: ops: add access for aborrero - https://phabricator.wikimedia.org/T356403 (MoritzMuehlenhoff) - SSH access with the ops and analytics-privatedate-access groups has been restored - LDAP group memberships for cn=ops and cn=wmf have been rest...
[10:27:28] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.reimage for host an-airflow1002.eqiad.wmnet with OS bullseye
[10:28:25] <Emperor>	 !log restart codfw swift-account (-b 1 -s 3)
[10:28:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:29] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2155.codfw.wmnet with reason: Maintenance
[10:29:32] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2155.codfw.wmnet with reason: Maintenance
[10:29:34] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
[10:29:37] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
[10:29:44] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2155 (T355609)', diff saved to https://phabricator.wikimedia.org/P56108 and previous config saved to /var/cache/conftool/dbconfig/20240202-102943-marostegui.json
[10:29:48] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[10:31:17] <wikibugs>	 (CR) Clément Goubert: P:httpbb: migrate tests from cumin1001 to cumin1002 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/995108 (https://phabricator.wikimedia.org/T356054) (owner: Scott French)
[10:32:12] <wikibugs>	 (PS1) Jcrespo: mediabackups: Add newly setup storage host backup1011 [puppet] - https://gerrit.wikimedia.org/r/995188 (https://phabricator.wikimedia.org/T334069)
[10:32:14] <wikibugs>	 (PS1) Jcrespo: mediabackups: Add newly setup storage host backup2011 [puppet] - https://gerrit.wikimedia.org/r/995189 (https://phabricator.wikimedia.org/T334069)
[10:32:18] <Emperor>	 !log restart codfw swift-container (-b 1 -s 3)
[10:32:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:08] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review, User-aborrero: ops: add access for aborrero - https://phabricator.wikimedia.org/T356403 (MoritzMuehlenhoff) Open→Resolved Rest can be self-serviced on demand now, closing.
[10:33:52] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::airflow::platform_eng
[10:34:18] <wikibugs>	 (PS2) Jcrespo: mediabackups: Add newly setup storage host backup1011 [puppet] - https://gerrit.wikimedia.org/r/995188 (https://phabricator.wikimedia.org/T334069)
[10:34:27] <wikibugs>	 (PS2) Jcrespo: mediabackups: Add newly setup storage host backup2011 [puppet] - https://gerrit.wikimedia.org/r/995189 (https://phabricator.wikimedia.org/T334069)
[10:36:21] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Upgrade the platform_eng instance of airflow to puppet 7 [puppet] - https://gerrit.wikimedia.org/r/995187 (https://phabricator.wikimedia.org/T347710) (owner: Btullis)
[10:39:51] <wikibugs>	 (PS1) Arnaudb: admin: add sbailey to deployment group and add key [puppet] - https://gerrit.wikimedia.org/r/995019 (https://phabricator.wikimedia.org/T355612)
[10:40:29] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to deployment or deploy-service group for sbailey(WMF) - https://phabricator.wikimedia.org/T355612 (ABran-WMF)
[10:41:38] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on an-airflow1002.eqiad.wmnet with reason: host reimage
[10:42:07] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::airflow::platform_eng
[10:44:04] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-airflow1002.eqiad.wmnet with reason: host reimage
[10:44:50] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host an-airflow1004.eqiad.wmnet
[10:46:10] <wikibugs>	 (PS3) KartikMistry: WIP: Enable Section Translation on newly created Wikipedias by default [mediawiki-config] - https://gerrit.wikimedia.org/r/995176 (https://phabricator.wikimedia.org/T298235)
[10:47:46] <wikibugs>	 (PS1) Clément Goubert: kubernetes: make 3 appservers kubernetes workers [puppet] - https://gerrit.wikimedia.org/r/995191 (https://phabricator.wikimedia.org/T351074)
[10:48:48] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1004.eqiad.wmnet
[10:48:56] <wikibugs>	 (CR) CI reject: [V: -1] kubernetes: make 3 appservers kubernetes workers [puppet] - https://gerrit.wikimedia.org/r/995191 (https://phabricator.wikimedia.org/T351074) (owner: Clément Goubert)
[10:50:43] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[10:53:46] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 7 hosts with reason: due for decomm
[10:54:12] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 7 hosts with reason: due for decomm
[10:54:19] <wikibugs>	 SRE-swift-storage, Patch-For-Review: Q3 ms backend refresh work - https://phabricator.wikimedia.org/T353149 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=9ae79aa8-c7ef-4913-87b7-b10f805a6884) set by mvernon@cumin2002 for 7 days, 0:00:00 on 7 host(s) and their services with reason: due...
[10:55:52] <Emperor>	 !log stop puppet and swift on ms-be2044-50 T353149
[10:55:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:56:15] <stashbot>	 T353149: Q3 ms backend refresh work - https://phabricator.wikimedia.org/T353149
[11:07:49] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-airflow1002.eqiad.wmnet with OS bullseye
[11:08:41] <wikibugs>	 (PS2) Clément Goubert: kubernetes: make 3 appservers kubernetes workers [puppet] - https://gerrit.wikimedia.org/r/995191 (https://phabricator.wikimedia.org/T351074)
[11:20:27] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::airflow::research
[11:21:00] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2155 (re)pooling @ 1%: After schema change', diff saved to https://phabricator.wikimedia.org/P56109 and previous config saved to /var/cache/conftool/dbconfig/20240202-112100-root.json
[11:22:54] <wikibugs>	 (PS1) Muehlenhoff: Switch airflow/research to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/995193 (https://phabricator.wikimedia.org/T349619)
[11:22:58] <wikibugs>	 SRE-Access-Requests, Data-Platform-SRE (2024.01.22 - 2024.02.11), Patch-For-Review: Remove production data access for former WMDE staff member goransm - https://phabricator.wikimedia.org/T356279 (BTullis) a:BTullis→None It's worth noting that there is only one remaining active entry in the `c...
[11:23:01] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[11:23:03] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[11:23:22] <wikibugs>	 (PS2) Muehlenhoff: Switch airflow/research to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/995193 (https://phabricator.wikimedia.org/T349619)
[11:26:00] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Switch airflow/research to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/995193 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[11:31:14] <wikibugs>	 (PS4) KartikMistry: WIP: Enable Section Translation on newly created Wikipedias by default [mediawiki-config] - https://gerrit.wikimedia.org/r/995176 (https://phabricator.wikimedia.org/T298235)
[11:31:38] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::airflow::research
[11:31:42] <wikibugs>	 (CR) Hnowlan: [C: +1] kubernetes: make 3 appservers kubernetes workers [puppet] - https://gerrit.wikimedia.org/r/995191 (https://phabricator.wikimedia.org/T351074) (owner: Clément Goubert)
[11:36:05] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2155 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P56110 and previous config saved to /var/cache/conftool/dbconfig/20240202-113605-root.json
[11:37:25] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host an-airflow1002.eqiad.wmnet
[11:39:47] <wikibugs>	 (PS5) KartikMistry: WIP: Enable Section Translation on newly created Wikipedias by default [mediawiki-config] - https://gerrit.wikimedia.org/r/995176 (https://phabricator.wikimedia.org/T298235)
[11:41:26] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1002.eqiad.wmnet
[11:42:52] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[11:43:17] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[11:44:48] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[11:45:14] <wikibugs>	 (CR) Clément Goubert: [C: +2] kubernetes: make 3 appservers kubernetes workers [puppet] - https://gerrit.wikimedia.org/r/995191 (https://phabricator.wikimedia.org/T351074) (owner: Clément Goubert)
[11:51:11] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2155 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P56111 and previous config saved to /var/cache/conftool/dbconfig/20240202-115110-root.json
[11:54:10] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host cloudlb2003-dev.codfw.wmnet
[11:54:21] <logmsgbot>	 !log cgoubert@cumin2002 START - Cookbook sre.hosts.reimage for host mw1488.eqiad.wmnet with OS bullseye
[11:54:23] <logmsgbot>	 !log cgoubert@cumin2002 START - Cookbook sre.hosts.reimage for host mw1496.eqiad.wmnet with OS bullseye
[11:54:30] <logmsgbot>	 !log cgoubert@cumin2002 START - Cookbook sre.hosts.reimage for host mw1419.eqiad.wmnet with OS bullseye
[11:58:39] <icinga-wm>	 PROBLEM - BFD status on cloudsw1-b1-codfw.mgmt is CRITICAL: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[11:59:11] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good. I'll go ahead and merge since the group changes are already live and daily_account_consistency_check flagged it." [puppet] - https://gerrit.wikimedia.org/r/995106 (https://phabricator.wikimedia.org/T355937) (owner: Dzahn)
[11:59:15] <wikibugs>	 (CR) Muehlenhoff: [C: +2] admin: add wmdecyn to ldap_only_admins (wmde, nda) [puppet] - https://gerrit.wikimedia.org/r/995106 (https://phabricator.wikimedia.org/T355937) (owner: Dzahn)
[12:00:51] <icinga-wm>	 RECOVERY - BFD status on cloudsw1-b1-codfw.mgmt is OK: UP: 5 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[12:01:13] <icinga-wm>	 PROBLEM - Check systemd state on mw1424 is CRITICAL: CRITICAL - degraded: The following units failed: ferm.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:02:47] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[12:02:50] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[12:05:44] <wikibugs>	 (PS1) Muehlenhoff: Remove expiry data/contact for sannita [puppet] - https://gerrit.wikimedia.org/r/995196
[12:06:16] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2155 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P56112 and previous config saved to /var/cache/conftool/dbconfig/20240202-120615-root.json
[12:06:25] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudlb2003-dev.codfw.wmnet
[12:07:01] <icinga-wm>	 PROBLEM - Check systemd state on cloudlb2003-dev is CRITICAL: CRITICAL - degraded: The following units failed: networking.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:07:04] <wikibugs>	 SRE, Infrastructure-Foundations: Connection errors to some hosts from cumin1002 - https://phabricator.wikimedia.org/T356174 (MoritzMuehlenhoff)
[12:07:32] <logmsgbot>	 !log cgoubert@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1488.eqiad.wmnet with reason: host reimage
[12:07:41] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Remove expiry data/contact for sannita [puppet] - https://gerrit.wikimedia.org/r/995196 (owner: Muehlenhoff)
[12:08:02] <logmsgbot>	 !log cgoubert@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1419.eqiad.wmnet with reason: host reimage
[12:08:14] <logmsgbot>	 !log cgoubert@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1496.eqiad.wmnet with reason: host reimage
[12:10:29] <logmsgbot>	 !log cgoubert@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1488.eqiad.wmnet with reason: host reimage
[12:12:45] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on mw1424 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[12:12:54] <logmsgbot>	 !log cgoubert@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1419.eqiad.wmnet with reason: host reimage
[12:15:35] <logmsgbot>	 !log cgoubert@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1496.eqiad.wmnet with reason: host reimage
[12:15:37] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1007 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 39502876352 and 3922 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:15:37] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1008 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 38451810912 and 3922 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:16:00] <claime>	 !log Restarting ferm.service on k8s node mw1424 - T354855
[12:16:01] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1010 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 40755103920 and 3947 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:16:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:16:06] <wikibugs>	 (PS1) Kosta Harlan: ipoid: Bump version [deployment-charts] - https://gerrit.wikimedia.org/r/995197 (https://phabricator.wikimedia.org/T351430)
[12:16:21] <stashbot>	 T354855: ferm sometimes fails to restart on Kubernetes workers via xtables lock held by kube-proxy - https://phabricator.wikimedia.org/T354855
[12:16:36] <wikibugs>	 (CR) Kosta Harlan: [C: +2] ipoid: Bump version [deployment-charts] - https://gerrit.wikimedia.org/r/995197 (https://phabricator.wikimedia.org/T351430) (owner: Kosta Harlan)
[12:16:53] <icinga-wm>	 RECOVERY - Check systemd state on mw1424 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:17:35] <wikibugs>	 (Merged) jenkins-bot: ipoid: Bump version [deployment-charts] - https://gerrit.wikimedia.org/r/995197 (https://phabricator.wikimedia.org/T351430) (owner: Kosta Harlan)
[12:18:10] <logmsgbot>	 !log kharlan@deploy2002 helmfile [staging] START helmfile.d/services/ipoid: apply
[12:18:47] <logmsgbot>	 !log kharlan@deploy2002 helmfile [staging] DONE helmfile.d/services/ipoid: apply
[12:20:26] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
[12:21:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2155 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P56113 and previous config saved to /var/cache/conftool/dbconfig/20240202-122120-root.json
[12:22:43] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
[12:22:57] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
[12:26:30] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
[12:29:09] <logmsgbot>	 !log cgoubert@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1488.eqiad.wmnet with OS bullseye
[12:30:13] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1005 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 54200726040 and 4797 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:31:40] <wikibugs>	 SRE, MW-on-K8s, Trust and Safety Product Team, serviceops-radar, Patch-For-Review: MediaModeration maintenance script scanFilesInScanTable.php indirectly calls $wgImageMagickConvertCommand - https://phabricator.wikimedia.org/T355243 (Dreamy_Jazz) In progress→Resolved I'm going to mark...
[12:31:50] <logmsgbot>	 !log cgoubert@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1419.eqiad.wmnet with OS bullseye
[12:31:53] <wikibugs>	 SRE, MW-on-K8s, Traffic, serviceops, and 2 others: Move MediaWiki jobs to mw-on-k8s - https://phabricator.wikimedia.org/T349796 (Dreamy_Jazz)
[12:35:55] <logmsgbot>	 !log cgoubert@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1496.eqiad.wmnet with OS bullseye
[12:36:26] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2155 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P56114 and previous config saved to /var/cache/conftool/dbconfig/20240202-123625-root.json
[12:38:49] <Lucas_WMDE>	 quick question before I go for lunch: is it okay to run a maintenance script (specifically, namespaceDupes --fix) on a no-deploys Friday? :)
[12:39:19] <claime>	 What's the expected impact?
[12:39:33] <wikibugs>	 SRE, Infrastructure-Foundations: Connection errors to some hosts from cumin1002 - https://phabricator.wikimedia.org/T356174 (MoritzMuehlenhoff)
[12:39:36] <Lucas_WMDE>	 two currently-unreachable pages are moved to reachable titles
[12:39:56] <Lucas_WMDE>	 (enwikiquote has two pages with page_title like Wq:%, but Wq: is now an alias for the project namespace)
[12:40:16] <Lucas_WMDE>	 it’s not urgent, but on the other hand, the maintenance script should be quite safe IMHO, so I’d like to get it over with if it’s okay :)
[12:40:34] * Lucas_WMDE afk, will read responses later
[12:42:23] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1190.eqiad.wmnet with reason: Maintenance
[12:42:37] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1190.eqiad.wmnet with reason: Maintenance
[12:42:44] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1190 (T355609)', diff saved to https://phabricator.wikimedia.org/P56115 and previous config saved to /var/cache/conftool/dbconfig/20240202-124243-marostegui.json
[12:43:05] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[12:43:17] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on mw1424 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[12:43:45] <wikibugs>	 SRE, Infrastructure-Foundations: Connection errors to some hosts from cumin1002 - https://phabricator.wikimedia.org/T356174 (MoritzMuehlenhoff)
[12:45:34] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host pybal-test2003.codfw.wmnet
[12:49:21] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2003.codfw.wmnet
[12:50:05] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host idp-test2002.wikimedia.org
[12:50:53] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1006 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 41841030016 and 1995 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:51:31] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2155 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P56116 and previous config saved to /var/cache/conftool/dbconfig/20240202-125131-root.json
[12:53:36] <wikibugs>	 (PS1) Muehlenhoff: Readd Arturo to ops group [puppet] - https://gerrit.wikimedia.org/r/995199 (https://phabricator.wikimedia.org/T356403)
[12:54:15] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2002.wikimedia.org
[12:55:11] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +1] "LGTM." [puppet] - https://gerrit.wikimedia.org/r/995199 (https://phabricator.wikimedia.org/T356403) (owner: Muehlenhoff)
[12:55:54] <wikibugs>	 SRE, Infrastructure-Foundations: Connection errors to some hosts from cumin1002 - https://phabricator.wikimedia.org/T356174 (MoritzMuehlenhoff)
[12:56:14] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Readd Arturo to ops group [puppet] - https://gerrit.wikimedia.org/r/995199 (https://phabricator.wikimedia.org/T356403) (owner: Muehlenhoff)
[12:58:07] <wikibugs>	 SRE, All-and-every-Wikisource, Product-Analytics, Bengali-Sites, SEO: Google not indexing Wikisource properly for years - https://phabricator.wikimedia.org/T325607 (Soda) >>! In T325607#9440813, @SCherukuwada wrote: > Here is a summary of our discussions with Google (they proofread this summa...
[12:58:16] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host idp-test1002.wikimedia.org
[13:02:35] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1002.wikimedia.org
[13:04:27] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host sretest2004.codfw.wmnet
[13:07:17] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1190 (T355609)', diff saved to https://phabricator.wikimedia.org/P56117 and previous config saved to /var/cache/conftool/dbconfig/20240202-130717-marostegui.json
[13:07:33] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[13:09:44] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2004.codfw.wmnet
[13:16:38] <wikibugs>	 SRE, Infrastructure-Foundations: Connection errors to some hosts from cumin1002 - https://phabricator.wikimedia.org/T356174 (MoritzMuehlenhoff)
[13:17:01] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host netbox-dev2002.codfw.wmnet
[13:18:05] <moritzm>	 !log installing Linux 4.19.304 updates on Buster hosts
[13:18:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:19:25] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1008 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 2472 and 254 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[13:20:59] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2002.codfw.wmnet
[13:22:24] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P56118 and previous config saved to /var/cache/conftool/dbconfig/20240202-132224-marostegui.json
[13:26:03] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1006 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 7432 and 652 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[13:27:03] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1007 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 613128 and 714 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[13:27:35] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1010 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 224584 and 745 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[13:28:07] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1005 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 1110384 and 778 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[13:28:38] <wikibugs>	 SRE, Infrastructure-Foundations: Connection errors to some hosts from cumin1002 - https://phabricator.wikimedia.org/T356174 (MoritzMuehlenhoff)
[13:37:31] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P56119 and previous config saved to /var/cache/conftool/dbconfig/20240202-133730-marostegui.json
[13:52:38] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1190 (T355609)', diff saved to https://phabricator.wikimedia.org/P56120 and previous config saved to /var/cache/conftool/dbconfig/20240202-135237-marostegui.json
[13:52:40] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1199.eqiad.wmnet with reason: Maintenance
[13:52:53] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1199.eqiad.wmnet with reason: Maintenance
[13:52:54] <jynus>	 !log INFO: About to transfer /srv/backups/snapshots/latest/snapshot.s1.2024-02-02--09-03-48.tar.gz from dbprov1001.eqiad.wmnet to ['db1239.eqiad.wmnet']:['/srv/sqldata.s1'] (478462104090 bytes)
[13:53:00] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1199 (T355609)', diff saved to https://phabricator.wikimedia.org/P56121 and previous config saved to /var/cache/conftool/dbconfig/20240202-135300-marostegui.json
[13:53:03] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[13:53:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:55:33] <icinga-wm>	 RECOVERY - Check systemd state on cloudlb2003-dev is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:56:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[13:56:26] <jynus>	 !log INFO: About to transfer /srv/backups/snapshots/latest/snapshot.s4.2024-02-02--09-03-48.tar.gz from dbprov1003.eqiad.wmnet to ['db1245.eqiad.wmnet']:['/srv/sqldata.s4'] (575311400085 bytes)
[13:56:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:02:41] <wikibugs>	 (CR) Jelto: [C: +1] "lgtm, let me know when this should be merged so we can sync it with the buildkit rebuild. I don't like the solution of switching on Docker" [puppet] - https://gerrit.wikimedia.org/r/995103 (https://phabricator.wikimedia.org/T356418) (owner: Ahmon Dancy)
[14:06:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[14:14:17] <wikibugs>	 (PS1) Muehlenhoff: ferm::filter_log: Make ensurable [puppet] - https://gerrit.wikimedia.org/r/995211 (https://phabricator.wikimedia.org/T356174)
[14:15:26] <wikibugs>	 (CR) CI reject: [V: -1] ferm::filter_log: Make ensurable [puppet] - https://gerrit.wikimedia.org/r/995211 (https://phabricator.wikimedia.org/T356174) (owner: Muehlenhoff)
[14:16:33] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1199 (T355609)', diff saved to https://phabricator.wikimedia.org/P56122 and previous config saved to /var/cache/conftool/dbconfig/20240202-141632-marostegui.json
[14:16:54] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[14:18:30] * Lucas_WMDE back btw
[14:19:51] <wikibugs>	 (PS2) Muehlenhoff: ferm::filter_log: Make ensurable [puppet] - https://gerrit.wikimedia.org/r/995211 (https://phabricator.wikimedia.org/T356174)
[14:21:00] <wikibugs>	 (CR) CI reject: [V: -1] ferm::filter_log: Make ensurable [puppet] - https://gerrit.wikimedia.org/r/995211 (https://phabricator.wikimedia.org/T356174) (owner: Muehlenhoff)
[14:21:23] <urandom>	 !log decommissioning cassandra, restbase2017-{a,b,c} — T352469
[14:21:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:27] <stashbot>	 T352469: Decommission restbase20[13-20]) - https://phabricator.wikimedia.org/T352469
[14:23:19] <icinga-wm>	 PROBLEM - Disk space on build2001 is CRITICAL: DISK CRITICAL - free space: / 12998 MB (5% inode=65%): /tmp 12998 MB (5% inode=65%): /var/tmp 12998 MB (5% inode=65%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=build2001&var-datasource=codfw+prometheus/ops
[14:24:53] <wikibugs>	 (PS3) Muehlenhoff: ferm::filter_log: Make ensurable [puppet] - https://gerrit.wikimedia.org/r/995211 (https://phabricator.wikimedia.org/T356174)
[14:30:15] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/995211 (https://phabricator.wikimedia.org/T356174) (owner: Muehlenhoff)
[14:31:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P56123 and previous config saved to /var/cache/conftool/dbconfig/20240202-143139-marostegui.json
[14:35:58] <wikibugs>	 (PS1) Muehlenhoff: ulogd: Make class ensurable [puppet] - https://gerrit.wikimedia.org/r/995213 (https://phabricator.wikimedia.org/T356174)
[14:37:08] <wikibugs>	 (CR) CI reject: [V: -1] ulogd: Make class ensurable [puppet] - https://gerrit.wikimedia.org/r/995213 (https://phabricator.wikimedia.org/T356174) (owner: Muehlenhoff)
[14:39:30] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:40:04] <wikibugs>	 (PS2) Muehlenhoff: ulogd: Make class ensurable [puppet] - https://gerrit.wikimedia.org/r/995213 (https://phabricator.wikimedia.org/T356174)
[14:42:14] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/995213 (https://phabricator.wikimedia.org/T356174) (owner: Muehlenhoff)
[14:46:49] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P56125 and previous config saved to /var/cache/conftool/dbconfig/20240202-144648-marostegui.json
[14:54:39] <wikibugs>	 SRE, Traffic-Icebox: Lower geodns TTLs from 600 (10min) to 300 (5min) - https://phabricator.wikimedia.org/T140365 (ssingh) Following some discussion this week, @Bblack and I decided to revisit this task and provide an update to some of the concerns above, in the hope of providing a path to lowering the T...
[14:55:09] <wikibugs>	 SRE, Traffic: Lower geodns TTLs from 600 (10min) to 300 (5min) - https://phabricator.wikimedia.org/T140365 (ssingh)
[14:57:17] <wikibugs>	 SRE-Access-Requests, Data-Platform-SRE: Remove production data access for former WMDE staff member goransm - https://phabricator.wikimedia.org/T356279 (Gehel)
[14:59:30] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:01:55] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1199 (T355609)', diff saved to https://phabricator.wikimedia.org/P56126 and previous config saved to /var/cache/conftool/dbconfig/20240202-150155-marostegui.json
[15:01:58] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
[15:02:12] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
[15:02:14] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[15:02:17] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[15:02:30] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[15:02:37] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1221 (T355609)', diff saved to https://phabricator.wikimedia.org/P56127 and previous config saved to /var/cache/conftool/dbconfig/20240202-150236-marostegui.json
[15:27:08] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1221 (T355609)', diff saved to https://phabricator.wikimedia.org/P56128 and previous config saved to /var/cache/conftool/dbconfig/20240202-152707-marostegui.json
[15:27:12] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[15:42:15] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P56130 and previous config saved to /var/cache/conftool/dbconfig/20240202-154214-marostegui.json
[15:48:40] <wikibugs>	 (PS1) Bking: cloudelastic: Begin private IP migration for cloudelastic1009 [puppet] - https://gerrit.wikimedia.org/r/995223 (https://phabricator.wikimedia.org/T355617)
[15:49:21] <icinga-wm>	 PROBLEM - Check systemd state on phab2002 is CRITICAL: CRITICAL - degraded: The following units failed: rsync-phabricator-repos.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:49:48] <wikibugs>	 (CR) CI reject: [V: -1] cloudelastic: Begin private IP migration for cloudelastic1009 [puppet] - https://gerrit.wikimedia.org/r/995223 (https://phabricator.wikimedia.org/T355617) (owner: Bking)
[15:50:53] <icinga-wm>	 RECOVERY - Check systemd state on phab2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:51:16] <wikibugs>	 (PS2) Bking: cloudelastic: Begin private IP migration for cloudelastic1009 [puppet] - https://gerrit.wikimedia.org/r/995223 (https://phabricator.wikimedia.org/T355617)
[15:52:00] <logmsgbot>	 !log cgoubert@cumin2002 conftool action : set/pooled=yes; selector: name=mw1494.eqiad.wmnet,cluster=jobrunner
[15:52:21] <wikibugs>	 (PS3) Bking: cloudelastic: Begin private IP migration for cloudelastic1009 [puppet] - https://gerrit.wikimedia.org/r/995223 (https://phabricator.wikimedia.org/T355617)
[15:52:27] <wikibugs>	 (CR) CI reject: [V: -1] cloudelastic: Begin private IP migration for cloudelastic1009 [puppet] - https://gerrit.wikimedia.org/r/995223 (https://phabricator.wikimedia.org/T355617) (owner: Bking)
[15:52:53] <wikibugs>	 (PS4) Bking: cloudelastic: Begin private IP migration for cloudelastic1009 [puppet] - https://gerrit.wikimedia.org/r/995223 (https://phabricator.wikimedia.org/T355617)
[15:56:57] <wikibugs>	 (CR) Scott French: P:httpbb: migrate tests from cumin1001 to cumin1002 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/995108 (https://phabricator.wikimedia.org/T356054) (owner: Scott French)
[15:57:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P56131 and previous config saved to /var/cache/conftool/dbconfig/20240202-155721-marostegui.json
[15:57:37] <wikibugs>	 (CR) Bking: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/995223 (https://phabricator.wikimedia.org/T355617) (owner: Bking)
[16:01:00] <wikibugs>	 (CR) Ahmon Dancy: "Please merge ASAP.  I'm blocked in the meantime." [puppet] - https://gerrit.wikimedia.org/r/995103 (https://phabricator.wikimedia.org/T356418) (owner: Ahmon Dancy)
[16:12:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1221 (T355609)', diff saved to https://phabricator.wikimedia.org/P56132 and previous config saved to /var/cache/conftool/dbconfig/20240202-161227-marostegui.json
[16:12:30] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1238.eqiad.wmnet with reason: Maintenance
[16:12:44] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1238.eqiad.wmnet with reason: Maintenance
[16:12:49] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[16:12:50] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1238 (T355609)', diff saved to https://phabricator.wikimedia.org/P56133 and previous config saved to /var/cache/conftool/dbconfig/20240202-161249-marostegui.json
[16:26:53] <wikibugs>	 (PS5) Bking: cloudelastic: Begin private IP migration for cloudelastic1009 [puppet] - https://gerrit.wikimedia.org/r/995223 (https://phabricator.wikimedia.org/T355617)
[16:27:15] <wikibugs>	 (CR) Bking: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/995223 (https://phabricator.wikimedia.org/T355617) (owner: Bking)
[16:39:50] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1238 (T355609)', diff saved to https://phabricator.wikimedia.org/P56135 and previous config saved to /var/cache/conftool/dbconfig/20240202-163950-marostegui.json
[16:40:06] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[16:54:57] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P56136 and previous config saved to /var/cache/conftool/dbconfig/20240202-165457-marostegui.json
[17:10:04] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P56137 and previous config saved to /var/cache/conftool/dbconfig/20240202-171003-marostegui.json
[17:23:53] <wikibugs>	 (PS1) Btullis: Add amastilovic to the analytics-admin group [puppet] - https://gerrit.wikimedia.org/r/995270 (https://phabricator.wikimedia.org/T355607)
[17:24:23] <wikibugs>	 (PS2) Btullis: Add amastilovic to the analytics-admin group [puppet] - https://gerrit.wikimedia.org/r/995270 (https://phabricator.wikimedia.org/T355607)
[17:25:11] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1238 (T355609)', diff saved to https://phabricator.wikimedia.org/P56138 and previous config saved to /var/cache/conftool/dbconfig/20240202-172510-marostegui.json
[17:25:12] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1241.eqiad.wmnet with reason: Maintenance
[17:25:26] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1241.eqiad.wmnet with reason: Maintenance
[17:25:28] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[17:25:33] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1241 (T355609)', diff saved to https://phabricator.wikimedia.org/P56139 and previous config saved to /var/cache/conftool/dbconfig/20240202-172532-marostegui.json
[17:26:02] <wikibugs>	 (PS3) Btullis: Add amastilovic to the analytics-admins group [puppet] - https://gerrit.wikimedia.org/r/995270 (https://phabricator.wikimedia.org/T355607)
[17:27:03] <wikibugs>	 (CR) Btullis: [V: +1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1272/co" [puppet] - https://gerrit.wikimedia.org/r/995270 (https://phabricator.wikimedia.org/T355607) (owner: Btullis)
[17:33:03] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/995270 (https://phabricator.wikimedia.org/T355607) (owner: Btullis)
[17:33:58] <wikibugs>	 (CR) Btullis: [C: +2] Add amastilovic to the analytics-admins group [puppet] - https://gerrit.wikimedia.org/r/995270 (https://phabricator.wikimedia.org/T355607) (owner: Btullis)
[17:38:06] <jinxer-wm>	 (MediaWikiEditFailures) firing: (2) Elevated MediaWiki edit failures (session_loss) for cluster appserver - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures
[17:49:29] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1241 (T355609)', diff saved to https://phabricator.wikimedia.org/P56140 and previous config saved to /var/cache/conftool/dbconfig/20240202-174929-marostegui.json
[17:49:45] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[17:58:06] <jinxer-wm>	 (MediaWikiEditFailures) resolved: (2) Elevated MediaWiki edit failures (session_loss) for cluster appserver - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures
[18:04:36] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P56141 and previous config saved to /var/cache/conftool/dbconfig/20240202-180435-marostegui.json
[18:15:15] <wikibugs>	 (CR) Jelto: [C: +2] Temporarily enable Dockerfile frontend on trusted runners [puppet] - https://gerrit.wikimedia.org/r/995103 (https://phabricator.wikimedia.org/T356418) (owner: Ahmon Dancy)
[18:19:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P56142 and previous config saved to /var/cache/conftool/dbconfig/20240202-181941-marostegui.json
[18:22:17] <icinga-wm>	 PROBLEM - Check systemd state on gitlab-runner2003 is CRITICAL: CRITICAL - degraded: The following units failed: docker-gc.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:23:15] <icinga-wm>	 PROBLEM - Check systemd state on gitlab-runner2004 is CRITICAL: CRITICAL - degraded: The following units failed: docker-gc.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:24:47] <icinga-wm>	 RECOVERY - Check systemd state on gitlab-runner2004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:25:21] <icinga-wm>	 RECOVERY - Check systemd state on gitlab-runner2003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:34:49] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1241 (T355609)', diff saved to https://phabricator.wikimedia.org/P56143 and previous config saved to /var/cache/conftool/dbconfig/20240202-183448-marostegui.json
[18:34:50] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1242.eqiad.wmnet with reason: Maintenance
[18:35:04] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1242.eqiad.wmnet with reason: Maintenance
[18:35:06] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[18:35:10] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1242 (T355609)', diff saved to https://phabricator.wikimedia.org/P56144 and previous config saved to /var/cache/conftool/dbconfig/20240202-183510-marostegui.json
[18:58:18] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1242 (T355609)', diff saved to https://phabricator.wikimedia.org/P56145 and previous config saved to /var/cache/conftool/dbconfig/20240202-185818-marostegui.json
[18:58:39] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[19:06:35] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[19:07:59] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8572 bytes in 2.754 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[19:13:25] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P56146 and previous config saved to /var/cache/conftool/dbconfig/20240202-191325-marostegui.json
[19:23:51] <wikibugs>	 (PS1) Jforrester: specials: Remove null comments from formatter on Special:ProtectedPages [core] (wmf/1.42.0-wmf.16) - https://gerrit.wikimedia.org/r/995232 (https://phabricator.wikimedia.org/T356337)
[19:26:50] <wikibugs>	 (PS4) BCornwall: fifo-log-demux: Decouple service from nginx/ats [puppet] - https://gerrit.wikimedia.org/r/993804 (https://phabricator.wikimedia.org/T355905)
[19:28:32] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P56147 and previous config saved to /var/cache/conftool/dbconfig/20240202-192831-marostegui.json
[19:29:01] <wikibugs>	 ops-eqiad, DBA, DC-Ops, decommission-hardware: decommission db1106.eqiad.wmnet - https://phabricator.wikimedia.org/T327616 (VRiley-WMF) a:VRiley-WMF
[19:32:56] <wikibugs>	 (CR) BCornwall: [V: +1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1273/co" [puppet] - https://gerrit.wikimedia.org/r/993804 (https://phabricator.wikimedia.org/T355905) (owner: BCornwall)
[19:35:59] <wikibugs>	 ops-eqiad, DBA, DC-Ops, decommission-hardware: decommission db1106.eqiad.wmnet - https://phabricator.wikimedia.org/T327616 (VRiley-WMF) this device has been removed and decommissioned
[19:36:38] <wikibugs>	 ops-eqiad, DBA, DC-Ops, decommission-hardware: decommission db1106.eqiad.wmnet - https://phabricator.wikimedia.org/T327616 (VRiley-WMF) Open→Resolved
[19:41:31] <wikibugs>	 (CR) BCornwall: [V: +1] fifo-log-demux: Decouple service from nginx/ats (1 comment) [puppet] - https://gerrit.wikimedia.org/r/993804 (https://phabricator.wikimedia.org/T355905) (owner: BCornwall)
[19:43:38] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1242 (T355609)', diff saved to https://phabricator.wikimedia.org/P56148 and previous config saved to /var/cache/conftool/dbconfig/20240202-194338-marostegui.json
[19:43:40] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1243.eqiad.wmnet with reason: Maintenance
[19:43:41] <icinga-wm>	 PROBLEM - CirrusSearch more_like codfw 95th percentile latency on graphite1005 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1500.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?orgId=1&var-cirrus_group=codfw&var-cluster=elasticsearch&var-exported_cluster=production-search&var-smoothing=1&viewPanel=39
[19:43:53] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1243.eqiad.wmnet with reason: Maintenance
[19:43:54] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[19:44:00] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1243 (T355609)', diff saved to https://phabricator.wikimedia.org/P56149 and previous config saved to /var/cache/conftool/dbconfig/20240202-194359-marostegui.json
[19:46:51] <icinga-wm>	 RECOVERY - CirrusSearch more_like codfw 95th percentile latency on graphite1005 is OK: OK: Less than 20.00% above the threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?orgId=1&var-cirrus_group=codfw&var-cluster=elasticsearch&var-exported_cluster=production-search&var-smoothing=1&viewPanel=39
[19:47:04] <wikibugs>	 (CR) CI reject: [V: -1] specials: Remove null comments from formatter on Special:ProtectedPages [core] (wmf/1.42.0-wmf.16) - https://gerrit.wikimedia.org/r/995232 (https://phabricator.wikimedia.org/T356337) (owner: Jforrester)
[19:55:18] <wikibugs>	 (CR) Umherirrender: "recheck" [core] (wmf/1.42.0-wmf.16) - https://gerrit.wikimedia.org/r/995232 (https://phabricator.wikimedia.org/T356337) (owner: Jforrester)
[20:01:52] <wikibugs>	 SRE-swift-storage, Commons: Deleted file disappeared on Commons - https://phabricator.wikimedia.org/T356535 (Pppery)
[20:12:03] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1243 (T355609)', diff saved to https://phabricator.wikimedia.org/P56150 and previous config saved to /var/cache/conftool/dbconfig/20240202-201202-marostegui.json
[20:12:20] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[20:27:09] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P56151 and previous config saved to /var/cache/conftool/dbconfig/20240202-202709-marostegui.json
[20:42:16] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P56152 and previous config saved to /var/cache/conftool/dbconfig/20240202-204215-marostegui.json
[20:51:48] <wikibugs>	 (PS7) Bking: sre.hosts.reimage: Suggest install-console for troubleshooting [cookbooks] - https://gerrit.wikimedia.org/r/956082 (https://phabricator.wikimedia.org/T345778)
[20:57:23] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1243 (T355609)', diff saved to https://phabricator.wikimedia.org/P56153 and previous config saved to /var/cache/conftool/dbconfig/20240202-205722-marostegui.json
[20:57:25] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1244.eqiad.wmnet with reason: Maintenance
[20:57:27] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1244.eqiad.wmnet with reason: Maintenance
[20:57:31] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[21:05:04] <wikibugs>	 SRE, SRE-swift-storage, Data-Persistence, Thumbor, and 2 others: Changing default image thumbnail size on English Wikipedia - https://phabricator.wikimedia.org/T355914 (Redrose64) >>! In T355914#9501705, @Joe wrote: > Given the chosen size is both non-standard (meaning it's not used on most large...
[21:13:50] <wikibugs>	 SRE-swift-storage, Commons: Deleted file disappeared on Commons - https://phabricator.wikimedia.org/T356535 (Rosenzweig) That file was deleted on May 24, 2006, I think before deleted files were kept on the server by default (first introduced with MediaWiki 1.7 in July 2006, see https://www.mediawiki.org/...
[21:15:57] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1246.eqiad.wmnet with reason: Maintenance
[21:16:00] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1246.eqiad.wmnet with reason: Maintenance
[21:20:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[21:25:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[21:34:45] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1247.eqiad.wmnet with reason: Maintenance
[21:34:58] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1247.eqiad.wmnet with reason: Maintenance
[21:35:04] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1247 (T355609)', diff saved to https://phabricator.wikimedia.org/P56154 and previous config saved to /var/cache/conftool/dbconfig/20240202-213504-marostegui.json
[21:35:29] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[21:52:37] <wikibugs>	 SRE-swift-storage, Commons: Deleted file disappeared on Commons - https://phabricator.wikimedia.org/T356535 (Yann) Open→Invalid Nevermind. I just learnt that deleted files were not kept at that time.
[21:58:15] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1247 (T355609)', diff saved to https://phabricator.wikimedia.org/P56155 and previous config saved to /var/cache/conftool/dbconfig/20240202-215815-marostegui.json
[21:58:35] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[22:00:07] <wikibugs>	 (PS1) Ahmon Dancy: Temporarily enable Dockerfile frontend on trusted runners (part 2) [puppet] - https://gerrit.wikimedia.org/r/995343 (https://phabricator.wikimedia.org/T356418)
[22:00:47] <wikibugs>	 (CR) Ahmon Dancy: "Another bit needed. :-/" [puppet] - https://gerrit.wikimedia.org/r/995343 (https://phabricator.wikimedia.org/T356418) (owner: Ahmon Dancy)
[22:05:05] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[22:10:16] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[22:13:22] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P56156 and previous config saved to /var/cache/conftool/dbconfig/20240202-221321-marostegui.json
[22:14:15] <jinxer-wm>	 (MediaWikiHighErrorRate) firing: Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[22:19:15] <jinxer-wm>	 (MediaWikiHighErrorRate) resolved: (2) Elevated rate of MediaWiki errors - kube-mw-jobrunner - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[22:28:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P56157 and previous config saved to /var/cache/conftool/dbconfig/20240202-222828-marostegui.json
[22:32:57] <wikibugs>	 (PS1) Dzahn: cloud/devtools: update phabricator domain and altdomain [puppet] - https://gerrit.wikimedia.org/r/995366 (https://phabricator.wikimedia.org/T356530)
[22:38:11] <wikibugs>	 (CR) Dzahn: [C: +2] cloud/devtools: update phabricator domain and altdomain [puppet] - https://gerrit.wikimedia.org/r/995366 (https://phabricator.wikimedia.org/T356530) (owner: Dzahn)
[22:43:35] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1247 (T355609)', diff saved to https://phabricator.wikimedia.org/P56158 and previous config saved to /var/cache/conftool/dbconfig/20240202-224334-marostegui.json
[22:43:37] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db1248.eqiad.wmnet with reason: Maintenance
[22:43:51] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1248.eqiad.wmnet with reason: Maintenance
[22:43:52] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[22:43:57] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1248 (T355609)', diff saved to https://phabricator.wikimedia.org/P56159 and previous config saved to /var/cache/conftool/dbconfig/20240202-224357-marostegui.json
[23:13:51] <Goku2024>	 Hello, it was verified with checkuser that I do not have a puppet, can someone remove the current block that I have on Wikipedia that year without having done anything????? my account is GokuJuan
[23:13:51] <Goku2024>	 Blockade Evasion Puppet
[23:13:52] <Goku2024>	 List of users involved
[23:13:52] <Goku2024>	 * GokuJuan (disc. · password · reg. (locks) · CentralAuth · luxo)
[23:13:53] <Goku2024>	 * Darth.callel (disc. · password · reg. (locks) · CentralAuth · luxo)
[23:13:53] <Goku2024>	 Motivation
[23:13:54] <Goku2024>	 Explain here the reasons that justify your decision. Use diffs like:
[23:13:54] <Goku2024>	 1. GokuJuan created the article Dragon Ball: Sparking! Zero, but has shown himself incapable of accepting third-party edits or corrections made because of errors regarding the manual of style and the handling of information, resulting in poor coexistence and a lack of etiquette that has now earned him a one-month block.
[23:13:55] <Goku2024>	 2. Immediately after his sanction began, the Darth.callel account was created, whose only contributions have been to restore GokuJuan's edits, support his actions, make pejorative comments about Wikipedia, and post compliments about him and his edits on the discussion pages.
[23:13:55] <Goku2024>	 Signed: Oniichan (talk) 01:05 12 Jan 2024 (UTC)[reply]
[23:13:56] <Goku2024>	 Resolution
[23:13:56] <Goku2024>	  Likely. The technical data does not match; however, I understand why the librarian who blocked the alleged sockpuppet suspects that it is a GokuJuan sockpuppet. --- Fought
[23:13:57] <Goku2024>	  Problem? 01:59 17 Jan 2024 (UTC)[reply]
[23:13:57] <Goku2024>	 I can't even send emails to fix things because it seems like I'm blocked and I can't write on my discussion page either.
[23:13:58] <Goku2024>	 They gave me 1 month of blocking without explaining anything to me about the format of a table and then they invented that I had a puppet and gave me another month
[23:15:02] <Goku2024>	 Now it is better worded
[23:15:08] <Goku2024>	 Hello, it was verified with checkuser that I do not have a sockpuppet. Can someone remove the current block that I have on Spanish Wikipedia, which I got without having done anything? My account is GokuJuan
[23:15:08] <Goku2024>	 Blockade Evasion Puppet
[23:15:09] <Goku2024>	 List of users involved
[23:15:09] <Goku2024>	 * GokuJuan (disc. · password · reg. (locks) · CentralAuth · luxo)
[23:15:10] <Goku2024>	 * Darth.callel (disc. · password · reg. (locks) · CentralAuth · luxo)
[23:15:10] <Goku2024>	 Motivation
[23:15:11] <Goku2024>	 Explain here the reasons that justify your decision. Use diffs like:
[23:15:11] <Goku2024>	 1. GokuJuan created the article Dragon Ball: Sparking! Zero, but has shown himself incapable of accepting third-party edits or corrections made because of errors regarding the manual of style and the handling of information, resulting in poor coexistence and a lack of etiquette that has now earned him a one-month block.
[23:15:12] <Goku2024>	 2. Immediately after his sanction began, the Darth.callel account was created, whose only contributions have been to restore GokuJuan's edits, support his actions, make pejorative comments about Wikipedia, and post compliments about him and his edits on the discussion pages.
[23:15:12] <Goku2024>	 Signed: Oniichan (talk) 01:05 12 Jan 2024 (UTC)[reply]
[23:15:13] <Goku2024>	 Resolution
[23:15:13] <Goku2024>	  Likely. The technical data does not match; however, I understand why the librarian who blocked the alleged sockpuppet suspects that it is a GokuJuan sockpuppet. --- Fought
[23:15:14] <Goku2024>	  Problem? 01:59 17 Jan 2024 (UTC)[reply]
[23:15:14] <Goku2024>	 I can't even send emails to fix things because it seems like I'm blocked and I can't write on my discussion page either.
[23:15:15] <Goku2024>	 They gave me 1 month of blocking without explaining anything to me about the format of a table and then they invented that I had a puppet and gave me another month
[23:15:19] <bd808>	 Goku2024: this channel is for discussing the operation of the servers powering the Wikimedia projects. We are not the right folks to petition for on-wiki account changes.
[23:15:52] <Goku2024>	 The thing is that they banned me for asking for help on Wikipedia in Spanish and English. The same person banned me for asking for help in my article.
[23:16:13] <Goku2024>	 I do not know what else to do
[23:16:30] <Goku2024>	 Wikipedia English and Spanish IRC*
[23:16:31] <bd808>	 That sounds frustrating, but also wildly off-topic here.
[23:16:40] <Goku2024>	 Ok thank u man
[23:17:33] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1248 (T355609)', diff saved to https://phabricator.wikimedia.org/P56160 and previous config saved to /var/cache/conftool/dbconfig/20240202-231732-marostegui.json
[23:17:45] <stashbot>	 T355609: Make cuc_id a bigint - https://phabricator.wikimedia.org/T355609
[23:24:59] <wikibugs>	 (PS1) Andrew Bogott: OpenStack Designate: move from cloudservices to cloudcontrols in codfw1dev [puppet] - https://gerrit.wikimedia.org/r/995369 (https://phabricator.wikimedia.org/T350995)
[23:29:46] <wikibugs>	 (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/995369 (https://phabricator.wikimedia.org/T350995) (owner: Andrew Bogott)
[23:32:39] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P56161 and previous config saved to /var/cache/conftool/dbconfig/20240202-233239-marostegui.json
[23:39:59] <wikibugs>	 (PS2) Andrew Bogott: OpenStack Designate: move from cloudservices to cloudcontrols in codfw1dev [puppet] - https://gerrit.wikimedia.org/r/995369 (https://phabricator.wikimedia.org/T350995)
[23:41:09] <wikibugs>	 (CR) CI reject: [V: -1] OpenStack Designate: move from cloudservices to cloudcontrols in codfw1dev [puppet] - https://gerrit.wikimedia.org/r/995369 (https://phabricator.wikimedia.org/T350995) (owner: Andrew Bogott)
[23:45:26] <wikibugs>	 (PS3) Andrew Bogott: OpenStack Designate: move from cloudservices to cloudcontrols in codfw1dev [puppet] - https://gerrit.wikimedia.org/r/995369 (https://phabricator.wikimedia.org/T350995)
[23:47:46] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P56162 and previous config saved to /var/cache/conftool/dbconfig/20240202-234745-marostegui.json