[00:00:23] RECOVERY - Check systemd state on puppetmaster1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:11:07] PROBLEM - SSH on wtp1036.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[00:11:43] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[00:12:29] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[00:14:57] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8572 bytes in 3.947 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[00:16:41] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 48391 bytes in 0.129 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[00:43:37] RECOVERY - Check systemd state on logstash2026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:37:45] (JobUnavailable) firing: Reduced availability for job workhorse in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:42:45] (JobUnavailable) firing: (4) Reduced availability for job nginx in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:47:11] SRE, ops-codfw: (Need By:TBD) rack/setup/install row B new PDUs - https://phabricator.wikimedia.org/T310070 (Papaul)
[01:47:45] (JobUnavailable) firing: (4) Reduced availability for job nginx in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:50:44] SRE, ops-codfw: codfw: Master PDU rack/setup row A, row B, rowC and row D task - https://phabricator.wikimedia.org/T309956 (Papaul)
[01:51:50] SRE, ops-codfw: (Need By:TBD) rack/setup/install row A new PDUs - https://phabricator.wikimedia.org/T309957 (Papaul)
[01:52:45] (JobUnavailable) resolved: (4) Reduced availability for job nginx in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:41:53] PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[02:44:29] RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 4 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[04:20:47] (CR) Abijeet Patro: [C: +1] ReviewTranslationActionApi: Move to namespace and add strict types [extensions/Translate] (wmf/1.39.0-wmf.21) - https://gerrit.wikimedia.org/r/816272 (https://phabricator.wikimedia.org/T312008) (owner: Abijeet Patro)
[05:18:47] RECOVERY - SSH on wtp1036.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:28:34] SRE, Infrastructure-Foundations, netops: Move asw2-d5-eqiad to spares - https://phabricator.wikimedia.org/T313115 (ayounsi)
[06:30:14] SRE, Infrastructure-Foundations, netops: Move asw2-d5-eqiad to spares - https://phabricator.wikimedia.org/T313115 (ayounsi)
[06:30:36] !log power off asw2-d5-eqiad for decommissioning - T313115
[06:30:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:30:43] T313115: Move asw2-d5-eqiad to spares - https://phabricator.wikimedia.org/T313115
[06:36:45] (CR) Giuseppe Lavagetto: [C: -1] "A few minor tweaks and then LGTM" [puppet] - https://gerrit.wikimedia.org/r/816018 (https://phabricator.wikimedia.org/T308501) (owner: Dduvall)
[06:37:40] SRE, Infrastructure-Foundations, netops: Move asw2-d5-eqiad to spares - https://phabricator.wikimedia.org/T313115 (ayounsi)
[06:38:45] (virtual-chassis crash) firing: Alert for device asw2-d-eqiad.mgmt.eqiad.wmnet - virtual-chassis crash - https://alerts.wikimedia.org/?q=alertname%3Dvirtual-chassis+crash
[06:39:05] that's expected ^
[06:39:27] at least this new alert works
[06:40:07] SRE, Infrastructure-Foundations, netops: Move asw2-d5-eqiad to spares - https://phabricator.wikimedia.org/T313115 (ayounsi)
[06:42:52] SRE, ops-eqiad, Infrastructure-Foundations: Move asw2-d5-eqiad to spares - https://phabricator.wikimedia.org/T313115 (ayounsi) a:ayounsi→Cmjohnson
[06:47:05] RECOVERY - BGP status on cr2-codfw is OK: BGP OK - up: 135, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[06:48:45] (virtual-chassis crash) resolved: Device asw2-d-eqiad.mgmt.eqiad.wmnet recovered from virtual-chassis crash - https://alerts.wikimedia.org/?q=alertname%3Dvirtual-chassis+crash
[07:00:04] Amir1 and Urbanecm: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) UTC morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220725T0700).
[07:00:05] kart_, physikerwelt, koi, and abijeet: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:00:43] RECOVERY - Check systemd state on phab1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:03:12] Sorry. Had trouble joining IRC.
[07:04:10] greeting
[07:04:27] Who is deploying? I can self deploy my change.
[07:04:57] (PS2) KartikMistry: Enable Section Translation in Uzbek Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/815829 (https://phabricator.wikimedia.org/T310116)
[07:06:23] kart_ see https://wikitech.wikimedia.org/wiki/Deployments
[07:08:32] physikerwelt: I joined few minutes late, so not sure who is available to Deploy. Just wanted to avoid collision :)
[07:08:55] seems no one here, you could go ahead
[07:09:05] koi: OK!
[07:09:21] (CR) KartikMistry: [C: +2] Enable Section Translation in Uzbek Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/815829 (https://phabricator.wikimedia.org/T310116) (owner: KartikMistry)
[07:10:31] (Merged) jenkins-bot: Enable Section Translation in Uzbek Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/815829 (https://phabricator.wikimedia.org/T310116) (owner: KartikMistry)
[07:11:37] (PS2) Physikerwelt: Explicitly set math rendering modes [mediawiki-config] - https://gerrit.wikimedia.org/r/802443 (https://phabricator.wikimedia.org/T309686)
[07:12:14] SRE-OnFire, observability, SRE Observability (FY2022/2023-Q1): Business hours oncall implementation delays pages to batphone by 5 minutes when there are no oncallers - https://phabricator.wikimedia.org/T313603 (Volans) Are the people oncall supposed to switch this twice everyday? I've changed it now...
[07:12:47] PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[07:13:01] * volans looking
[07:14:01] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[07:14:57] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[07:14:58] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[07:15:11] kart_ are you done?
[07:15:57] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[07:16:11] !log kartik@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815829|Enable Section Translation in Uzbek Wikipedia (T310116)]] (duration: 03m 04s)
[07:16:14] T310116: Enable Section Translation in Uzbek Wikipedia - https://phabricator.wikimedia.org/T310116
[07:16:51] physikerwelt: yes.
[07:17:14] could you also deploy https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/802443 ?
[07:18:32] physikerwelt: sure
[07:18:49] thank you
[07:19:40] (CR) KartikMistry: [C: +2] "UTC morning backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/802443 (https://phabricator.wikimedia.org/T309686) (owner: Physikerwelt)
[07:20:12] I've to go to Lunch+Meeting after above deploy ^
[07:20:43] (Merged) jenkins-bot: Explicitly set math rendering modes [mediawiki-config] - https://gerrit.wikimedia.org/r/802443 (https://phabricator.wikimedia.org/T309686) (owner: Physikerwelt)
[07:22:56] thank you again kart_
[07:23:33] physikerwelt: Please test on mwdebug1001
[07:23:41] and let me know if everything is OK!
[07:23:51] !log volans@cumin2002 START - Cookbook sre.dns.netbox
[07:25:11] urbanecm, Amir1, I'm around for the UTC morning backport window deployment
[07:25:35] kart_ I installed the WikimediaDebug extension but I am not entirely sure how to test
[07:26:09] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[07:26:22] physikerwelt: Click on Icon and set mwdebug1001 from dropdown and test if change is possible to test via browser.
[07:27:07] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[07:27:08] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[07:27:38] abijeet: thanks
[07:28:01] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[07:28:05] Amir1, o/
[07:28:40] o/
[07:30:19] SRE, ops-eqiad, decommission-hardware: decommission frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T313607 (Volans) I've run the [[ https://wikitech.wikimedia.org/wiki/DNS/Netbox#Update_generated_records | sre.dns.netbox ]] as Icinga was alerting for `PROBLEM - Uncommitted DNS changes...
[07:30:21] physikerwelt: Done?
[07:31:04] I cklicked on the button but I still do not understand what to do next
[07:31:19] do I need to visit a specific URL?
[07:31:26] !log volans@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:31:38] physikerwelt: How did you test proposed change during patch submitted?
[07:32:50] the patch does not do anything. It just overwrites the default config from the extension with the same value now hardcoded
[07:33:05] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[07:33:19] Oh, I see. Then, let's deploy it. If something breaks, we will get tshirt(s).
[07:33:28] I see Amir1 has +1 on it.
[07:34:41] physikerwelt: Deploying..
[07:36:53] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[07:36:54] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[07:37:37] !log kartik@deploy1002 Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:802443|Explicitly set math rendering modes (T309686)]] (duration: 03m 11s)
[07:37:41] T309686: Create mathlatexml table? - https://phabricator.wikimedia.org/T309686
[07:38:15] Amir1, sorry if I was not clear earlier. I have a patch waiting for deployment during this window.
[07:38:42] abijeet: you can self-serve I assume or kart_ can handle it
[07:38:58] abijeet: OK. Let me +2 on it first and deploy.
[07:39:36] abijeet: some test failures. Is that OK?
[07:40:29] um, is there anyone willing to deploy my patch
[07:40:54] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[07:44:02] koi: the third one needs more work, installing an extension is not super straightforward but the rest is doable. I just need to check the community consensus
[07:45:49] kart_, yea, thats fine.
[07:45:51] and the zhwiki ones doesn't show a consensus
[07:46:32] (CR) Ladsgroup: [C: -1] "The community consensus has not been reached yet," [mediawiki-config] - https://gerrit.wikimedia.org/r/816239 (https://phabricator.wikimedia.org/T313657) (owner: Stang)
[07:46:59] (CR) Ladsgroup: "note to the deployer: installing this extension also requires creating its tables." [mediawiki-config] - https://gerrit.wikimedia.org/r/816316 (https://phabricator.wikimedia.org/T313173) (owner: Stang)
[07:49:06] for T313657, small wikis are pretty hard to get several support, and IMO posted at VP for 7 days w/o objection could be considered as a reach of consensus
[07:49:06] T313657: Allow admin to grant/revoke "transwiki" group on zh(wikt|wb|wq|ws) - https://phabricator.wikimedia.org/T313657
[07:49:32] (CR) KartikMistry: [C: +2] ReviewTranslationActionApi: Move to namespace and add strict types [extensions/Translate] (wmf/1.39.0-wmf.21) - https://gerrit.wikimedia.org/r/816272 (https://phabricator.wikimedia.org/T312008) (owner: Abijeet Patro)
[07:49:45] abijeet: Sorry. A little late!
[07:50:16] kart_, no problem. I responded late too :)
[07:52:03] RECOVERY - Uncommitted DNS changes in Netbox on netbox1002 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[07:56:57] (CR) Stang: Allow admin to grant/revoke "transwiki" group on zh(wikt|wb|wq|ws) (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/816239 (https://phabricator.wikimedia.org/T313657) (owner: Stang)
[08:00:35] abijeet: Patch is still in CI. Few more minutes..
[08:01:49] ok
[08:06:18] (CR) Alexandros Kosiaris: [V: +1 C: +2] services_proxy: Move AF common stanza to separate template [puppet] - https://gerrit.wikimedia.org/r/815957 (owner: Alexandros Kosiaris)
[08:06:38] (Merged) jenkins-bot: ReviewTranslationActionApi: Move to namespace and add strict types [extensions/Translate] (wmf/1.39.0-wmf.21) - https://gerrit.wikimedia.org/r/816272 (https://phabricator.wikimedia.org/T312008) (owner: Abijeet Patro)
[08:08:12] SRE, SRE-Access-Requests: Requesting access to the Desktop Improvements project statistics for SGrabarczuk - https://phabricator.wikimedia.org/T313616 (Vgutierrez) p:Triage→Medium a:Vgutierrez Waiting for @Elitre approval on the manager side and @Ottomata || @odimitrijevic per analytics-priva...
[08:08:17] abijeet: Please test on mwdebug1001
[08:09:42] okj
[08:11:09] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[08:11:47] kart_, looks good
[08:12:12] abijeet: cool
[08:13:50] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[08:13:51] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[08:15:34] (CR) Volans: [C: +2] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/816213 (https://phabricator.wikimedia.org/T312220) (owner: Dzahn)
[08:15:41] !log kartik@deploy1002 Synchronized php-1.39.0-wmf.21/extensions/Translate: Backport: [[gerrit:816272|ReviewTranslationActionApi: Move to namespace and add strict types (T312008 T313608)]] (duration: 03m 09s)
[08:15:47] T312008: Move classes under api folder to namespace - https://phabricator.wikimedia.org/T312008
[08:15:47] T313608: Marking translations as reviewed fails - https://phabricator.wikimedia.org/T313608
[08:16:02] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 5%: Maint done', diff saved to https://phabricator.wikimedia.org/P31816 and previous config saved to /var/cache/conftool/dbconfig/20220725-081601-ladsgroup.json
[08:17:32] (CR) Filippo Giunchedi: prometheus: update blackbox check alerts runbook link (1 comment) [puppet] - https://gerrit.wikimedia.org/r/816135 (https://phabricator.wikimedia.org/T312947) (owner: Filippo Giunchedi)
[08:17:37] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[08:18:30] SRE, LDAP-Access-Requests, Patch-For-Review: Grant Access to LDAP wmf group for Aline Bruenger WMDE - https://phabricator.wikimedia.org/T312220 (Volans) a:Joe→None @Aline_Bruenger_WMDE I've added your uses to the LDAP groups `wmde` and `nda`. You should be good to go now. Could you please con...
[08:18:32] (CR) Giuseppe Lavagetto: [V: +1 C: +2] prometheus::php_fpm_exporter: absent in production [puppet] - https://gerrit.wikimedia.org/r/815990 (https://phabricator.wikimedia.org/T313505) (owner: Giuseppe Lavagetto)
[08:19:58] (CR) Filippo Giunchedi: [C: +2] sre: link to service-specific Runbook wikitech page [alerts] - https://gerrit.wikimedia.org/r/816136 (https://phabricator.wikimedia.org/T312947) (owner: Filippo Giunchedi)
[08:20:02] (PS2) Filippo Giunchedi: sre: link to service-specific Runbook wikitech page [alerts] - https://gerrit.wikimedia.org/r/816136 (https://phabricator.wikimedia.org/T312947)
[08:20:15] (PS1) Ayounsi: [WIP] PeeringDB API: initial commit [software/spicerack] - https://gerrit.wikimedia.org/r/816701
[08:20:17] (PS1) Ayounsi: Add Python 3.10 support [software/spicerack] - https://gerrit.wikimedia.org/r/816702
[08:26:03] (CR) CI reject: [V: -1] Add Python 3.10 support [software/spicerack] - https://gerrit.wikimedia.org/r/816702 (owner: Ayounsi)
[08:27:08] (CR) CI reject: [V: -1] [WIP] PeeringDB API: initial commit [software/spicerack] - https://gerrit.wikimedia.org/r/816701 (owner: Ayounsi)
[08:31:06] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P31817 and previous config saved to /var/cache/conftool/dbconfig/20220725-083105-ladsgroup.json
[08:39:40] (PS1) Stang: etwikiquote: Change logo for 10k articles [mediawiki-config] - https://gerrit.wikimedia.org/r/816705 (https://phabricator.wikimedia.org/T313698)
[08:41:38] (PS1) DCausse: [cirrus] Increase shard count for ruwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/816706
[08:46:09] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P31818 and previous config saved to /var/cache/conftool/dbconfig/20220725-084609-ladsgroup.json
[08:47:00] (CR) Mary Yang: "Hi Filippo, Daniel and I have updated this commit according to our discussion on https://phabricator.wikimedia.org/T311457. Does this look" [puppet] - https://gerrit.wikimedia.org/r/810146 (https://phabricator.wikimedia.org/T311457) (owner: Mary Yang)
[08:48:24] SRE, SRE-swift-storage, Performance-Team, Traffic, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (fgiunchedi) >>! In T211661#8094929, @ori wrote: > The reason ratelimiting via `tasks_per_second` was introduced (per the [[ https...
[08:50:15] SRE, Campaign-Tools, Foundational Technology Requests: [Request for Comment] Campaigns Geolocation API proposal - https://phabricator.wikimedia.org/T312677 (Joe) AIUI what we want to do is having MediaWiki make a request to an external service. This is already possible as MediaWiki will use our `url...
[08:59:45] (JobUnavailable) firing: Reduced availability for job php-fpm in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:01:13] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P31819 and previous config saved to /var/cache/conftool/dbconfig/20220725-090113-ladsgroup.json
[09:05:01] SRE, ops-codfw: (Need By:TBD) rack/setup/install row B new PDUs - https://phabricator.wikimedia.org/T310070 (MatthewVernon) Hi, In B2, ms-fe2010 and thanos-fe2002 will need depooling. We need to make sure the ms nodes in Rack `A7` (ms-be2030 ms-be2045 ms-be2052) are all fully OK before starting on rack...
[09:05:46] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
[09:06:00] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
[09:06:05] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1123 (T312863)', diff saved to https://phabricator.wikimedia.org/P31820 and previous config saved to /var/cache/conftool/dbconfig/20220725-090604-ladsgroup.json
[09:06:09] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[09:08:45] (PS1) Jbond: CONTRIBUTORS: Add Dylsss [puppet] - https://gerrit.wikimedia.org/r/816709 (https://phabricator.wikimedia.org/T308013)
[09:08:47] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[09:09:01] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[09:09:06] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1180 (T312863)', diff saved to https://phabricator.wikimedia.org/P31821 and previous config saved to /var/cache/conftool/dbconfig/20220725-090906-ladsgroup.json
[09:09:38] SRE, ops-codfw: (Need By:TBD) rack/setup/install row A new PDUs - https://phabricator.wikimedia.org/T309957 (MatthewVernon) I will need to check the state of the swift backends in A7 before it'll be safe to start on B2/4 (but B1/5 have no swift backends in).
[09:10:52] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[09:10:55] !log ladsgroup@cumin1001 END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[09:11:01] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[09:11:15] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[09:11:53] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1111.eqiad.wmnet with reason: Maintenance
[09:12:06] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1111.eqiad.wmnet with reason: Maintenance
[09:12:58] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2079.codfw.wmnet with reason: Maintenance
[09:13:11] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2079.codfw.wmnet with reason: Maintenance
[09:13:12] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 15 hosts with reason: Maintenance
[09:13:34] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 15 hosts with reason: Maintenance
[09:14:16] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
[09:14:30] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
[09:14:35] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1172 (T312863)', diff saved to https://phabricator.wikimedia.org/P31822 and previous config saved to /var/cache/conftool/dbconfig/20220725-091435-ladsgroup.json
[09:14:39] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[09:14:45] (JobUnavailable) firing: (2) Reduced availability for job php-fpm in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:16:31] (CR) Jbond: [C: +2] CONTRIBUTORS: Add Dylsss [puppet] - https://gerrit.wikimedia.org/r/816709 (https://phabricator.wikimedia.org/T308013) (owner: Jbond)
[09:17:18] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1172 (T312863)', diff saved to https://phabricator.wikimedia.org/P31823 and previous config saved to /var/cache/conftool/dbconfig/20220725-091717-ladsgroup.json
[09:17:21] (Abandoned) Jbond: CONTRIBUTORS: add additional contributors [puppet] - https://gerrit.wikimedia.org/r/803247 (owner: Jbond)
[09:17:21] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[09:17:35] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
[09:17:40] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1096:3315 (T312863)', diff saved to https://phabricator.wikimedia.org/P31824 and previous config saved to /var/cache/conftool/dbconfig/20220725-091740-ladsgroup.json
[09:19:39] (PS1) Elukey: ml-services: test the first kserve 0.8 Docker images [deployment-charts] - https://gerrit.wikimedia.org/r/816710 (https://phabricator.wikimedia.org/T311982)
[09:24:34] (CR) Elukey: [C: +2] ml-services: test the first kserve 0.8 Docker images [deployment-charts] - https://gerrit.wikimedia.org/r/816710 (https://phabricator.wikimedia.org/T311982) (owner: Elukey)
[09:26:47] !log elukey@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
[09:30:49] !log elukey@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
[09:32:24] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P31825 and previous config saved to /var/cache/conftool/dbconfig/20220725-093222-ladsgroup.json
[09:34:09] SRE, SRE-Access-Requests: Requesting access to maintenance servers for mfossati - https://phabricator.wikimedia.org/T313706 (mfossati)
[09:34:47] !log kevinbazira@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
[09:46:30] (PS1) Jbond: P:gerrit: Export sshkey for gerrit shared services [puppet] - https://gerrit.wikimedia.org/r/816715 (https://phabricator.wikimedia.org/T303857)
[09:46:56] (PS1) Elukey: ml-services: update Docker images to KServe 0.8 in staging [deployment-charts] - https://gerrit.wikimedia.org/r/816716 (https://phabricator.wikimedia.org/T311982)
[09:47:29] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P31826 and previous config saved to /var/cache/conftool/dbconfig/20220725-094729-ladsgroup.json
[09:49:09] (CR) CI reject: [V: -1] ml-services: update Docker images to KServe 0.8 in staging [deployment-charts] - https://gerrit.wikimedia.org/r/816716 (https://phabricator.wikimedia.org/T311982) (owner: Elukey)
[09:56:20] (PS1) MVernon: hieradata: make sessionstore2001 a 3.11.13 canary [puppet] - https://gerrit.wikimedia.org/r/816719 (https://phabricator.wikimedia.org/T309896)
[09:57:41] (CR) MVernon: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/816719 (https://phabricator.wikimedia.org/T309896) (owner: MVernon)
[09:59:30] (PS2) Elukey: ml-services: update Docker images to KServe 0.8 in staging [deployment-charts] - https://gerrit.wikimedia.org/r/816716 (https://phabricator.wikimedia.org/T311982)
[10:00:27] PROBLEM - SSH on wtp1040.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[10:01:27] SRE, SRE-Access-Requests: Requesting access to maintenance servers for mfossati - https://phabricator.wikimedia.org/T313706 (Seddon) Approved, standing in as interim whilst @MarkTraceur is on sabbatical
[10:01:32] (CR) Giuseppe Lavagetto: [V: +1 C: +2] prometheus::php_fpm_exporter: remove from puppet [puppet] - https://gerrit.wikimedia.org/r/815991 (https://phabricator.wikimedia.org/T313505) (owner: Giuseppe Lavagetto)
[10:01:38] (PS2) Giuseppe Lavagetto: prometheus::php_fpm_exporter: remove from puppet [puppet] - https://gerrit.wikimedia.org/r/815991 (https://phabricator.wikimedia.org/T313505)
[10:02:34] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1172 (T312863)', diff saved to https://phabricator.wikimedia.org/P31831 and previous config saved to /var/cache/conftool/dbconfig/20220725-100234-ladsgroup.json
[10:02:36] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1109.eqiad.wmnet with reason: Maintenance
[10:02:39] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[10:02:50] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1109.eqiad.wmnet with reason: Maintenance
[10:02:55] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1109 (T312863)', diff saved to https://phabricator.wikimedia.org/P31832 and previous config saved to /var/cache/conftool/dbconfig/20220725-100254-ladsgroup.json
[10:05:39] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1109 (T312863)', diff saved to https://phabricator.wikimedia.org/P31833 and previous config saved to /var/cache/conftool/dbconfig/20220725-100538-ladsgroup.json
[10:08:02] (PS1) Kevin Bazira: ml-services: Add ar, cs & en wiki articletopic isvcs to prod [deployment-charts] - https://gerrit.wikimedia.org/r/816720 (https://phabricator.wikimedia.org/T313307)
[10:11:49] (PS1) Ladsgroup: Bump portal to HEAD [mediawiki-config] - https://gerrit.wikimedia.org/r/816721
[10:15:00] jouncebot: nowandnext
[10:15:00] No deployments scheduled for the next 2 hour(s) and 44 minute(s)
[10:15:00] In 2 hour(s) and 44 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220725T1300)
[10:15:07] awesome
[10:15:12] (CR) Ladsgroup: [C: +2] Bump portal to HEAD [mediawiki-config] - https://gerrit.wikimedia.org/r/816721 (owner: Ladsgroup)
[10:17:09] (Merged) jenkins-bot: Bump portal to HEAD [mediawiki-config] - https://gerrit.wikimedia.org/r/816721 (owner: Ladsgroup)
[10:19:29] (CR) Elukey: [C: +2] ml-services: update Docker images to KServe 0.8 in staging [deployment-charts] - https://gerrit.wikimedia.org/r/816716 (https://phabricator.wikimedia.org/T311982) (owner: Elukey)
[10:20:44] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P31834 and previous config saved to /var/cache/conftool/dbconfig/20220725-102043-ladsgroup.json
[10:21:35] !log elukey@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
[10:22:05] !log elukey@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
[10:22:28] !log elukey@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
[10:23:40] !log elukey@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
[10:23:44] !log ladsgroup@deploy1002 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:816721|Fixing favicon of wikiquote and wikibooks]] (duration: 03m 03s)
[10:24:11] !log elukey@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
[10:24:27] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[10:24:45] !log elukey@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
[10:26:39] !log ladsgroup@deploy1002 Synchronized portals: Wikimedia Portals Update: [[gerrit:816721|Fixing favicon of wikiquote and wikibooks]] (duration: 02m 55s)
[10:27:16] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[10:27:17] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[10:28:17] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[10:30:57] PROBLEM - Check for snapshots leaked by cinder backup agent on cloudcontrol1004 is CRITICAL: 12 snaps in the admin project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_snapshots_leaked_by_cinder_backup_agent
[10:31:03] PROBLEM - Check for snapshots leaked by cinder backup agent on cloudcontrol1007 is CRITICAL: 12 snaps in the admin project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_snapshots_leaked_by_cinder_backup_agent
[10:31:03] PROBLEM - Check for snapshots leaked by cinder backup agent on cloudcontrol1006 is CRITICAL: 12 snaps in the admin project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_snapshots_leaked_by_cinder_backup_agent
[10:32:13] PROBLEM - Check for snapshots leaked by cinder backup agent on cloudcontrol1005 is CRITICAL: 13 snaps in the admin project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_snapshots_leaked_by_cinder_backup_agent
[10:32:19] PROBLEM - Check for snapshots leaked by cinder backup agent on cloudcontrol1003 is CRITICAL: 13 snaps in the admin project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_snapshots_leaked_by_cinder_backup_agent
[10:35:49] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P31837 and previous config saved to /var/cache/conftool/dbconfig/20220725-103549-ladsgroup.json
[10:40:13] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T312863)', diff saved to https://phabricator.wikimedia.org/P31841 and previous config saved to /var/cache/conftool/dbconfig/20220725-104013-ladsgroup.json
[10:40:17] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[10:40:39] (PS1) Jbond: C:ssh::client: Drop select from known_hosts file [puppet] - https://gerrit.wikimedia.org/r/816724
[10:40:41] (PS1) Jbond: P:ssh::client: use more morden functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816725
[10:44:45] (JobUnavailable) firing: (2) Reduced availability for job php-fpm in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[10:45:24] (CR) CI reject: [V: -1] P:ssh::client: use more morden functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816725 (owner: Jbond)
[10:46:19] (PS2) Jbond: C:ssh::client: Drop select from known_hosts file [puppet] - https://gerrit.wikimedia.org/r/816724
[10:46:32] (PS2) Jbond: P:ssh::client: use more morden functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816725
[10:49:36] (CR) CI reject: [V: -1] P:ssh::client: use more morden functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816725 (owner: Jbond)
[10:49:45] (JobUnavailable) firing: (2) Reduced availability for job php-fpm in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[10:50:45] (PS1) MVernon: swift: stop flinging thumbnails at other DC in rewrite.py [puppet] - https://gerrit.wikimedia.org/r/816726 (https://phabricator.wikimedia.org/T313102)
[10:50:54] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1109 (T312863)', diff saved to https://phabricator.wikimedia.org/P31845 and previous config saved to /var/cache/conftool/dbconfig/20220725-105054-ladsgroup.json
[10:50:56] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[10:51:01] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[10:51:09] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[10:51:16] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1099:3318 (T312863)', diff saved to https://phabricator.wikimedia.org/P31846 and previous config saved to /var/cache/conftool/dbconfig/20220725-105114-ladsgroup.json
[10:54:04] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T312863)', diff saved to https://phabricator.wikimedia.org/P31848 and previous config saved to /var/cache/conftool/dbconfig/20220725-105403-ladsgroup.json
[10:54:45] (JobUnavailable) resolved: (2) Reduced availability for job php-fpm in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[10:55:18] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31850 and previous config saved to /var/cache/conftool/dbconfig/20220725-105518-ladsgroup.json
[11:06:15] PROBLEM - SSH on wtp1044.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:07:29] SRE, Wikimedia-Mailing-lists: Volunteer account erroneously linked with official email id - https://phabricator.wikimedia.org/T313321 (RASharma_WMF) Hi, Sorry for the late response. The "User" is my volunteer account whereas I want it linked to my official username RASharma (WMF). I may have, when I was...
[11:09:09] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31851 and previous config saved to /var/cache/conftool/dbconfig/20220725-110908-ladsgroup.json
[11:10:19] (PS2) Ayounsi: [WIP] PeeringDB API: initial commit [software/spicerack] - https://gerrit.wikimedia.org/r/816701
[11:10:23] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31852 and previous config saved to /var/cache/conftool/dbconfig/20220725-111023-ladsgroup.json
[11:12:01] (PS3) Ayounsi: sre.network.debug: initial commit [cookbooks] - https://gerrit.wikimedia.org/r/812380
[11:12:03] (PS1) Ayounsi: sre.network.peering: initial commit [cookbooks] - https://gerrit.wikimedia.org/r/816730
[11:16:09] (CR) CI reject: [V: -1] sre.network.peering: initial commit [cookbooks] - https://gerrit.wikimedia.org/r/816730 (owner: Ayounsi)
[11:16:33] (CR) CI reject: [V: -1] [WIP] PeeringDB API: initial commit [software/spicerack] - https://gerrit.wikimedia.org/r/816701 (owner: Ayounsi)
[11:24:14] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31853 and previous config saved to /var/cache/conftool/dbconfig/20220725-112413-ladsgroup.json
[11:25:29] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T312863)', diff saved to https://phabricator.wikimedia.org/P31854 and previous config saved to /var/cache/conftool/dbconfig/20220725-112528-ladsgroup.json
[11:25:33] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[11:29:50] SRE, SRE-Access-Requests: Requesting access to maintenance servers for mfossati - https://phabricator.wikimedia.org/T313706 (Volans) p:Triage→Medium
[11:36:43] SRE, SRE-Access-Requests: Requesting access to maintenance servers for mfossati - https://phabricator.wikimedia.org/T313706 (Volans)
[11:36:56] SRE, SRE-Access-Requests: Requesting access to maintenance servers for mfossati - https://phabricator.wikimedia.org/T313706 (Volans) I think that you want the `restricted` group: ` description: access to mwmaint hosts, mwlog hosts (private data) and bastion hosts restricted folks use sudo to...
[11:39:19] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T312863)', diff saved to https://phabricator.wikimedia.org/P31855 and previous config saved to /var/cache/conftool/dbconfig/20220725-113919-ladsgroup.json
[11:39:20] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1114.eqiad.wmnet with reason: Maintenance
[11:39:24] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[11:39:34] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1114.eqiad.wmnet with reason: Maintenance
[11:39:39] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1114 (T312863)', diff saved to https://phabricator.wikimedia.org/P31856 and previous config saved to /var/cache/conftool/dbconfig/20220725-113939-ladsgroup.json
[11:43:24] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1114 (T312863)', diff saved to https://phabricator.wikimedia.org/P31857 and previous config saved to /var/cache/conftool/dbconfig/20220725-114324-ladsgroup.json
[11:43:41] (CR) Filippo Giunchedi: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/816726 (https://phabricator.wikimedia.org/T313102) (owner: MVernon)
[11:49:14] SRE, SRE-Access-Requests: Requesting access to maintenance servers for mfossati - https://phabricator.wikimedia.org/T313706 (mfossati) Thanks for your action, @Volans ! @Cparle , can you confirm that [restricted](https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/admin/d...
[11:58:30] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31858 and previous config saved to /var/cache/conftool/dbconfig/20220725-115829-ladsgroup.json
[12:13:35] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31859 and previous config saved to /var/cache/conftool/dbconfig/20220725-121334-ladsgroup.json
[12:14:52] (CR) Alexandros Kosiaris: [V: +1 C: +2] services_proxy: Allow having both v4 and v6 AF enabled [puppet] - https://gerrit.wikimedia.org/r/815958 (owner: Alexandros Kosiaris)
[12:14:58] (CR) Alexandros Kosiaris: [V: +1 C: +2] "Thanks!" [puppet] - https://gerrit.wikimedia.org/r/815958 (owner: Alexandros Kosiaris)
[12:15:11] (PS6) Alexandros Kosiaris: services_proxy: Allow having both v4 and v6 AF enabled [puppet] - https://gerrit.wikimedia.org/r/815958
[12:15:33] (CR) Alexandros Kosiaris: [V: +2] services_proxy: Allow having both v4 and v6 AF enabled [puppet] - https://gerrit.wikimedia.org/r/815958 (owner: Alexandros Kosiaris)
[12:17:44] (CR) Klausman: [C: +2] ml-services: Add ar, cs & en wiki articletopic isvcs to prod [deployment-charts] - https://gerrit.wikimedia.org/r/816720 (https://phabricator.wikimedia.org/T313307) (owner: Kevin Bazira)
[12:22:21] (Merged) jenkins-bot: ml-services: Add ar, cs & en wiki articletopic isvcs to prod [deployment-charts] - https://gerrit.wikimedia.org/r/816720 (https://phabricator.wikimedia.org/T313307) (owner: Kevin Bazira)
[12:23:03] (ProbeDown) firing: (2) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:28:03] (ProbeDown) resolved: (2) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:28:40] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1114 (T312863)', diff saved to https://phabricator.wikimedia.org/P31860 and previous config saved to /var/cache/conftool/dbconfig/20220725-122839-ladsgroup.json
[12:28:41] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1116.eqiad.wmnet with reason: Maintenance
[12:28:45] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[12:28:55] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1116.eqiad.wmnet with reason: Maintenance
[12:29:35] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
[12:29:49] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
[12:29:54] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1177 (T312863)', diff saved to https://phabricator.wikimedia.org/P31861 and previous config saved to /var/cache/conftool/dbconfig/20220725-122953-ladsgroup.json
[12:34:37] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1177 (T312863)', diff saved to https://phabricator.wikimedia.org/P31862 and previous config saved to /var/cache/conftool/dbconfig/20220725-123436-ladsgroup.json
[12:34:41] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[12:34:57] (PS1) Alexandros Kosiaris: Revert "services_proxy: Allow having both v4 and v6 AF enabled" [puppet] - https://gerrit.wikimedia.org/r/816338
[12:35:05] (CR) Alexandros Kosiaris: [C: +2] Revert "services_proxy: Allow having both v4 and v6 AF enabled" [puppet] - https://gerrit.wikimedia.org/r/816338 (owner: Alexandros Kosiaris)
[12:35:09] (CR) Alexandros Kosiaris: [V: +2 C: +2] Revert "services_proxy: Allow having both v4 and v6 AF enabled" [puppet] - https://gerrit.wikimedia.org/r/816338 (owner: Alexandros Kosiaris)
[12:43:14] SRE-OnFire, observability, SRE Observability (FY2022/2023-Q1): Business hours oncall implementation delays pages to batphone by 5 minutes when there are no oncallers - https://phabricator.wikimedia.org/T313603 (CDanis) >>! In T313603#8100527, @Volans wrote: > Are the people oncall supposed to switch...
[12:49:42] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31863 and previous config saved to /var/cache/conftool/dbconfig/20220725-124942-ladsgroup.json
[12:58:17] (CR) Elukey: [C: +2] kserve: upgrade to upstream release 0.8 [deployment-charts] - https://gerrit.wikimedia.org/r/815691 (https://phabricator.wikimedia.org/T311982) (owner: Elukey)
[13:00:05] RoanKattouw, Lucas_WMDE, Urbanecm, and awight: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220725T1300).
[13:00:05] koi and dbrant: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:01:51] (PS2) Filippo Giunchedi: prometheus: update blackbox check alerts runbook link [puppet] - https://gerrit.wikimedia.org/r/816135 (https://phabricator.wikimedia.org/T312947)
[13:01:53] (PS11) Filippo Giunchedi: DO-NOT-SUBMIT(Under review and discussion): [puppet] - https://gerrit.wikimedia.org/r/810146 (https://phabricator.wikimedia.org/T311457) (owner: Mary Yang)
[13:02:12] !log elukey@deploy1002 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
[13:02:36] !log elukey@deploy1002 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
[13:03:56] SRE, Discovery, wmde-team-b-tech, Data Engineering Planning (Sprint 01): archiva1002 is running low on space left in the root partition - https://phabricator.wikimedia.org/T313386 (hashar) @BTullis very well done, thank you very much! :)
[13:04:03] (PS11) Jbond: C:varnish: Rate limit hotlinking [puppet] - https://gerrit.wikimedia.org/r/768723
[13:04:45] RECOVERY - SSH on wtp1040.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:04:47] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31864 and previous config saved to /var/cache/conftool/dbconfig/20220725-130447-ladsgroup.json
[13:07:25] (CR) Filippo Giunchedi: "Hi Mary," [puppet] - https://gerrit.wikimedia.org/r/810146 (https://phabricator.wikimedia.org/T311457) (owner: Mary Yang)
[13:07:43] Hi, is there anyone could deploy today?
[13:13:36] I can deploy
[13:14:44] (once my yubikey works…)
[13:15:01] :D
[13:15:09] (CR) Jbond: deployment_server: add gerrit host key for mwpresync pushing to gerrit (1 comment) [puppet] - https://gerrit.wikimedia.org/r/816221 (https://phabricator.wikimedia.org/T303857) (owner: Dzahn)
[13:16:02] I’m skipping the zh transwiki permission change, I think I agree with Amir that community consensus hasn’t been reached
[13:16:22] (which doesn’t mean the change has to be blocked forever, but I don’t want to discuss it in the remaining time in the window)
[13:16:32] (CR) Jbond: "i think the following is probably the way to go" [puppet] - https://gerrit.wikimedia.org/r/816221 (https://phabricator.wikimedia.org/T303857) (owner: Dzahn)
[13:16:41] !log set min_part_hours to 12 for codfw swift on ms-fe2009 T312643
[13:16:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:47] T312643: Adjust ms ring min_part_hours to 12 hours - https://phabricator.wikimedia.org/T312643
[13:17:42] ruwiki discussion looks like consensus was reached there
[13:18:15] (PS2) Lucas Werkmeister (WMDE): ruwikivoyage: Add "suppressredirect" right to "filemover" group [mediawiki-config] - https://gerrit.wikimedia.org/r/816242 (https://phabricator.wikimedia.org/T313614) (owner: Stang)
[13:18:54] SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Create a cookbook to switch an instance to DRBD/plain disk storage - https://phabricator.wikimedia.org/T312116 (Volans) p:Triage→Medium
[13:19:31] (CR) Lucas Werkmeister (WMDE): [C: +2] ruwikivoyage: Add "suppressredirect" right to "filemover" group [mediawiki-config] - https://gerrit.wikimedia.org/r/816242 (https://phabricator.wikimedia.org/T313614) (owner: Stang)
[13:19:52] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1177 (T312863)', diff saved to https://phabricator.wikimedia.org/P31870 and previous config saved to /var/cache/conftool/dbconfig/20220725-131952-ladsgroup.json
[13:19:54] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
[13:19:57] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[13:20:07] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
[13:20:13] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1126 (T312863)', diff saved to https://phabricator.wikimedia.org/P31871 and previous config saved to /var/cache/conftool/dbconfig/20220725-132012-ladsgroup.json
[13:20:59] !log set min_part_hours to 12 for eqiad swift on ms-fe1009 T312643
[13:21:01] hrm, gate-and-submit for the config change hasn’t even started yet
[13:21:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:21:02] (just queued)
[13:21:08] is that not higher priority than normal CI?
[13:21:14] (now it’s running)
[13:21:38] (Merged) jenkins-bot: ruwikivoyage: Add "suppressredirect" right to "filemover" group [mediawiki-config] - https://gerrit.wikimedia.org/r/816242 (https://phabricator.wikimedia.org/T313614) (owner: Stang)
[13:21:50] SRE-swift-storage: Adjust ms ring min_part_hours to 12 hours - https://phabricator.wikimedia.org/T312643 (MatthewVernon) Open→Resolved Rings adjusted.
[13:22:06] koi: the ruwikivoyage change should be on mwdebug1001, can you test it?
[13:23:04] Lucas_WMDE: LGTM
[13:24:28] alright, thanks
[13:25:14] (PS12) Jbond: C:varnish: Rate limit hotlinking [puppet] - https://gerrit.wikimedia.org/r/768723
[13:25:38] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:26:41] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:26:42] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:27:00] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1126 (T312863)', diff saved to https://phabricator.wikimedia.org/P31872 and previous config saved to /var/cache/conftool/dbconfig/20220725-132700-ladsgroup.json
[13:27:04] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[13:27:08] https://phabricator.wikimedia.org/T280326#7090625 indicates ptwiki WikiLove also requires creating database tables
[13:27:26] I assume I’d really want to do this *before* syncing the config change to enable the extension?
[13:27:43] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:28:01] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:816242|ruwikivoyage: Add "suppressredirect" right to "filemover" group (T313614)]] (duration: 03m 17s)
[13:28:06] T313614: Russian Wikivoyage needs ''suppressredirect'' right for ''filemover'' group - https://phabricator.wikimedia.org/T313614
[13:28:32] yeah, at https://phabricator.wikimedia.org/T280326#7086856 it was done first
[13:28:34] then I’ll do that
[13:28:51] (PS2) Lucas Werkmeister (WMDE): ptwikinews: Install WikiLove extension [mediawiki-config] - https://gerrit.wikimedia.org/r/816316 (https://phabricator.wikimedia.org/T313173) (owner: Stang)
[13:30:09] (CR) Vgutierrez: C:varnish: Rate limit hotlinking (1 comment) [puppet] - https://gerrit.wikimedia.org/r/768723 (owner: Jbond)
[13:31:04] sorry, pasted the wrong link – https://phabricator.wikimedia.org/T266744#6589452 is where createExtensionTables.php ran first
[13:31:42] (CR) Lucas Werkmeister (WMDE): [C: +2] ptwikinews: Install WikiLove extension (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/816316 (https://phabricator.wikimedia.org/T313173) (owner: Stang)
[13:31:58] !log lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php ptwikinews wikilove # T313173
[13:32:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:32:04] T313173: add WikiLove extension in ptwikinews - https://phabricator.wikimedia.org/T313173
[13:32:54] (Merged) jenkins-bot: ptwikinews: Install WikiLove extension [mediawiki-config] - https://gerrit.wikimedia.org/r/816316 (https://phabricator.wikimedia.org/T313173) (owner: Stang)
[13:33:17] koi: ptwikinews wikilove change should be on mwdebug1001, please test
[13:35:35] hm, when I open the WikiLove interface, it looks like some icons aren’t loading properly
[13:35:44] (PS1) Elukey: kserve: add service account to StatefulSet [deployment-charts] - https://gerrit.wikimedia.org/r/816762 (https://phabricator.wikimedia.org/T311982)
[13:35:46] ``
[13:36:40] I wonder if that’s just due to mwdebug
[13:37:03] I could see the same issue; don't know
[13:37:48] should I try deploying it?
[13:37:49] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:38:07] we can still revert the config change if needed
[13:38:23] I don’t think it would be catastrophic if the broken icons were temporarily visible to all ptwikinews visitors
[13:38:32] please try it, and yeah, if it still exist a revert is needed
[13:38:35] ok
[13:38:48] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:38:49] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:39:43] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:40:48] (PS3) Giuseppe Lavagetto: jobrunner: allow selecting explicitly the backend when performing health checks. [puppet] - https://gerrit.wikimedia.org/r/810348 (https://phabricator.wikimedia.org/T311386)
[13:42:06] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31873 and previous config saved to /var/cache/conftool/dbconfig/20220725-134205-ladsgroup.json
[13:42:14] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:816316|ptwikinews: Install WikiLove extension (T313173)]] (duration: 03m 19s)
[13:42:17] T313173: add WikiLove extension in ptwikinews - https://phabricator.wikimedia.org/T313173
[13:42:39] nope, icons still look broken :/
[13:43:17] <_joe_> Lucas_WMDE: yeah mwdebug is really identical to the rest of production in all relevant ways
[13:43:28] (CR) CI reject: [V: -1] jobrunner: allow selecting explicitly the backend when performing health checks. [puppet] - https://gerrit.wikimedia.org/r/810348 (https://phabricator.wikimedia.org/T311386) (owner: Giuseppe Lavagetto)
[13:43:49] Lucas_WMDE: I don't see the issue on enwiki. Why would it change across versions?
[13:44:14] I use safemode=1 and seems the icon is shown, so maybe something in Common.js break this?
[13:44:28] oh, good point
[13:44:37] (CR) Ayounsi: "Out of curiosity, why don't we have results for "eqiad, codfw and ulsfo"?" [dns] - https://gerrit.wikimedia.org/r/816053 (https://phabricator.wikimedia.org/T311472) (owner: BCornwall)
[13:44:41] indeed, in safemode it works (and the list of types is different)
[13:46:20] (PS4) Giuseppe Lavagetto: jobrunner: allow selecting explicitly the backend when performing health checks. [puppet] - https://gerrit.wikimedia.org/r/810348 (https://phabricator.wikimedia.org/T311386)
[13:46:41] this, I guess https://pt.wikinews.org/wiki/MediaWiki:WikiLove.js
[13:46:50] which resets the `$.wikiLoveOptions`
[13:49:31] left a note at task, anyway for installing extension it's done
[13:49:33] I think we can leave that to the interface admins on the wiki
[13:49:37] and don’t need to revert
[13:49:54] (don’t ask me why they tried to configure the extension before it was installed, when they couldn’t possibly test it…)
[13:50:04] yeah :)
[13:50:14] alright, dbrant still there? :)
[13:50:32] present
[13:50:49] (PS6) Lucas Werkmeister (WMDE): Add sampling to android.breadcrumbs event stream. [mediawiki-config] - https://gerrit.wikimedia.org/r/811765 (https://phabricator.wikimedia.org/T310847) (owner: Dbrant)
[13:50:50] alright
[13:51:09] oh dear, long diffConfig output ^^
[13:52:08] (CR) Giuseppe Lavagetto: [C: +2] jobrunner: allow selecting explicitly the backend when performing health checks. [puppet] - https://gerrit.wikimedia.org/r/810348 (https://phabricator.wikimedia.org/T311386) (owner: Giuseppe Lavagetto)
[13:52:36] (CR) Lucas Werkmeister (WMDE): [C: +2] Add sampling to android.breadcrumbs event stream. [mediawiki-config] - https://gerrit.wikimedia.org/r/811765 (https://phabricator.wikimedia.org/T310847) (owner: Dbrant)
[13:52:53] dbrant: is this change even testable on mwdebug?
[13:53:30] (not a problem if it isn’t, just wondering if it even makes sense to pull it there)
[13:53:51] hm, not actually sure, but probably not.
[13:53:59] (Merged) jenkins-bot: Add sampling to android.breadcrumbs event stream. [mediawiki-config] - https://gerrit.wikimedia.org/r/811765 (https://phabricator.wikimedia.org/T310847) (owner: Dbrant)
[13:54:50] alright, then I’ll just sync it
[13:57:11] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31874 and previous config saved to /var/cache/conftool/dbconfig/20220725-135710-ladsgroup.json
[13:57:43] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 235, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[13:57:57] hm, some “error trying to release the lock” messages from scap
[13:58:01] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:811765|Add sampling to android.breadcrumbs event stream. (T310847)]] (duration: 02m 56s)
[13:58:05] T310847: Validate android_breadcrumbs_event data - https://phabricator.wikimedia.org/T310847
[13:58:09] on mw1320
[13:58:17] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 80, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[13:59:18] (PS1) Giuseppe Lavagetto: httpbb: also add the new jobrunner tests in production [puppet] - https://gerrit.wikimedia.org/r/816786
[13:59:27] !log lucaswerkmeister-wmde@mw1320:~$ scap pull # T310847 (repeat failed host from earlier sync)
[13:59:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:00:01] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[14:00:06] (PS3) Jbond: P:ssh::client: use more morden functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816725
[14:00:56] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[14:00:57] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[14:01:00] !log lucaswerkmeister-wmde@mw1320:~$ sudo -i /usr/local/sbin/restart-php7.2-fpm # T310847 just in case
[14:01:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:01:18] I think we’re done?
[14:01:30] Lucas_WMDE: confirmed! Many thanks.
[14:01:33] (CR) Giuseppe Lavagetto: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36364/console" [puppet] - https://gerrit.wikimedia.org/r/816786 (owner: Giuseppe Lavagetto)
[14:01:34] yay
[14:01:43] !log UTC afternoon backport+config window done
[14:01:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:01:55] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[14:02:00] (CR) Giuseppe Lavagetto: [V: +1 C: +2] httpbb: also add the new jobrunner tests in production [puppet] - https://gerrit.wikimedia.org/r/816786 (owner: Giuseppe Lavagetto)
[14:02:04] (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36363/console" [puppet] - https://gerrit.wikimedia.org/r/816725 (owner: Jbond)
[14:03:12] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1123 (T312863)', diff saved to https://phabricator.wikimedia.org/P31875 and previous config saved to /var/cache/conftool/dbconfig/20220725-140311-ladsgroup.json
[14:03:13] PROBLEM - Check systemd state on wtp1040 is CRITICAL: CRITICAL - degraded: The following units failed: php7.2-fpm_check_restart.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:03:16] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[14:04:36] (PS1) Giuseppe Lavagetto: httpbb: fix test file name [puppet] - https://gerrit.wikimedia.org/r/816787
[14:04:49] (CR) Giuseppe Lavagetto: [V: +2 C: +2] httpbb: fix test file name [puppet] - https://gerrit.wikimedia.org/r/816787 (owner: Giuseppe Lavagetto)
[14:04:57] (CR) Eevans: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/816719 (https://phabricator.wikimedia.org/T309896) (owner: MVernon)
[14:05:12] (CR) Jbond: [V: +1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36365/console" [puppet] - https://gerrit.wikimedia.org/r/816724 (owner: Jbond)
[14:10:49] RECOVERY - SSH on wtp1044.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[14:12:12] !log updating wikitech-static to MediaWiki 1.38.2
[14:12:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:12:16] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1126 (T312863)', diff saved to https://phabricator.wikimedia.org/P31876 and previous config saved to /var/cache/conftool/dbconfig/20220725-141215-ladsgroup.json
[14:12:17] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
[14:12:21] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[14:12:31] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
[14:12:36] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1101:3318 (T312863)', diff saved to https://phabricator.wikimedia.org/P31877 and previous config saved to /var/cache/conftool/dbconfig/20220725-141236-ladsgroup.json
[14:15:24] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T312863)', diff saved to https://phabricator.wikimedia.org/P31878 and previous config saved to /var/cache/conftool/dbconfig/20220725-141523-ladsgroup.json
[14:16:25] (CR) MVernon: [C: +2] hieradata: make sessionstore2001 a 3.11.13 canary [puppet] - https://gerrit.wikimedia.org/r/816719 (https://phabricator.wikimedia.org/T309896) (owner: MVernon)
[14:18:17] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31879 and previous config saved to /var/cache/conftool/dbconfig/20220725-141816-ladsgroup.json
[14:23:53] (CR) Jbond: [V: +1 C: +2] C:ssh::client: Drop select from known_hosts file [puppet] - https://gerrit.wikimedia.org/r/816724 (owner: Jbond)
[14:30:29] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31880 and previous config saved to /var/cache/conftool/dbconfig/20220725-143029-ladsgroup.json
[14:32:44] (CR) Volans: [C: -1] "LGTM but is not actually using the new parameters." [software/debmonitor] - https://gerrit.wikimedia.org/r/812556 (owner: Jbond)
[14:33:22] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31881 and previous config saved to /var/cache/conftool/dbconfig/20220725-143321-ladsgroup.json
[14:35:47] (PS1) Elukey: kserve: apply upstream fix to storage-initializer [docker-images/production-images] - https://gerrit.wikimedia.org/r/816792 (https://phabricator.wikimedia.org/T311982)
[14:38:07] !log mvernon@cumin2002 START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2001.codfw.wmnet: restart cassandra on 3.11.13 canary T309896 - mvernon@cumin2002
[14:38:11] T309896: Upgrade Cassandra to latest 3.x (3.11.13) - https://phabricator.wikimedia.org/T309896
[14:38:44] (PS4) Jbond: P:ssh::client: use more morden functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816725
[14:41:46] (CR) CI reject: [V: -1] P:ssh::client: use more morden functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816725 (owner: Jbond)
[14:42:26] (PS1) Ssingh: trafficserver: 9.x upgrade: do not enable ATS 9.x by default [puppet] - https://gerrit.wikimedia.org/r/816795 (https://phabricator.wikimedia.org/T309651)
[14:42:38] (PS5) Jbond: P:ssh::client: use more morden functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816725
[14:44:19] !log mvernon@cumin2002 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2001.codfw.wmnet: restart cassandra on 3.11.13 canary T309896 - mvernon@cumin2002
[14:44:21] (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36367/console" [puppet] - https://gerrit.wikimedia.org/r/816725 (owner: Jbond)
[14:44:23] T309896: Upgrade Cassandra to latest 3.x (3.11.13) - https://phabricator.wikimedia.org/T309896
[14:45:35] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31882 and previous config saved to /var/cache/conftool/dbconfig/20220725-144534-ladsgroup.json
[14:48:27] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1123 (T312863)', diff saved to https://phabricator.wikimedia.org/P31883 and previous config saved to /var/cache/conftool/dbconfig/20220725-144827-ladsgroup.json
[14:48:31] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[14:51:13] SRE-OnFire, observability, SRE Observability (FY2022/2023-Q1): Business hours oncall implementation delays pages to batphone by 5 minutes when there are no oncallers - https://phabricator.wikimedia.org/T313603 (CDanis) @SLyngshede-WMF had suggested to perhaps use the VictorOps API to automate the swi...
[15:00:02] SRE, Wikimedia-Mailing-lists: Volunteer account erroneously linked with official email id - https://phabricator.wikimedia.org/T313321 (Aklapper) Stalled→Invalid Hi, there is no way to "enter a volunteer account" on lists.wikimedia.org, only an email address and some username. I think there is som...
[15:00:40] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T312863)', diff saved to https://phabricator.wikimedia.org/P31884 and previous config saved to /var/cache/conftool/dbconfig/20220725-150039-ladsgroup.json
[15:00:41] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
[15:00:45] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[15:00:55] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
[15:01:37] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
[15:01:51] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
[15:01:52] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[15:02:08] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[15:02:13] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1167 (T312863)', diff saved to https://phabricator.wikimedia.org/P31885 and previous config saved to /var/cache/conftool/dbconfig/20220725-150212-ladsgroup.json
[15:04:57] (CR) BBlack: "Brett: one more" [dns] - https://gerrit.wikimedia.org/r/816028 (https://phabricator.wikimedia.org/T311472) (owner: BCornwall)
[15:10:51] (PS1) Ssingh: aptrepo: add a component for ATS 9.x [puppet] - https://gerrit.wikimedia.org/r/816801 (https://phabricator.wikimedia.org/T309651)
[15:14:01] (Abandoned) Ebernhardson: Remove references to ApiFeatureUsage [mediawiki-config] - https://gerrit.wikimedia.org/r/814871 (https://phabricator.wikimedia.org/T313248) (owner: Ebernhardson)
[15:14:25] (CR) Klausman: [C: +1] kserve: apply upstream fix to storage-initializer [docker-images/production-images] - https://gerrit.wikimedia.org/r/816792 (https://phabricator.wikimedia.org/T311982) (owner: Elukey)
[15:19:57] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1167 (T312863)', diff saved to https://phabricator.wikimedia.org/P31886 and previous config saved to /var/cache/conftool/dbconfig/20220725-151957-ladsgroup.json
[15:20:02] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[15:21:58] (CR) BCornwall: geodns: Map out African countries by DC latency (1 comment) [dns] - https://gerrit.wikimedia.org/r/816028 (https://phabricator.wikimedia.org/T311472) (owner: BCornwall)
[15:29:34] (CR) Elukey: [V: +2 C: +2] kserve: apply upstream fix to storage-initializer [docker-images/production-images] - https://gerrit.wikimedia.org/r/816792 (https://phabricator.wikimedia.org/T311982) (owner: Elukey)
[15:30:05] jan_drewniak: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Wikimedia Portals Update . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220725T1530).
[15:30:28] (PS1) Eevans: Do not assign CASSANDRA_LOG_DIR from environment config [puppet] - https://gerrit.wikimedia.org/r/816805 (https://phabricator.wikimedia.org/T309896)
[15:30:49] (CR) Eevans: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/816805 (https://phabricator.wikimedia.org/T309896) (owner: Eevans)
[15:30:56] SRE, ops-codfw: (Need By:TBD) rack/setup/install row B new PDUs - https://phabricator.wikimedia.org/T310070 (Papaul)
[15:31:10] (PS1) Ssingh: trafficserver: 9.x upgrade: install ATS 9.x from component [puppet] - https://gerrit.wikimedia.org/r/816806 (https://phabricator.wikimedia.org/T309651)
[15:31:40] (PS6) Jbond: P:ssh::client: use more morden functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816725
[15:32:05] (PS7) Jbond: P:ssh::client: use more morden functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816725
[15:32:11] (CR) CI reject: [V: -1] P:ssh::client: use more morden functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816725 (owner: Jbond)
[15:33:12] (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36368/console" [puppet] - https://gerrit.wikimedia.org/r/816725 (owner: Jbond)
[15:33:15] (CR) MVernon: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/816805 (https://phabricator.wikimedia.org/T309896) (owner: Eevans)
[15:33:29] SRE, ops-codfw: (Need By:TBD) rack/setup/install row C new PDUs - https://phabricator.wikimedia.org/T310145 (Papaul)
[15:34:30] (PS1) Elukey: kserve: update storage-initializer's docker image version [deployment-charts] - https://gerrit.wikimedia.org/r/816807 (https://phabricator.wikimedia.org/T311982)
[15:34:54] SRE, ops-codfw: (Need By:TBD) rack/setup/install row D new PDUs - https://phabricator.wikimedia.org/T310146 (Papaul)
[15:34:58] (PS13) BBlack: esitest service for cache nodes [puppet] - https://gerrit.wikimedia.org/r/793561 (https://phabricator.wikimedia.org/T308799)
[15:35:00] (PS2) BBlack: trafficserver: Add ESI testing remap rule [puppet] - https://gerrit.wikimedia.org/r/810030 (https://phabricator.wikimedia.org/T308799) (owner: Vgutierrez)
[15:35:02] (PS2) BBlack: varnish: Enable ESI for /esitest-fa8a495983347898/includer [puppet] - https://gerrit.wikimedia.org/r/810044 (https://phabricator.wikimedia.org/T308799) (owner: Vgutierrez)
[15:35:03] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P31887 and previous config saved to /var/cache/conftool/dbconfig/20220725-153502-ladsgroup.json
[15:35:04] (PS1) BBlack: cache::text - include esitest service [puppet] - https://gerrit.wikimedia.org/r/816808 (https://phabricator.wikimedia.org/T308799)
[15:35:12] (CR) Elukey: [C: +2] kserve: add service account to StatefulSet [deployment-charts] - https://gerrit.wikimedia.org/r/816762 (https://phabricator.wikimedia.org/T311982) (owner: Elukey)
[15:35:36] SRE, ops-codfw: codfw: Master PDU rack/setup row A, row B, rowC and row D task - https://phabricator.wikimedia.org/T309956 (Papaul)
[15:35:38] (CR) Ssingh: trafficserver: 9.x upgrade: install ATS 9.x from component (1 comment) [puppet] - https://gerrit.wikimedia.org/r/816806 (https://phabricator.wikimedia.org/T309651) (owner: Ssingh)
[15:36:00] (CR) Eevans: [C: +1] "PPC looks as expected." [puppet] - https://gerrit.wikimedia.org/r/816805 (https://phabricator.wikimedia.org/T309896) (owner: Eevans)
[15:38:20] (CR) BBlack: esitest service for cache nodes (1 comment) [puppet] - https://gerrit.wikimedia.org/r/793561 (https://phabricator.wikimedia.org/T308799) (owner: BBlack)
[15:38:55] (PS14) BBlack: esitest service for cache nodes [puppet] - https://gerrit.wikimedia.org/r/793561 (https://phabricator.wikimedia.org/T308799)
[15:38:57] (PS2) BBlack: cache::text - include esitest service [puppet] - https://gerrit.wikimedia.org/r/816808 (https://phabricator.wikimedia.org/T308799)
[15:38:59] (PS3) BBlack: trafficserver: Add ESI testing remap rule [puppet] - https://gerrit.wikimedia.org/r/810030 (https://phabricator.wikimedia.org/T308799) (owner: Vgutierrez)
[15:39:01] (PS3) BBlack: varnish: Enable ESI for /esitest-fa8a495983347898/includer [puppet] - https://gerrit.wikimedia.org/r/810044 (https://phabricator.wikimedia.org/T308799) (owner: Vgutierrez)
[15:41:19] (CR) Jbond: [C: +2] P:ssh::client: use more morden functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816725 (owner: Jbond)
[15:41:43] (CR) Giuseppe Lavagetto: [C: +1] varnish: If X-Requestctl is unset, don't append it to X-Analytics [puppet] - https://gerrit.wikimedia.org/r/816000 (owner: RLazarus)
[15:46:11] (PS2) Ssingh: trafficserver: 9.x upgrade: install ATS 9.x from component [puppet] - https://gerrit.wikimedia.org/r/816806 (https://phabricator.wikimedia.org/T309651)
[15:47:00] (CR) CI reject: [V: -1] trafficserver: 9.x upgrade: install ATS 9.x from component [puppet] - https://gerrit.wikimedia.org/r/816806 (https://phabricator.wikimedia.org/T309651) (owner: Ssingh)
[15:48:01] (PS3) Ssingh: trafficserver: 9.x upgrade: install ATS 9.x from component [puppet] - https://gerrit.wikimedia.org/r/816806 (https://phabricator.wikimedia.org/T309651)
[15:49:15] (CR) Ssingh: trafficserver: 9.x upgrade: install ATS 9.x from component (1 comment) [puppet] - https://gerrit.wikimedia.org/r/816806 (https://phabricator.wikimedia.org/T309651) (owner: Ssingh)
[15:50:08] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P31888 and previous config saved to /var/cache/conftool/dbconfig/20220725-155007-ladsgroup.json
[15:50:32] SRE, ops-codfw, Elasticsearch, Discovery-Search (Current work), Patch-For-Review: Degraded RAID on elastic2049 - https://phabricator.wikimedia.org/T311939 (MPhamWMF)
[15:50:45] (PS2) Jbond: P:gerrit: Export sshkey for gerrit shared services [puppet] - https://gerrit.wikimedia.org/r/816715 (https://phabricator.wikimedia.org/T303857)
[15:52:51] (PS4) BBlack: varnish: Enable ESI for /esitest-fa8a495983347898/includer [puppet] - https://gerrit.wikimedia.org/r/810044 (https://phabricator.wikimedia.org/T308799) (owner: Vgutierrez)
[15:54:06] (CR) Ori: "This is cherry-picked on the Beta Cluster and WAI." [puppet] - https://gerrit.wikimedia.org/r/816206 (https://phabricator.wikimedia.org/T138093) (owner: Ori)
[15:57:11] (CR) BBlack: "PCC diff for a text node, for the whole 4-patch set:" [puppet] - https://gerrit.wikimedia.org/r/810044 (https://phabricator.wikimedia.org/T308799) (owner: Vgutierrez)
[15:57:19] (PS4) Ssingh: trafficserver: 9.x upgrade: install ATS 9.x from component [puppet] - https://gerrit.wikimedia.org/r/816806 (https://phabricator.wikimedia.org/T309651)
[15:58:19] (CR) BBlack: [C: +2] esitest service for cache nodes [puppet] - https://gerrit.wikimedia.org/r/793561 (https://phabricator.wikimedia.org/T308799) (owner: BBlack)
[15:58:27] (CR) BBlack: [C: +2] cache::text - include esitest service [puppet] - https://gerrit.wikimedia.org/r/816808 (https://phabricator.wikimedia.org/T308799) (owner: BBlack)
[15:58:38] (CR) CI reject: [V: -1] trafficserver: 9.x upgrade: install ATS 9.x from component [puppet] - https://gerrit.wikimedia.org/r/816806 (https://phabricator.wikimedia.org/T309651) (owner: Ssingh)
[15:58:47] (PS1) Alexandros Kosiaris: build_envoy_config: Allow data to be a list [puppet] - https://gerrit.wikimedia.org/r/816810
[15:59:32] !log cp*: temporarily disable puppet to test esitest service rollout
[15:59:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:59:38] (PS1) Ladsgroup: Stop writing to the old templatelinks field in testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/816811 (https://phabricator.wikimedia.org/T312865)
[16:00:43] (PS5) Ssingh: trafficserver: 9.x upgrade: install ATS 9.x from component [puppet] - https://gerrit.wikimedia.org/r/816806 (https://phabricator.wikimedia.org/T309651)
[16:05:13] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1167 (T312863)', diff saved to https://phabricator.wikimedia.org/P31890 and previous config saved to /var/cache/conftool/dbconfig/20220725-160512-ladsgroup.json
[16:05:14] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
[16:05:18] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[16:05:28] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
[16:05:33] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1178 (T312863)', diff saved to https://phabricator.wikimedia.org/P31891 and previous config saved to /var/cache/conftool/dbconfig/20220725-160532-ladsgroup.json
[16:05:46] (CR) Ebernhardson: [C: +1] [cirrus] Increase shard count for ruwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/816706 (owner: DCausse)
[16:07:59] (CR) RLazarus: [C: +2] varnish: If X-Requestctl is unset, don't append it to X-Analytics [puppet] - https://gerrit.wikimedia.org/r/816000 (owner: RLazarus)
[16:08:36] SRE, SRE-swift-storage, Performance-Team, Traffic, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (ori) I see! Do we need to employ any of these strategies, then? What (if anything) should be done before we flip this on for prod?
[16:09:25] PROBLEM - SSH on wtp1040.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:10:11] (CR) BBlack: [C: +2] trafficserver: Add ESI testing remap rule [puppet] - https://gerrit.wikimedia.org/r/810030 (https://phabricator.wikimedia.org/T308799) (owner: Vgutierrez)
[16:10:25] (CR) BBlack: [C: +2] varnish: Enable ESI for /esitest-fa8a495983347898/includer [puppet] - https://gerrit.wikimedia.org/r/810044 (https://phabricator.wikimedia.org/T308799) (owner: Vgutierrez)
[16:14:16] !log cp*: re-enable puppet for normal staggered rollout (cp4027 tested all the esitest stuff without incident)
[16:14:17] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1178 (T312863)', diff saved to https://phabricator.wikimedia.org/P31892 and previous config saved to /var/cache/conftool/dbconfig/20220725-161416-ladsgroup.json
[16:14:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:14:25] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[16:14:27] (CR) Ladsgroup: [C: -1] Allow admin to grant/revoke "transwiki" group on zh(wikt|wb|wq|ws) (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/816239 (https://phabricator.wikimedia.org/T313657) (owner: Stang)
[16:14:45] (PS3) Majavah: P:openstack::cinder: use new rabbitmq_hosts hiera var [puppet] - https://gerrit.wikimedia.org/r/815681
[16:14:47] (PS2) Majavah: P:openstack::designate: use new rabbitmq_hosts hiera var [puppet] - https://gerrit.wikimedia.org/r/815683
[16:14:49] (PS3) Majavah: P:openstack::trove: use new rabbitmq_hosts hiera var [puppet] - https://gerrit.wikimedia.org/r/815723
[16:14:51] (PS1) Majavah: site: put cloudrabbit1001-3 into service [puppet] - https://gerrit.wikimedia.org/r/816818
[16:16:09] PROBLEM - puppetmaster backend https on puppetmaster1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Puppet%23Debugging
[16:17:35] (CR) Elukey: [C: +2] kserve: update storage-initializer's docker image version [deployment-charts] - https://gerrit.wikimedia.org/r/816807 (https://phabricator.wikimedia.org/T311982) (owner: Elukey)
[16:17:35] uh
[16:18:07] PROBLEM - puppetmaster https on puppetmaster1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Puppet%23Debugging
[16:18:33] RECOVERY - puppetmaster backend https on puppetmaster1001 is OK: HTTP OK: Status line output matched 400 - 414 bytes in 2.633 second response time https://wikitech.wikimedia.org/wiki/Puppet%23Debugging
[16:18:35] jbond: anything WIP on puppetmaster?
[16:18:39] ^^^
[16:18:52] (PS12) Ebernhardson: elastic: Restart masters one at a time after all others [software/spicerack] - https://gerrit.wikimedia.org/r/781009 (https://phabricator.wikimedia.org/T306389)
[16:24:01] PROBLEM - Widespread puppet agent failures on alert1001 is CRITICAL: 0.03385 ge 0.01 https://puppetboard.wikimedia.org/nodes?status=failed https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[16:25:42] volans: I still see some issues running puppet on an host (in my case, deploy1002)
[16:27:05] (CR) Ssingh: "The default_value in backend.pp should already take care of this so this patch is not required. However, I will let Valentin abandon it as" [puppet] - https://gerrit.wikimedia.org/r/816795 (https://phabricator.wikimedia.org/T309651) (owner: Ssingh)
[16:29:01] (Abandoned) Vgutierrez: trafficserver: 9.x upgrade: do not enable ATS 9.x by default [puppet] - https://gerrit.wikimedia.org/r/816795 (https://phabricator.wikimedia.org/T309651) (owner: Ssingh)
[16:29:22] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31893 and previous config saved to /var/cache/conftool/dbconfig/20220725-162921-ladsgroup.json
[16:33:45] PROBLEM - Widespread puppet agent failures- no resources reported on alert1001 is CRITICAL: 0.01112 ge 0.01 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[16:34:07] ^ ok this was already asked, just saw
[16:35:14] looking
[16:35:31] there are ~400 hosts with puppet disabled, looking
[16:35:48] jbond: Could not evaluate: Could not retrieve file metadata for puppet:///modules/profile/puppet/ca.production.pem: Net::OpenTimeout
[16:36:01] anything changed for the CA?
[16:36:24] I had puppet disabled on cp* (~95 hosts) a while ago, but re-enabled like 20 minutes ago
[16:36:33] volans: lookes like that Net:;OpenTimeout uissues happened with lots of randome resources
[16:36:45] as such i would say some net issue on the puppet masters
[16:37:02] earlier icinga failed for the http check on puppetmaster1001
[16:37:20] volans: how long ago?
[16:37:34] ~20m
[16:37:47] that aligns
[16:38:21] but it seem still happening right?
[16:38:26] yes
[16:38:29] SRE, ops-codfw: (Need By:TBD) rack/setup/install row A new PDUs - https://phabricator.wikimedia.org/T309957 (Papaul)
[16:39:34] jbond: there are tons of 404 in the backend logs
[16:39:54] yeah
[16:40:28] I have a batch-limited cumin agent run going on cp*, and it's failing agent runs ~1/3rd of the time so far randomly it seems
[16:40:40] maybe closer to half, but not quite
[16:40:53] (PS2) Dduvall: jwt_authorizer: Provide microservice for JSON Web Token authorization [puppet] - https://gerrit.wikimedia.org/r/816018 (https://phabricator.wikimedia.org/T308501)
[16:40:53] jbond: can we depool 1001 while debugging?
[16:40:55] (PS12) Dduvall: docker_registry_ha: Authorize GitLab trusted runners using JWT [puppet] - https://gerrit.wikimedia.org/r/793875 (https://phabricator.wikimedia.org/T308501)
[16:41:02] (CR) Dduvall: jwt_authorizer: Provide microservice for JSON Web Token authorization (4 comments) [puppet] - https://gerrit.wikimedia.org/r/816018 (https://phabricator.wikimedia.org/T308501) (owner: Dduvall)
[16:41:16] volans: i think its better to just disable puppet while debugging
[16:41:20] just from puppet fronend/backend, ofc not for volatile/CA
[16:41:23] ok
[16:41:53] PROBLEM - puppetmaster backend https on puppetmaster1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Puppet%23Debugging
[16:43:34] tcp errors increased in the last hour or so:
[16:43:35] https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&orgId=1&var-server=puppetmaster1001&var-datasource=thanos&var-cluster=puppet&viewPanel=31
[16:44:23] RECOVERY - puppetmaster backend https on puppetmaster1001 is OK: HTTP OK: Status line output matched 400 - 414 bytes in 4.034 second response time https://wikitech.wikimedia.org/wiki/Puppet%23Debugging
[16:44:27] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31894 and previous config saved to /var/cache/conftool/dbconfig/20220725-164426-ladsgroup.json
[16:45:33] (CR) Mary Yang: DO-NOT-SUBMIT(Under review and discussion): (1 comment) [puppet] - https://gerrit.wikimedia.org/r/810146 (https://phabricator.wikimedia.org/T311457) (owner: Mary Yang)
[16:48:57] !loif disable puppet fleet wide
[16:49:02] !log disable puppet fleet wide
[16:49:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:50:54] (PS1) Jbond: Revert "P:ssh::client: use more modern functions for collecting sskey" [puppet] - https://gerrit.wikimedia.org/r/816772
[16:51:21] PROBLEM - Varnish traffic drop between 30min ago and now at eqsin on alert1001 is CRITICAL: 44.53 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/d/000000180/varnish-http-requests?orgId=1&viewPanel=6
[16:51:22] SRE, SRE-Access-Requests: Requesting access to private-data for Mikeraish (MRaishWMF) - https://phabricator.wikimedia.org/T313429 (GEscalante-WMF) Go for it!
[16:53:07] (CR) Bernard Wang: [C: +1] "LGMT but it seems i cant +2" [mediawiki-config] - https://gerrit.wikimedia.org/r/810405 (https://phabricator.wikimedia.org/T310527) (owner: Clare Ming)
[16:53:55] RECOVERY - Varnish traffic drop between 30min ago and now at eqsin on alert1001 is OK: (C)60 le (W)70 le 88.78 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/d/000000180/varnish-http-requests?orgId=1&viewPanel=6
[16:56:45] volans: fyi i have a working theory that the https://gerrit.wikimedia.org/r/q/5d992d026c49e8a945b3058ac665805c82e438b9 may have cuased the puppet compilation to take too many resources making it unable to answer all requests
[16:56:47] RECOVERY - puppetmaster https on puppetmaster1001 is OK: HTTP OK: Status line output matched 400 - 414 bytes in 9.277 second response time https://wikitech.wikimedia.org/wiki/Puppet%23Debugging
[16:56:57] disabling puppet its taking some time (likley waiting for things to stop)
[16:57:11] once stoped ill do a bit more testing then revert the change and see if that solves things
[16:57:14] thanks jbond
[16:57:22] ack thanks a lot jbond
[16:57:30] sorry I have to step out in 3 minutes
[16:57:35] and ok that could make sense
[16:58:09] sure no problem, who is on call as i also need to pop out (need to get a pipe to fix my hot water)
[16:58:22] jbond: I am
[16:58:29] happy to take over as soon as you are done
[16:58:38] SRE, Editing-Team-Request, Editing-team, MediaWiki-extensions-Score, and 4 others: Reduce Lilypond shellouts from VisualEditor - https://phabricator.wikimedia.org/T312319 (Ladsgroup) It seems it's deployed now. Shall we close this?
[16:58:54] jouncebot: nowandnext
[16:58:54] No deployments scheduled for the next 0 hour(s) and 1 minute(s)
[16:58:54] In 0 hour(s) and 1 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220725T1700)
[16:58:58] sukhe: ill try and get things working before i go, worse case may need to leave puppet disabled while i pop out
[16:59:09] will only take me a max 1 hour to do what i need to dop
[16:59:20] sure, I am around and happy to take over as needed. please let me know!
[16:59:30] sukhe: thanks will do :)
[16:59:32] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1178 (T312863)', diff saved to https://phabricator.wikimedia.org/P31895 and previous config saved to /var/cache/conftool/dbconfig/20220725-165931-ladsgroup.json
[16:59:39] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[17:00:05] ryankemper: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Wikidata Query Service weekly deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220725T1700).
[17:02:42] (CR) Jbond: [C: +2] Revert "P:ssh::client: use more modern functions for collecting sskey" [puppet] - https://gerrit.wikimedia.org/r/816772 (owner: Jbond)
[17:06:51] SRE, SRE-Access-Requests: Requesting access to maintenance servers for mfossati - https://phabricator.wikimedia.org/T313706 (Seddon) @thcipriani just wanting to note there is a time pressure on this ticket and need for it by Wednesday
[17:07:21] jouncebot: nowandnext
[17:07:21] For the next 0 hour(s) and 22 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220725T1700)
[17:07:21] In 2 hour(s) and 52 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220725T2000)
[17:07:24] !log enable puppet fleet wide
[17:07:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:07:41] ryankemper: are you deploying today?
[17:07:52] Amir1: nope, coast is clear
[17:08:20] awesome
[17:08:33] (CR) Ladsgroup: [C: +2] Stop writing to the old templatelinks field in testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/816811 (https://phabricator.wikimedia.org/T312865) (owner: Ladsgroup)
[17:09:21] (Merged) jenkins-bot: Stop writing to the old templatelinks field in testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/816811 (https://phabricator.wikimedia.org/T312865) (owner: Ladsgroup)
[17:10:39] jbond: "enable puppet fleet wide"? is this going to undo anyone's manual disables?
[17:11:19] SRE, SRE-Access-Requests: Requesting access to maintenance servers for mfossati - https://phabricator.wikimedia.org/T313706 (RhinosF1) @Seddon: is this not waiting on a response to the above comment from @CParle
[17:11:45] bblack: it shuoldn't, it should only re-enable things with my message "puppetmaster is strugeling - jbond"
[17:11:45] ah ok
[17:11:51] well
[17:11:55] hopefully :0
[17:12:01] if it does its a bug which i will prioritse :)
[17:12:28] I think the last time I looked at that stuff, if anyone runs "disable-puppet foo" on the fleet, it changes the message to foo on all the existing disables? maybe that's been fixed since, I donno
[17:12:51] sukhe: fyi reverted the change i think cuased the issue and i have re-enabled puppet. things seems to be looking a better but it will take 30 mins for things to ramp up fully
[17:13:14] thanks jbond! <3
[17:13:18] im going to pop out now and hopefully by the time im back we will see recovery
[17:13:19] I will keep an eye out
[17:13:22] I really wish upstream puppet had the concept of a set of disable-reasons. So persons could indepedently apply multiple reasons for disable, and re-enablement only happened when they're all explicitly removed.
[17:13:37] like i said i shouldn't be too long and will have phone and laptop
[17:13:48] also the following seems to be a good indicator of the issue
[17:13:49] https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&orgId=1&var-server=puppetmaster1002&var-datasource=thanos&var-cluster=puppet&viewPanel=29&from=now-1h&to=now
[17:14:25] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[17:15:25] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[17:15:26] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[17:16:26] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[17:17:01] (PS2) Ori: varnish: enable query-sorting in production via X-Wikimedia-Debug [puppet] - https://gerrit.wikimedia.org/r/816206 (https://phabricator.wikimedia.org/T138093)
[17:18:00] (CR) CI reject: [V: -1] PeeringDB API: initial commit [software/spicerack] - https://gerrit.wikimedia.org/r/816701 (owner: Ayounsi)
[17:18:02] (PS1) Ayounsi: Spicerack: add prettytable for peering cookbook [puppet] - https://gerrit.wikimedia.org/r/816824
[17:21:27] !log elastic: remove custom log levels from org.elasticsearch.deprecation
[17:24:17] !log ladsgroup@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:816811|Stop writing to the old templatelinks field in testwiki (T312865)]] (duration: 03m 11s)
[17:27:55] SRE, SRE-Access-Requests: Requesting access to maintenance servers for mfossati - https://phabricator.wikimedia.org/T313706 (Cparle) Actually it's likely that @mfossati might need `deployment` while myself and @matthiasmullie are on leave (from Monday August 1), so if that includes access to the maint se...
[17:42:22] (PS4) Ayounsi: PeeringDB API: initial commit [software/spicerack] - https://gerrit.wikimedia.org/r/816701
[17:42:30] SRE, Editing-Team-Request, Editing-team, MediaWiki-extensions-Score, and 4 others: Reduce Lilypond shellouts from VisualEditor - https://phabricator.wikimedia.org/T312319 (RLazarus) I'd like to redo @Legoktm's manual test first, and make sure I can't reproduce a spike -- I'll do that later today,...
[17:43:07] RECOVERY - Widespread puppet agent failures- no resources reported on alert1001 is OK: (C)0.01 ge (W)0.006 ge 0.004352 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[17:43:41] RECOVERY - Widespread puppet agent failures on alert1001 is OK: (C)0.01 ge (W)0.006 ge 0.003868 https://puppetboard.wikimedia.org/nodes?status=failed https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[17:47:01] ^ hopefully a sign of the good things to come this week :P
[17:47:33] that's the spirit! :D
[17:48:47] (CR) CI reject: [V: -1] PeeringDB API: initial commit [software/spicerack] - https://gerrit.wikimedia.org/r/816701 (owner: Ayounsi)
[17:57:01] SRE, SRE-Access-Requests: Requesting access to maintenance servers for mfossati - https://phabricator.wikimedia.org/T313706 (RhinosF1) @CParle: thanks. @thcipriani is out of office this week. You might have to escalate further up the chain.
[18:20:40] (PS4) Ayounsi: sre.network.debug: initial commit [cookbooks] - https://gerrit.wikimedia.org/r/812380
[18:24:02] (CR) CI reject: [V: -1] sre.network.debug: initial commit [cookbooks] - https://gerrit.wikimedia.org/r/812380 (owner: Ayounsi)
[18:24:27] (CR) Ayounsi: "This v1 is ready for review even though tests are missing." [software/spicerack] - https://gerrit.wikimedia.org/r/816701 (owner: Ayounsi)
[18:47:06] (PS1) Ssingh: dnsdist: Add default eBPF filter to dnsdist.conf [puppet] - https://gerrit.wikimedia.org/r/816834
[18:53:54] (PS2) Ssingh: dnsdist: Add default eBPF filter to dnsdist.conf [puppet] - https://gerrit.wikimedia.org/r/816834
[18:59:38] (PS1) Dzahn: gitlab: add reserved service IP 208.80.154.8, point to replica-new [dns] - https://gerrit.wikimedia.org/r/816835 (https://phabricator.wikimedia.org/T307142)
[19:04:16] (CR) Dzahn: "Arnold: For now this is an example where I want to show how I add the reverse records if it was "gerrit-replica-new"." [dns] - https://gerrit.wikimedia.org/r/816835 (https://phabricator.wikimedia.org/T307142) (owner: Dzahn)
[19:12:11] sukhe: im guessing there has been no other fallout since the recovery?
[19:12:29] jbond: everything seems OK so far, yep
[19:12:34] thanks for fixing it :)
[19:12:38] awesome and thanks for keeping an eye
[19:12:39] SRE, ops-eqiad, decommission-hardware: decommission frdb1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T313607 (Jgreen) >>! In T313607#8100547, @Volans wrote: > I've run the [[ https://wikitech.wikimedia.org/wiki/DNS/Netbox#Update_generated_records | sre.dns.netbox ]] as Icinga was alerti...
[19:12:41] no probs
[19:12:50] RECOVERY - SSH on wtp1040.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:23:27] !log [mwmaint1002:~] $ sudo systemctl start mediawiki_job_initsitestats.service
[19:23:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:23:44] (PS1) Jbond: P:ssh::client: use more modern functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816775
[19:24:07] (PS2) Jbond: P:ssh::client: use more modern functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816775
[19:24:33] !log after new wikis have been created apparently they need a "initSiteStats.php" run to make statistics work but this only runs in a timer on mwmaint once weekly or so
[19:24:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:27:24] initSiteStats.php runs for "arcwiki" then for "arwiki" and then it just ... stops?
[19:28:01] (PS3) Jbond: P:ssh::client: use more modern functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816775
[19:29:01] mutante: exits or quiet while ebbing slow query on big wiki?
[19:29:42] Krinkle: probably the latter. it went quiet after arwiki: Counting total edits...58732051
[19:29:52] mutante: btw is this step listed in the add wiki checklist?
[19:30:02] (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36373/console" [puppet] - https://gerrit.wikimedia.org/r/816775 (owner: Jbond)
[19:30:25] jem pings me about it
[19:30:42] if the checklist is the automatically created subtasks, I don't think it is
[19:31:17] yea, the script switched to "azwiki" now.. all is normal
[19:31:25] this is why it just runs weekly or so
[19:32:01] (PS4) Dduvall: gitlab_runner: Handle changes to runner config [puppet] - https://gerrit.wikimedia.org/r/815769 (https://phabricator.wikimedia.org/T311746)
[19:32:06] No mention at https://wikitech.wikimedia.org/wiki/Add_a_wiki
[19:32:09] could have started it manually for just one wiki instead of all but it's safer to only use the existing service
[19:32:23] jem: ^
[19:33:03] (PS5) Dduvall: gitlab_runner: Handle changes to runner config [puppet] - https://gerrit.wikimedia.org/r/815769 (https://phabricator.wikimedia.org/T311746)
[19:33:52] Afaik core does this on its own. Not sure. Maybe retro counting big imports won't happen until the weekly cron indeed, but there shouldn't be any defect without it and it'll still increment for every edit afaik. If not, please file a bug and CC me :-) jem
[19:35:10] Krinkle: a quote was " it's still all zeros except for 16 active users and 1 bot"
[19:35:21] did it once before
[19:35:40] Maybe we can fix the code :)
[19:35:50] (CR) Dduvall: "This attempt uses a python script to properly parse the TOML config files and merge them in much the same way that gitlab-runner does with" [puppet] - https://gerrit.wikimedia.org/r/815769 (https://phabricator.wikimedia.org/T311746) (owner: Dduvall)
[19:36:00] Please file a bug at least, we can figure out the details later
[19:37:08] (PS1) FNegri: Add fnegri to contactgroups.cfg [puppet] - https://gerrit.wikimedia.org/r/816837 (https://phabricator.wikimedia.org/T312597)
[19:39:52] SRE, SRE-Access-Requests: Requesting access to maintenance servers for mfossati - https://phabricator.wikimedia.org/T313706 (Seddon) @kchapman would you be able to sign off on this?
[19:40:32] (Abandoned) Dzahn: deployment_server: add gerrit host key for mwpresync pushing to gerrit [puppet] - https://gerrit.wikimedia.org/r/816221 (https://phabricator.wikimedia.org/T303857) (owner: Dzahn)
[19:42:15] (PS5) Ayounsi: sre.network.debug: initial commit [cookbooks] - https://gerrit.wikimedia.org/r/812380
[19:46:16] Hello and thanks, mutante
[19:46:58] It's the second new Wikipedia where I have noted this, but I'm not checking new sister projects
[19:49:15] I can open a bug in a few days, probably without technical details, but I hope you can fix that later
[19:49:56] sounds good, thanks jem
[19:50:10] No problem
[19:53:00] (PS1) Jbond: compiler_debug: fix output formating [software/puppet-compiler] - https://gerrit.wikimedia.org/r/816839
[19:57:23] (PS1) Andrew Bogott: wmcs-cinder-backup: fix Retrying() call [puppet] - https://gerrit.wikimedia.org/r/816841
[19:57:32] (PS4) Clare Ming: Remove Table of Contents config [mediawiki-config] - https://gerrit.wikimedia.org/r/810405 (https://phabricator.wikimedia.org/T310527)
[20:00:04] RoanKattouw, Urbanecm, and cjming: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC late backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220725T2000).
[20:00:05] cjming and ebernhardson: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:11] \o
[20:00:22] o/
[20:00:28] happy to deploy since i'm on the list
[20:00:46] (CR) Clare Ming: [C: +2] Remove Table of Contents config [mediawiki-config] - https://gerrit.wikimedia.org/r/810405 (https://phabricator.wikimedia.org/T310527) (owner: Clare Ming)
[20:01:45] (Merged) jenkins-bot: Remove Table of Contents config [mediawiki-config] - https://gerrit.wikimedia.org/r/810405 (https://phabricator.wikimedia.org/T310527) (owner: Clare Ming)
[20:02:26] cjming: nothing to test in mine, it's only invoked from a maintenance script (that will run for hours once invoked)
[20:03:27] ebernhardson: sounds good - i'll go ahead and sync yours then here as soon as my change finishes syncing
[20:04:02] (PS2) Clare Ming: [cirrus] Increase shard count for ruwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/816706 (owner: DCausse)
[20:05:58] !log cjming@deploy1002 Synchronized wmf-config: Config: [[gerrit:810405|Remove Table of Contents config (T310527)]] (duration: 03m 13s)
[20:06:03] T310527: Remove Table of Contents feature flag - https://phabricator.wikimedia.org/T310527
[20:06:07] (CR) Clare Ming: [C: +2] [cirrus] Increase shard count for ruwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/816706 (owner: DCausse)
[20:06:53] (Merged) jenkins-bot: [cirrus] Increase shard count for ruwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/816706 (owner: DCausse)
[20:07:43] (CR) Dzahn: "Thank you for this! I tried to compile it and got on deploy1002" [puppet] - https://gerrit.wikimedia.org/r/816715 (https://phabricator.wikimedia.org/T303857) (owner: Jbond)
[20:08:10] (CR) Dzahn: "https://puppet-compiler.wmflabs.org/pcc-worker1002/36374/deploy1002.eqiad.wmnet/change.deploy1002.eqiad.wmnet.err" [puppet] - https://gerrit.wikimedia.org/r/816715 (https://phabricator.wikimedia.org/T303857) (owner: Jbond)
[20:08:24] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:09:18] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:09:20] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:10:18] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:10:47] !log cjming@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:816706|[cirrus] Increase shard count for ruwikinews]] (duration: 03m 15s)
[20:11:40] (PS4) Jbond: P:ssh::client: use more modern functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816775
[20:14:28] (PS1) Ssingh: hiera: add snake oil blocklist for Wikidough [labs/private] - https://gerrit.wikimedia.org/r/816843
[20:15:29] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:16:23] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:16:24] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:17:23] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:18:08] (CR) Ssingh: [C: +1] hiera: add snake oil blocklist for Wikidough [labs/private] - https://gerrit.wikimedia.org/r/816843 (owner: Ssingh)
[20:18:13] (CR) Ssingh: [V: +1 C: +1] hiera: add snake oil blocklist for Wikidough [labs/private] - https://gerrit.wikimedia.org/r/816843 (owner: Ssingh)
[20:18:22] (CR) Ssingh: [V: +1 C: +2] hiera: add snake oil blocklist for Wikidough [labs/private] - https://gerrit.wikimedia.org/r/816843 (owner: Ssingh)
[20:18:28] (CR) Ssingh: [V: +2 C: +2] hiera: add snake oil blocklist for Wikidough [labs/private] - https://gerrit.wikimedia.org/r/816843 (owner: Ssingh)
[20:18:32] (CR) Andrew Bogott: [C: +1] "looks good to me! Will merge shortly" [puppet] - https://gerrit.wikimedia.org/r/816837 (https://phabricator.wikimedia.org/T312597) (owner: FNegri)
[20:18:32] I always forget how to make fake private fake happy
[20:18:36] (PS5) Jbond: P:ssh::client: use more modern functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816775
[20:18:40] (CR) FNegri: "Minor comment, will review the Python code tomorrow!" [puppet] - https://gerrit.wikimedia.org/r/816841 (owner: Andrew Bogott)
[20:19:08] PROBLEM - SSH on wtp1044.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:22:17] (CR) Andrew Bogott: wmcs-cinder-backup: fix Retrying() call (1 comment) [puppet] - https://gerrit.wikimedia.org/r/816841 (owner: Andrew Bogott)
[20:22:38] (PS2) Andrew Bogott: wmcs-cinder-backup: fix Retrying() call [puppet] - https://gerrit.wikimedia.org/r/816841
[20:28:20] !log end of UTC late backport window
[20:28:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:40:15] (PS1) Jbond: sshkey: move the sort to puppetdb [puppet] - https://gerrit.wikimedia.org/r/816847
[20:43:37] (PS5) Dzahn: prometheus::blackbox::http: add/edit parameter comments [puppet] - https://gerrit.wikimedia.org/r/807176
[20:54:26] PROBLEM - MegaRAID on an-worker1082 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[20:57:09] (PS1) Jbond: C:ssh::client: Handle case where aliases not set [puppet] - https://gerrit.wikimedia.org/r/816850
[20:58:46] (CR) Dzahn: prometheus::blackbox::http: add/edit parameter comments (2 comments) [puppet] - https://gerrit.wikimedia.org/r/807176 (owner: Dzahn)
[20:59:31] (PS6) Jbond: P:ssh::client: use more modern functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816775
[21:00:05] Reedy, sbassett, Maryum, and manfredi: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220725T2100).
[21:02:21] (PS1) Jbond: never merge, test if the time is puppetdb or reduce [puppet] - https://gerrit.wikimedia.org/r/816852
[21:03:37] (PS2) Jbond: never merge, test if the time is puppetdb or reduce [puppet] - https://gerrit.wikimedia.org/r/816852
[21:05:16] (PS3) Jbond: never merge, test if the time is reduce or erb [puppet] - https://gerrit.wikimedia.org/r/816852
[21:06:42] (PS2) Dzahn: gerrit: add hiera settings for replica to gerrit2002 [puppet] - https://gerrit.wikimedia.org/r/815396 (https://phabricator.wikimedia.org/T313250)
[21:07:43] jouncebot nowandnext
[21:07:43] For the next 1 hour(s) and 52 minute(s): Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220725T2100)
[21:07:43] In 3 hour(s) and 52 minute(s): Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220726T0100)
[21:08:27] (PS4) Jbond: never merge, test the most basic of reduces [puppet] - https://gerrit.wikimedia.org/r/816852
[21:09:35] (CR) CI reject: [V: -1] never merge, test the most basic of reduces [puppet] - https://gerrit.wikimedia.org/r/816852 (owner: Jbond)
[21:11:04] (PS5) Jbond: never merge, test the most basic of reduces [puppet] - https://gerrit.wikimedia.org/r/816852
[21:14:30] (PS6) Jbond: never merge, test the most basic of reduces [puppet] - https://gerrit.wikimedia.org/r/816852
[21:20:50] !log running a no-op sync-world for T313770 to hopefully get 1.39.0-wmf.21 (T308074) to all servers.
[21:20:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:20:58] T313770: Some traffic seems to be reaching 1.39.0-wmf.19 code - https://phabricator.wikimedia.org/T313770
[21:20:58] T308074: 1.39.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T308074
[21:24:21] !log brennen@deploy1002 Started scap: no-op deploy to get wmf.21 on all boxen (T313770)
[21:26:27] (PS3) Cwhite: beta-logs: change opensearch version to 2.0 [puppet] - https://gerrit.wikimedia.org/r/802863 (https://phabricator.wikimedia.org/T304440)
[21:27:13] SRE, SRE-Access-Requests: Requesting access to maintenance servers for mfossati - https://phabricator.wikimedia.org/T313706 (kchapman) Hi all, after reviewing I'm going to approve this for restricted. @thcipriani is back tomorrow and can approve for deployment is that is what is needed.
[21:27:17] (PS7) Jbond: never merge, test doing the reduce in ruby [puppet] - https://gerrit.wikimedia.org/r/816852
[21:27:55] !log brennen@deploy1002 Finished scap: no-op deploy to get wmf.21 on all boxen (T313770) (duration: 03m 33s)
[21:27:59] T313770: Some traffic seems to be reaching 1.39.0-wmf.19 code - https://phabricator.wikimedia.org/T313770
[21:29:01] (CR) Brennen Bearnes: [C: +1] "+1 for the approach." [puppet] - https://gerrit.wikimedia.org/r/815769 (https://phabricator.wikimedia.org/T311746) (owner: Dduvall)
[21:32:57] (CR) Cwhite: [C: +2] beta-logs: change opensearch version to 2.0 [puppet] - https://gerrit.wikimedia.org/r/802863 (https://phabricator.wikimedia.org/T304440) (owner: Cwhite)
[21:34:53] (PS6) Dduvall: gitlab_runner: Handle changes to runner config [puppet] - https://gerrit.wikimedia.org/r/815769 (https://phabricator.wikimedia.org/T311746)
[21:35:13] (PS8) Jbond: never merge, test doing the reduce in ruby [puppet] - https://gerrit.wikimedia.org/r/816852
[21:38:20] (PS1) Cwhite: opensearch: add opensearch_2 systemd unit template [puppet] - https://gerrit.wikimedia.org/r/816856 (https://phabricator.wikimedia.org/T304440)
[21:38:59] (CR) Cwhite: [C: +2] opensearch: add opensearch_2 systemd unit template [puppet] - https://gerrit.wikimedia.org/r/816856 (https://phabricator.wikimedia.org/T304440) (owner: Cwhite)
[21:43:22] (PS1) Cwhite: opensearch: add opensearch_2.yml and log4j2_2.properties templates [puppet] - https://gerrit.wikimedia.org/r/816857 (https://phabricator.wikimedia.org/T304440)
[21:43:45] (CR) Cwhite: [C: +2] opensearch: add opensearch_2.yml and log4j2_2.properties templates [puppet] - https://gerrit.wikimedia.org/r/816857 (https://phabricator.wikimedia.org/T304440) (owner: Cwhite)
[21:46:10] (PS5) Andrea Denisse: netmon: Add suppport for multiple backup/passive nodes in Puppet [puppet] - https://gerrit.wikimedia.org/r/814848 (https://phabricator.wikimedia.org/T309074)
[21:48:27] (PS7) Dduvall: gitlab_runner: Handle changes to runner config [puppet] - https://gerrit.wikimedia.org/r/815769 (https://phabricator.wikimedia.org/T311746)
[21:51:54] RECOVERY - MegaRAID on an-worker1082 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[21:55:59] (PS6) Andrea Denisse: netmon: Add suppport for multiple backup/passive nodes in Puppet [puppet] - https://gerrit.wikimedia.org/r/814848 (https://phabricator.wikimedia.org/T309074)
[21:56:38] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312863)', diff saved to https://phabricator.wikimedia.org/P31897 and previous config saved to /var/cache/conftool/dbconfig/20220725-215637-ladsgroup.json
[21:56:43] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[21:59:44] (PS8) Dduvall: gitlab_runner: Handle changes to runner config [puppet] - https://gerrit.wikimedia.org/r/815769 (https://phabricator.wikimedia.org/T311746)
[22:00:10] (PS9) Jbond: never merge, test doing the reduce in ruby [puppet] - https://gerrit.wikimedia.org/r/816852
[22:00:35] (CR) CI reject: [V: -1] gitlab_runner: Handle changes to runner config [puppet] - https://gerrit.wikimedia.org/r/815769 (https://phabricator.wikimedia.org/T311746) (owner: Dduvall)
[22:01:13] (CR) CI reject: [V: -1] never merge, test doing the reduce in ruby [puppet] - https://gerrit.wikimedia.org/r/816852 (owner: Jbond)
[22:03:37] (PS9) Dduvall: gitlab_runner: Handle changes to runner config [puppet] - https://gerrit.wikimedia.org/r/815769 (https://phabricator.wikimedia.org/T311746)
[22:06:20] (PS10) Dduvall: gitlab_runner: Handle changes to runner config [puppet] - https://gerrit.wikimedia.org/r/815769 (https://phabricator.wikimedia.org/T311746)
[22:11:43] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31898 and previous config saved to /var/cache/conftool/dbconfig/20220725-221143-ladsgroup.json
[22:12:14] (CR) Dduvall: [C: +1] "Sorry for the noise. This has been tested on a couple of untrusted runners and the config merging works correctly. Note that deploying thi" [puppet] - https://gerrit.wikimedia.org/r/815769 (https://phabricator.wikimedia.org/T311746) (owner: Dduvall)
[22:16:53] (CR) Jbond: [V: +1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36375/console" [puppet] - https://gerrit.wikimedia.org/r/816852 (owner: Jbond)
[22:18:20] (PS1) Cwhite: opensearch: remove discovery.zen.minimum_master_nodes [puppet] - https://gerrit.wikimedia.org/r/816861 (https://phabricator.wikimedia.org/T304440)
[22:19:43] (CR) Cwhite: [C: +2] opensearch: remove discovery.zen.minimum_master_nodes [puppet] - https://gerrit.wikimedia.org/r/816861 (https://phabricator.wikimedia.org/T304440) (owner: Cwhite)
[22:26:48] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31899 and previous config saved to /var/cache/conftool/dbconfig/20220725-222648-ladsgroup.json
[22:38:20] SRE, ops-codfw, decommission-hardware: decommission db2078 - https://phabricator.wikimedia.org/T312754 (Papaul)
[22:38:55] (PS7) Jbond: P:ssh::client: use more modern functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816775
[22:41:03] SRE, ops-codfw, decommission-hardware: decommission db2082 - https://phabricator.wikimedia.org/T313003 (Papaul)
[22:41:53] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312863)', diff saved to https://phabricator.wikimedia.org/P31900 and previous config saved to /var/cache/conftool/dbconfig/20220725-224153-ladsgroup.json
[22:41:59] T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[22:42:23] SRE, ops-codfw, decommission-hardware: decommission db2084 - https://phabricator.wikimedia.org/T313121 (Papaul)
[22:42:39] (PS8) Jbond: P:ssh::client: use more modern functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816775
[22:45:04] (PS9) Jbond: P:ssh::client: use more modern functions for collecting sskey [puppet] - https://gerrit.wikimedia.org/r/816775
[22:50:06] !log pt1979@cumin2002 START - Cookbook sre.dns.netbox
[22:54:32] !log pt1979@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[22:54:47] SRE, ops-codfw, decommission-hardware: decommission db2084 - https://phabricator.wikimedia.org/T313121 (Papaul)
[22:55:11] SRE, ops-codfw, decommission-hardware: decommission db2082 - https://phabricator.wikimedia.org/T313003 (Papaul)
[22:56:09] SRE, ops-codfw, decommission-hardware: decommission db2078 - https://phabricator.wikimedia.org/T312754 (Papaul)
[22:59:16] PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: An error occurred checking if Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[23:04:03] (CR) Cwhite: [C: +1] "Looks good, thanks!" [puppet] - https://gerrit.wikimedia.org/r/814848 (https://phabricator.wikimedia.org/T309074) (owner: Andrea Denisse)
[23:05:48] RECOVERY - Uncommitted DNS changes in Netbox on netbox1002 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[23:18:52] PROBLEM - SSH on wtp1040.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[23:22:38] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[23:25:00] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 48391 bytes in 0.120 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[23:57:01] SRE, SRE-OnFire, Shellbox, serviceops, Sustainability (Incident Followup): Shellbox resource management - https://phabricator.wikimedia.org/T310557 (RLazarus)
[23:57:23] SRE, Editing-Team-Request, Editing-team, MediaWiki-extensions-Score, and 4 others: Reduce Lilypond shellouts from VisualEditor - https://phabricator.wikimedia.org/T312319 (RLazarus) Open→Resolved a:RLazarus Yep, looks much better! Closing.