[00:23:40] 10SRE, 10Wikimedia-Mailing-lists: Make auditing members of mailing lists bound to a user right easier - https://phabricator.wikimedia.org/T286122 (10Platonides) mailman3 supports having an account with multiple emails. Requiring one of them (not necessarily the mail used in the mailing list) to match the wiki...
[00:25:23] 10SRE, 10Wikimedia-Mailing-lists, 10Znuny, 10Chinese-Sites: Mailman cannot correctly decode GB2312-superset mails labelled as GB2312 (non-standard behavior) - https://phabricator.wikimedia.org/T173894 (10Platonides)
[00:35:27] (03CR) 10Platonides: "The proper error code would be 551 («please use this other email instead»). Not that I am aware of any implementation using it, but it wou" [puppet] - 10https://gerrit.wikimedia.org/r/681242 (https://phabricator.wikimedia.org/T280472) (owner: 10Legoktm)
[00:38:32] 10SRE, 10Commons, 10Tools, 10Wikimedia-Mailing-lists: daily-image-l stopped sending on 2020-10-11 - https://phabricator.wikimedia.org/T265568 (10Platonides) This can't be //that// hard. @Legoktm do you want me to have a look at this? Doesn't seem to require any advenced permission, only on potd and ml, so...
[02:04:04] PROBLEM - MariaDB memory on clouddb1019 is CRITICAL: CRIT Memory 98% used. Largest process: mysqld (9461) = 76.0% https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[03:14:48] PROBLEM - MariaDB memory on clouddb1019 is CRITICAL: CRIT Memory 98% used. Largest process: mysqld (9461) = 76.0% https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[03:36:54] PROBLEM - SSH on cp5011.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:20:40] PROBLEM - MariaDB memory on clouddb1019 is CRITICAL: CRIT Memory 98% used. Largest process: mysqld (9461) = 76.0% https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[04:37:42] RECOVERY - SSH on cp5011.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:06:52] metawiki stopped recording successful edits for abusefilters at 2021-07-01 19:35 per https://meta.wikimedia.org/wiki/Special:AbuseLog?wpSearchUser=&wpSearchPeriodStart=&wpSearchPeriodEnd=&wpSearchTitle=&wpSearchImpact=1&wpSearchAction=any&wpSearchActionTaken=&wpSearchFilter=&wpSearchWiki=
[05:07:35] that seems more aligned with failed logging or recording; how can I check what was rolled out to metawiki at that time?
[05:10:04] https://phabricator.wikimedia.org/T286140
[05:10:11] 19:35 Synchronized php-1.37.0-wmf.12/tests/phpunit/includes/TitleMethodsTest.php: Backport: [[gerrit:702711|Consistently normalize Title::mFragment before setting (T285951)]] (duration: 01m 10s)
[05:10:12] T285951: Some section links in search results are redlinks - https://phabricator.wikimedia.org/T285951
[05:10:56] Synchronized php-1.37.0-wmf.12/includes/Title.php: Backport: [[gerrit:702711|Consistently normalize Title::mFragment before setting (T285951)]] (duration: 01m 10s)
[05:11:53] everything is wmf.12 atm https://versions.toolforge.org/
[05:36:52] PROBLEM - Backup freshness on backup1001 is CRITICAL: Stale: 1 (gerrit1001), Fresh: 103 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[05:58:53] The fix is to revert a recent patch, so it can be safely self +2'ed - I've +2'ed the revert for master, once it merges will create a cherry pick for wmf.12. Is anyone available for an emergency deployment?
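The revert-and-cherry-pick flow described above (self-+2 a revert on master, then cherry-pick it onto the deployed wmf branch) can also be driven through Gerrit's REST API. A minimal sketch, assuming the standard `/revert` and `/cherrypick` endpoints; the change number, credentials and commit message below are placeholders, not the actual change involved here:

```python
import json
import requests
from requests.auth import HTTPBasicAuth

# Placeholders: substitute your own Gerrit HTTP credentials.
GERRIT = "https://gerrit.wikimedia.org/r/a"  # authenticated REST prefix
AUTH = HTTPBasicAuth("example-user", "example-http-password")


def gerrit(method, path, body=None):
    """Call a Gerrit REST endpoint and strip the )]}' XSSI prefix."""
    resp = requests.request(method, GERRIT + path, json=body, auth=AUTH)
    resp.raise_for_status()
    return json.loads(resp.text.split("\n", 1)[1])


# 1. Create a revert of the already-merged master change
#    (123456 is a placeholder change number, not the real one).
revert = gerrit("POST", "/changes/123456/revert",
                {"message": 'Revert "<offending change subject>"'})

# 2. Once the revert is merged, cherry-pick it onto the deployed branch.
backport = gerrit("POST",
                  f"/changes/{revert['_number']}/revisions/current/cherrypick",
                  {"destination": "wmf/1.37.0-wmf.12",
                   "message": revert["subject"]})

print("Backport change:", backport["_number"])
```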
[06:17:38] (03PS1) 10DannyS712: Revert "Replace depricating method IContextSource::getWikiPage to WikiPageFactory usage" [extensions/AbuseFilter] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702957 (https://phabricator.wikimedia.org/T286140)
[06:17:49] ^ thats the emergency deployment needed
[06:49:14] PROBLEM - SSH on mw1284.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:00:04] Deploy window No deploys all week! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210704T0700)
[07:00:18] PROBLEM - SSH on cp5006.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:49:04] 10SRE, 10ops-eqsin, 10Traffic, 10User-MediaJS: IPMI Sensor Status Power_Supply Status: Critical on various eqsin servers - https://phabricator.wikimedia.org/T286113 (10elukey) The problem seems to be fixed, and just to be sure: ` elukey@cr3-eqsin> show chassis environment pem PEM 0 status: State...
[07:49:53] (03PS1) 10Elukey: Revert "Depool eqsin" [dns] - 10https://gerrit.wikimedia.org/r/702959
[07:54:58] 10SRE, 10ops-eqsin, 10Traffic, 10User-MediaJS: IPMI Sensor Status Power_Supply Status: Critical on various eqsin servers - https://phabricator.wikimedia.org/T286113 (10elukey) Only nit - cp5006's mgmt is still not reachable, we should follow up.
[07:58:05] (03CR) 10Vgutierrez: [C: 03+1] Revert "Depool eqsin" [dns] - 10https://gerrit.wikimedia.org/r/702959 (owner: 10Elukey)
[08:02:17] (03CR) 10Elukey: [C: 03+2] Revert "Depool eqsin" [dns] - 10https://gerrit.wikimedia.org/r/702959 (owner: 10Elukey)
[08:02:40] !log repool eqsin after equinix maintenance - T286113
[08:02:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:02:50] T286113: IPMI Sensor Status Power_Supply Status: Critical on various eqsin servers - https://phabricator.wikimedia.org/T286113
[08:16:06] PROBLEM - Varnish traffic drop between 30min ago and now at ulsfo on alert1001 is CRITICAL: 36.94 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[08:37:39] expected, eqsin repooled --^
[08:41:42] RECOVERY - Varnish traffic drop between 30min ago and now at ulsfo on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[09:07:53] (03PS1) 10Majavah: kubeadm: Upgrade Calico to v3.18.4 [puppet] - 10https://gerrit.wikimedia.org/r/703061 (https://phabricator.wikimedia.org/T280342)
[09:11:34] (03PS2) 10Majavah: kubeadm: Upgrade Calico to v3.18.4 [puppet] - 10https://gerrit.wikimedia.org/r/703061 (https://phabricator.wikimedia.org/T280342)
[09:51:50] RECOVERY - SSH on mw1284.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:41:59] RECOVERY - Backup freshness on backup1001 is OK: Fresh: 104 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[12:53:53] 10SRE, 10ops-eqsin, 10Traffic, 10User-MediaJS: IPMI Sensor Status Power_Supply Status: Critical on various eqsin servers - https://phabricator.wikimedia.org/T286113 (10elukey) 05Open→03Resolved a:03elukey
[14:06:56] PROBLEM - SSH on logstash2021.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[14:16:58] PROBLEM - SSH on mw1279.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[14:49:14] PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The following units failed: netbox_report_accounting_run.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:07:06] RECOVERY - SSH on cp5006.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:14:12] RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:17:48] RECOVERY - SSH on mw1279.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:57:14] PROBLEM - SSH on mw1284.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:08:34] RECOVERY - SSH on logstash2021.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:58:02] RECOVERY - SSH on mw1284.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:04:17] DannyS712: just seeing above. i can deploy that revert.
[17:05:45] (assuming someone is around to test.)
[17:16:44] ^ going ahead with above. i think from T286140 it should be clear if fix worked.
[17:16:45] T286140: AbuseLog no longer recording revids of saved edits - https://phabricator.wikimedia.org/T286140
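One way to see whether the fix worked for the symptom in T286140 is to pull recent AbuseLog entries from the Action API and check whether they carry revision ids again. A rough sketch; the `aflprop` property names are an assumption based on the AbuseFilter API documentation, not taken from this log:

```python
import requests

API = "https://meta.wikimedia.org/w/api.php"

params = {
    "action": "query",
    "list": "abuselog",
    # Property names assumed from the AbuseFilter API docs; "revid" is the
    # field that stopped being populated per T286140.
    "aflprop": "ids|timestamp|result|revid",
    "afllimit": 25,
    "format": "json",
    "formatversion": 2,
}

entries = requests.get(API, params=params).json()["query"]["abuselog"]

for e in entries:
    # Entries for edits that were actually saved should carry a non-zero revid;
    # actions that were disallowed legitimately have none.
    print(e["timestamp"], e.get("result"), "revid =", e.get("revid"))
```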
[17:20:10] (03CR) 10Brennen Bearnes: [C: 03+2] Revert "Replace depricating method IContextSource::getWikiPage to WikiPageFactory usage" [extensions/AbuseFilter] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702957 (https://phabricator.wikimedia.org/T286140) (owner: 10DannyS712)
[17:38:33] (03Merged) 10jenkins-bot: Revert "Replace depricating method IContextSource::getWikiPage to WikiPageFactory usage" [extensions/AbuseFilter] (wmf/1.37.0-wmf.12) - 10https://gerrit.wikimedia.org/r/702957 (https://phabricator.wikimedia.org/T286140) (owner: 10DannyS712)
[17:43:53] !log brennen@deploy1002 Synchronized php-1.37.0-wmf.12/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:702957|Revert "Replace depricating method IContextSource::getWikiPage to WikiPageFactory usage" (T286140)]] (duration: 01m 06s)
[17:44:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:44:03] T286140: AbuseLog no longer recording revids of saved edits - https://phabricator.wikimedia.org/T286140
[17:45:45] sDrewth, DannyS712: above deployed, confirmed that diff links appear on new entries on https://en.wikipedia.org/wiki/Special:AbuseLog
[17:46:41] woop woop, thx brennen
[17:55:39] thanks brennen :)
[18:01:16] you bet.
[18:03:54] That's the second regression caused by that line of work
[21:17:05] 10SRE, 10Wikimedia-Mailing-lists, 10Upstream: Mailman doesn't replace email in notice when changing subscription email - https://phabricator.wikimedia.org/T286149 (10Legoktm) This was fixed upstream in https://gitlab.com/mailman/postorius/-/commit/b7fcca522ac0dd86831eb9788a8ec13abcdd2dd4
[21:21:06] 10SRE, 10Wikimedia-Mailing-lists: Mailman 3: Changing email address seems to break subscription for listadmins list - https://phabricator.wikimedia.org/T282328 (10Legoktm) Some related looking upstream issues are: https://gitlab.com/mailman/postorius/-/issues/472 and https://gitlab.com/mailman/postorius/-/issu...
[21:26:40] 10SRE, 10Commons, 10Tools, 10Wikimedia-Mailing-lists: daily-image-l stopped sending on 2020-10-11 - https://phabricator.wikimedia.org/T265568 (10Legoktm) a:05Legoktm→03Platonides Sorry, not sure why I dropped the ball on this. >>! In T265568#7196242, @Platonides wrote: > This can't be //that// hard....
[22:00:15] 10SRE, 10Commons, 10Tools, 10Wikimedia-Mailing-lists: daily-image-l stopped sending on 2020-10-11 - https://phabricator.wikimedia.org/T265568 (10Platonides) Well, having too many things is probably part of the reason ;-) I'll have a look. We will see if I end up regretting being so optimistic :P
[22:16:08] 10SRE, 10Commons, 10Tools, 10Wikimedia-Mailing-lists: daily-image-l stopped sending on 2020-10-11 - https://phabricator.wikimedia.org/T265568 (10Platonides) And, weird enough, it both [[ https://lists.wikimedia.org/hyperkitty/list/daily-image-l@lists.wikimedia.org/thread/5TOP2JJ5WZJ2PC6PKFZTITF7BFZ2H62A/ |...
[23:53:35] 10SRE, 10Wikimedia-Mailing-lists: Mailman 3: Changing email address seems to break subscription for listadmins list - https://phabricator.wikimedia.org/T282328 (10TerraCodes) >>! In T282328#7196785, @Legoktm wrote: > Some related looking upstream issues are: https://gitlab.com/mailman/postorius/-/issues/472 an...
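Both the auditing idea in T286122 and the address-change breakage in T282328 revolve around Mailman 3's model of one account holding several email addresses. A small sketch using the mailmanclient library against Mailman core's REST API, which lists every address linked to each subscriber's account; the REST URL and credentials are placeholders, and it would need to run where the core API is reachable (normally the list server itself):

```python
from mailmanclient import Client

# Placeholder credentials; Mailman core's REST API is normally local-only.
client = Client("http://localhost:8001/3.1", "restadmin", "restpass")
mlist = client.get_list("listadmins@lists.wikimedia.org")

for member in mlist.members:
    try:
        user = client.get_user(member.email)
    except Exception:
        # Address-only subscription with no linked Mailman account.
        print(f"{member.email}: no linked account")
        continue
    # All addresses attached to the same account, any one of which could be
    # the one that matches a wiki account when auditing.
    linked = sorted(address.email for address in user.addresses)
    print(f"{member.email}: account addresses = {linked}")
```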