[02:33:26] 10DBA, 10DiscussionTools, 10Editing-team, 10Performance-Team, and 2 others: Reduce parser cache retention temporarily for DiscussionTools - https://phabricator.wikimedia.org/T280605 (10Krinkle) >>! In T280605#7060441, @gerritbot wrote: > %%%[mediawiki/extensions/DiscussionTools@master] Allow talk pages to...
[02:42:24] 10DBA, 10MediaWiki-Parser, 10Parsoid, 10Performance-Team: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (10Krinkle)
[03:18:17] 10DBA, 10MediaWiki-Parser, 10Parsoid, 10Performance-Team, 10Patch-For-Review: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (10Krinkle)
[03:38:33] 10DBA, 10SRE, 10Wikimedia-Mailing-lists, 10Schema-change, 10User-notice: Mailman3 schema change: change utf8 columns to utf8mb4 - https://phabricator.wikimedia.org/T282621 (10Ladsgroup) I agree, it's just better safe than sorry. Maybe it errors out and DBAs need some time (=complications). Definitely not...
[03:40:27] 10DBA, 10MediaWiki-Parser, 10Parsoid, 10Performance-Team, 10Patch-For-Review: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (10Krinkle) @Marostegui @Kormat @jcrespo I need from you a simple explanation or best-guess about what the story...
[06:37:06] 10DBA, 10MediaWiki-Parser, 10Parsoid, 10Performance-Team, 10Patch-For-Review: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (10jcrespo) I heard you saying for the second time: > it does not use transactions But **every query is a trans...
[07:20:05] 10DBA, 10Analytics, 10Event-Platform, 10WMF-Architecture-Team, 10Services (later): Consistent MediaWiki state change events | MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (10Joe) >>! In T120242#6945584, @Ottomata wrote: > Another idea that may not be feasible: Would it...
[08:23:52] kormat, jynus: FYI I've updated cumin on cumin2002 (there wasn't anything running there). To be 100% sure it's all good before upgrading cumin1001 too, is there a quick way to check that your side of the automation and backups works fine? The library API has no backward incompatibilities; if there is any, it's a bug ;)
[08:24:51] just run transfer.py
[08:27:39] ack, any suggested params to use for a safe test run? I've never run it :)
[08:28:10] and looking at wikitech and doc.w.o I don't see an example of a safe/noop command
[08:29:16] there is really no noop command, other than --help; just transfer some random file from your home somewhere
[08:29:41] that should use the remoteExecution api that uses cumin
[08:30:30] ack
[08:33:25] looks to work fine
[08:37:48] cool
[08:38:47] what we can do next is migrate 1 backup job to the new host, so we can check the full process
[08:39:29] I tested on cumin2001 (buster, new cumin), so that I can upgrade cumin on cumin1001 too
[08:40:05] the bullseye host AFAIK still has apt issues with the mariadb client
[08:40:09] (cumin2002)
[08:40:23] ah, true
[08:40:41] I guess we can still do a manual run of a backup
[08:40:57] let me do that in a screen on cumin2001
[08:41:28] ack, thx
[08:41:57] I think there is stuff that uses remoteexecution in wmfmariadbpy, but you will have to ask kormat
[08:42:15] I'm 99.9% sure all works fine
[08:42:50] but you know, there's always that one case...
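
A note on jcrespo's point above (06:37) that "every query is a transaction": under MySQL/MariaDB's default autocommit mode, every statement runs as its own implicit transaction, so a purge script that "does not use transactions" still pays per-statement transaction overhead. A minimal sketch of the distinction, assuming a pymysql connection and an objectcache-style table (the connection details and table name are placeholders, not the real purge script):

    # With autocommit on, each DELETE is an implicit single-statement
    # transaction, committed as soon as it finishes.
    import pymysql

    conn = pymysql.connect(host="localhost", user="test",
                           database="test", autocommit=True)
    with conn.cursor() as cur:
        cur.execute("DELETE FROM objectcache WHERE exptime < NOW() LIMIT 1000")
        # already committed here; no explicit COMMIT needed

    # With autocommit off, the same DELETE only becomes durable at commit().
    conn.autocommit(False)
    with conn.cursor() as cur:
        cur.execute("DELETE FROM objectcache WHERE exptime < NOW() LIMIT 1000")
    conn.commit()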
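For the transfer.py smoke test jynus suggests at 08:29 (transfer some random file from your home directory), something like the following would do. The host names, paths, and the host:path argument syntax here are assumptions for illustration, not taken from transferpy's documentation:

    # Hypothetical transfer.py smoke test after a cumin upgrade: copy a
    # throwaway file between two hosts, exercising the remoteExecution API
    # that wraps cumin. Hosts, paths, and CLI syntax are illustrative only.
    import subprocess

    src = "cumin2001.codfw.wmnet:/home/volans/transfer-test.txt"  # hypothetical
    dst = "dbprov2001.codfw.wmnet:/home/volans/"                  # hypothetical
    result = subprocess.run(["transfer.py", src, dst])
    print("transfer.py exited with", result.returncode)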
[08:43:23] I trust your code, I don't trust mine :-)
[08:44:34] I don't trust any code, mine in particular ;)
[08:45:08] the backup will also work to test a 0.4->0.5 upgrade of wmfbackups, which should be a noop
[08:45:21] ack
[08:46:44] it is currently running
[08:46:54] I have a question for you about cumin
[08:46:59] if you have 1 minute
[08:47:33] sure
[08:48:24] it does not accept an array as remote execution content, only a string, right?
[08:48:40] as in e.g. ['ls', '-l']
[08:49:21] no, just a string, as it will be passed to ssh by clustershell
[08:49:48] but a ' '.join(['ls', '-l']) should be enough to convert it
[08:50:00] yes, that is what I do
[08:50:49] btw, with the new cumin you should be able to get rid of the /dev/null trick
[08:51:27] it is now possible to tell cumin independently whether to print the command output and the progress bars
[08:51:38] but, and this is a bug on my side, would you have any suggestion on how to go about this: https://phabricator.wikimedia.org/T256749
[08:52:47] I should check the code, but you could quote the paths
[08:52:51] with single quotes
[08:53:16] so that the final ssh command will have '/home/jynus$(rm /home/jynus/vuln)' as the path
[08:53:30] including the quotes
[08:56:50] I will try, but I had issues with always being able to get to a shell (e.g. you can add single quotes there)
[08:58:31] I will create a ticket to track the revert of the output code
[09:00:02] sorry, meeting, bbl
[09:10:37] 10DBA, 10Data-Persistence-Backup, 10SRE-tools: Revert workaround for cumin output verbosity on RemoteExecution (CuminExecution) abstraction - https://phabricator.wikimedia.org/T282775 (10jcrespo)
[09:20:03] arturo, labsdb1009/10/11 are failing to collect metrics. Should those be removed from monitoring (tendril/zarcillo)?
[09:26:54] mmmm jynus I'm not sure. Are they wiki replicas? Better to open a phab task to help clarify the situation
[09:27:22] do you know if there is one for labsdb decommission or update?
[09:27:56] I see, there is T282522
[09:27:57] T282522: decommission labsdb1009.eqiad.wmnet - https://phabricator.wikimedia.org/T282522
[09:28:33] I will let him handle next week
[09:28:39] *it
[09:54:17] 👍
[10:10:19] 10DBA, 10Data-Persistence-Backup, 10SRE-tools: Revert workaround for cumin output verbosity on RemoteExecution (CuminExecution) abstraction - https://phabricator.wikimedia.org/T282775 (10LSobanski) p:05Triage→03Low
[10:51:17] volans: [09:43:39]: INFO - Backup finished correctly / Last snapshot for x1 at codfw (db2101.codfw.wmnet:3320) taken on 2021-05-13 09:13:05 (280 GB)
[11:00:24] great!
[11:03:00] 10DBA, 10Data-Persistence-Backup, 10SRE-tools: Revert workaround for cumin output verbosity on RemoteExecution (CuminExecution) abstraction - https://phabricator.wikimedia.org/T282775 (10Volans) Version is 4.1.0, currently latest on apt.w.o and PyPI and deployed to all cumin hosts in production. See `worker....
[11:03:06] thanks for testing!
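
On the quoting discussion above (08:48-08:56): when an argument list has to be flattened into a single string for cumin/clustershell, quoting each element before joining keeps shell metacharacters in paths from being executed on the remote side. A sketch using only the Python standard library; whether this survives every layer of the ssh pipeline is exactly the open question in T256749:

    # Join an argument list into a single, safely quoted command string.
    # shlex.quote() wraps each argument in single quotes and escapes any
    # embedded single quotes, which addresses the "you can add single
    # quotes there" trick at least at this layer.
    import shlex

    def to_remote_command(args):
        return " ".join(shlex.quote(a) for a in args)

    path = "/home/jynus$(rm /home/jynus/vuln)"
    print(" ".join(["ls", "-l", path]))           # unsafe: $(...) would run remotely
    print(to_remote_command(["ls", "-l", path]))  # ls -l '/home/jynus$(rm /home/jynus/vuln)'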
[11:40:40] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 2.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[11:47:58] RECOVERY - MariaDB sustained replica lag on pc2010 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[12:57:09] 10DBA, 10Analytics, 10Event-Platform, 10WMF-Architecture-Team, 10Services (later): Consistent MediaWiki state change events | MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (10Ottomata) > it would make the database availability depend on the availability of eventgate, and...
[13:22:24] 10DBA, 10MediaWiki-Parser, 10Parsoid, 10Performance-Team, 10Patch-For-Review: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (10LSobanski) p:05Triage→03High
[13:27:58] 10DBA, 10DiscussionTools, 10Performance-Team, 10Editing-team (FY2020-21 Kanban Board), and 2 others: Reduce parser cache retention temporarily for DiscussionTools - https://phabricator.wikimedia.org/T280605 (10Esanders)
[21:21:48] 10DBA, 10MediaWiki-Parser, 10Parsoid, 10Performance-Team, 10Patch-For-Review: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (10dpifke) Have we considered using table partitioning to make the purge less expensive? If we partitioned based...
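
dpifke's comment at 21:21 is truncated, but the technique it names is standard: if the cache table is range-partitioned on the expiry time, the daily purge can drop whole partitions (a metadata operation) instead of deleting rows in batches for 24+ hours. A sketch of the idea against an objectcache-style table; the table name, column names, and partition layout are illustrative, and note that MariaDB requires the partitioning column to be part of every unique key, so this would not be a drop-in change for the real schema:

    # Illustrative only: partition an objectcache-style table by expiry day,
    # then purge by dropping expired partitions instead of row-by-row DELETEs.
    import pymysql

    PARTITION_DDL = """
        ALTER TABLE objectcache
        PARTITION BY RANGE (TO_DAYS(exptime)) (
            PARTITION p20210513 VALUES LESS THAN (TO_DAYS('2021-05-14')),
            PARTITION p20210514 VALUES LESS THAN (TO_DAYS('2021-05-15')),
            PARTITION pmax VALUES LESS THAN MAXVALUE
        )
    """

    def purge_partition(conn, name):
        # Dropping a partition discards all of its rows at once, which is
        # why the purge would no longer need to scan and delete row by row.
        with conn.cursor() as cur:
            cur.execute("ALTER TABLE objectcache DROP PARTITION " + name)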