[00:10:15] !log restore of db06 failed yet again. trying mariabackup db06 -> db07 instead of mysqldump (after fixing docs/usage of the former) (T276968) [00:10:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [00:10:21] T276968: deployment-db05 disk issues - https://phabricator.wikimedia.org/T276968 [00:28:26] !log mariadb successfully started on db07 following transfer/extraction using mariabackup and following mysql_upgrade (T276968) [00:28:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [00:28:30] T276968: deployment-db05 disk issues - https://phabricator.wikimedia.org/T276968 [00:54:42] 10Beta-Cluster-Infrastructure, 10Patch-For-Review: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (10dduvall) [01:01:02] (03CR) 10Zppix: [C: 03+1] Remove unused CI for 'tools-zppixbot' [integration/config] - 10https://gerrit.wikimedia.org/r/670124 (owner: 10Zppix) [01:07:19] 10Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), 10Patch-For-Review, 10Release, 10Train Deployments, 10User-brennen: 1.36.0-wmf.34 deployment blockers - https://phabricator.wikimedia.org/T274938 (10brennen) End-of-day status: All quiet since deploy to group0. [01:07:24] 10Beta-Cluster-Infrastructure, 10Patch-For-Review: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (10dduvall) a:03dduvall The current status of this is: - New Buster based instances `deployment-db07` and `deployment-db08` were launched with `role::mari... [01:10:30] 10Beta-Cluster-Infrastructure, 10DBA, 10Patch-For-Review: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (10greg) Adding in the #DBA tag explicitly so it's seen... >>! In T276968#6899053, @dduvall wrote: > > I'm not totally sure how to proceed from her... 
[01:30:43] 10Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), 10SRE, 10Traffic, 10GitLab (Initialization), and 2 others: open firewall ports on gitlab1001.wikimedia.org (was: Port map of how Gitlab is accessed) - https://phabricator.wikimedia.org/T276144 (10Dzahn) Ok, thanks. I will prepare patches but... [01:40:08] 10Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), 10SRE, 10Traffic, 10GitLab (Initialization), and 2 others: open firewall ports on gitlab1001.wikimedia.org (was: Port map of how Gitlab is accessed) - https://phabricator.wikimedia.org/T276144 (10Dzahn) >>! In T276144#6890920, @Sergey.Trofimo... [01:41:05] 10Release-Engineering-Team (Logspam), 10MassMessage, 10Wikimedia-production-error: Cannot diff content types other than MassMessageListContent - https://phabricator.wikimedia.org/T265524 (10Krinkle) [01:42:42] 10Release-Engineering-Team (Logspam), 10MediaWiki-Page-diffs, 10Wikidata, 10wdwb-tech-focus, 10Wikimedia-production-error: Assert.php: Bad value for parameter $oldContent: must be a TextContent|null [Story Points 5] - https://phabricator.wikimedia.org/T231084 (10Krinkle) [01:46:36] 10Release-Engineering-Team (Logspam), 10ExtensionDistributor, 10JavaScript, 10Wikimedia-production-error: TypeError: info is undefined (on Special:ExtensionDistributor) - https://phabricator.wikimedia.org/T255619 (10Krinkle) [01:51:18] 10Release-Engineering-Team (Logspam), 10EasyTimeline, 10Wikimedia-production-error: proc line: 2959: warning: points must have either 4 or 2 values per line - https://phabricator.wikimedia.org/T138036 (10Krinkle) [05:21:03] 10Beta-Cluster-Infrastructure, 10DBA, 10Patch-For-Review: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (10Marostegui) Those errors aren't a good thing unfortunately, it looks like InnoDB is very corrupted. Do you still have a logical dump (done via mys... 
[05:23:35] 10Beta-Cluster-Infrastructure, 10DBA, 10Patch-For-Review: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (10Marostegui) If db06 slave is still up, why not taking a mysqldump from that one? [05:26:46] 10Beta-Cluster-Infrastructure, 10DBA, 10Patch-For-Review: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (10Majavah) The mysqldump result is located (at least) on deployment-db07:/srv/backup. db06 is still up, yes. [05:30:44] 10Beta-Cluster-Infrastructure, 10DBA, 10Patch-For-Review: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (10Marostegui) Another option would be to: - Assume db06 would be the new master. Switch mysql on that host off, copy its datadir to another new hos... [08:27:14] 10Gerrit: Rename Gerrit repository "LdapGroups" to "LDAPGroups" - https://phabricator.wikimedia.org/T200736 (10Aklapper) [09:13:48] 10Beta-Cluster-Infrastructure, 10DBA, 10Patch-For-Review: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (10aborrero) FYI: The root cause for the corruption could be a force-reboot force-migration that I had to perform on this host while operating the un... [10:18:12] 10Gerrit, 10Wikimedia-GitHub: Fix extensions-LDAPGroups github mirror - https://phabricator.wikimedia.org/T277010 (10hashar) Github is case insensitive thus `LdapGroups` or `LDAPGroups` are considered the same. As Gerrit replicate both, the last one that got replicated take over and it seems to be the archived... [10:22:01] 10Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), 10SRE, 10Traffic, 10GitLab (Initialization), and 2 others: open firewall ports on gitlab1001.wikimedia.org (was: Port map of how Gitlab is accessed) - https://phabricator.wikimedia.org/T276144 (10jbond) > If internal IP and behind caching lay... 
[10:33:14] 10Gerrit, 10Release-Engineering-Team (Development services), 10Wikimedia-GitHub: Fix extensions-LDAPGroups github mirror - https://phabricator.wikimedia.org/T277010 (10hashar) 05Open→03Resolved a:03hashar I have added a DENY rule to prevent replication of `LdapGroups` to Github (the rule can be seen at... [10:43:04] 10Gerrit: Rename Gerrit repository "LdapGroups" to "LDAPGroups" - https://phabricator.wikimedia.org/T200736 (10hashar) I have fixed the Github replication issue which got filed as T277010. [11:40:03] 10Beta-Cluster-Infrastructure: Beta SWIFT seems to be broken - https://phabricator.wikimedia.org/T276179 (10Tgr) [12:12:06] 10Beta-Cluster-Infrastructure, 10DBA, 10Patch-For-Review: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (10Tgr) >>! In T276968#6899299, @Marostegui wrote: > - Assume that maybe db06 might not have the exact same data as db05 if everything wasn't entirel... [12:14:49] 10Beta-Cluster-Infrastructure, 10DBA, 10Patch-For-Review: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (10Marostegui) >>! In T276968#6900193, @Tgr wrote: >>>! In T276968#6899299, @Marostegui wrote: >> - Assume that maybe db06 might not have the exact s... [12:18:49] 10Beta-Cluster-Infrastructure, 10DBA, 10Patch-For-Review: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (10Majavah) > would both hosts have the same version How important is this? The old hosts db05 (now-deleted with disk corruption) and db06 have Mari... [12:19:55] (03CR) 10Lars Wirzenius: [C: 03+2] deploy-promote: Don't use a fixed branch for train-dev [tools/release] - 10https://gerrit.wikimedia.org/r/670315 (owner: 10Ahmon Dancy) [12:19:59] 10Beta-Cluster-Infrastructure, 10DBA, 10Patch-For-Review: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (10Marostegui) >>! 
In T276968#6900223, @Majavah wrote: >> would both hosts have the same version > > How important is this? The old hosts db05 (now-... [12:21:11] (03CR) 10Lars Wirzenius: [C: 03+2] feat: Add train-dev ssh subcommand [tools/train-dev] - 10https://gerrit.wikimedia.org/r/670311 (owner: 10Ahmon Dancy) [12:21:13] (03CR) 10Lars Wirzenius: [V: 03+2 C: 03+2] feat: Add train-dev ssh subcommand [tools/train-dev] - 10https://gerrit.wikimedia.org/r/670311 (owner: 10Ahmon Dancy) [12:25:48] (03CR) 10Lars Wirzenius: "New patch set coming soon." (033 comments) [tools/scap] - 10https://gerrit.wikimedia.org/r/670172 (owner: 10Lars Wirzenius) [12:45:27] (03PS2) 10Lars Wirzenius: feat: add script to test Scap under train-dev [tools/scap] - 10https://gerrit.wikimedia.org/r/670172 [13:34:21] marxarelli: Majavah: Hi, what's the current state of beta DB? [13:34:26] can i help with something? [13:45:39] (03PS3) 10Hashar: Remove unused CI for 'tools-zppixbot' [integration/config] - 10https://gerrit.wikimedia.org/r/670124 (https://phabricator.wikimedia.org/T256768) (owner: 10Zppix) [13:46:37] (03CR) 10Hashar: [C: 03+2] "I forgot to delete the job in https://gerrit.wikimedia.org/r/c/integration/config/+/643690" [integration/config] - 10https://gerrit.wikimedia.org/r/670124 (https://phabricator.wikimedia.org/T256768) (owner: 10Zppix) [13:48:01] (03Merged) 10jenkins-bot: Remove unused CI for 'tools-zppixbot' [integration/config] - 10https://gerrit.wikimedia.org/r/670124 (https://phabricator.wikimedia.org/T256768) (owner: 10Zppix) [14:06:36] 10Beta-Cluster-Infrastructure, 10Quality-and-Test-Engineering-Team (QTE), 10User-zeljkofilipin: selenium-daily-beta(commons)-MediaWiki fails with `readonly: The wiki is currently in read-only mode.` - https://phabricator.wikimedia.org/T277044 (10zeljkofilipin) [14:06:51] 10Release-Engineering-Team (Development services), 10Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), 10Scap, 10User-brennen: Applying security patches should be robust and also 
give some useful output - https://phabricator.wikimedia.org/T269153 (10LarsWirzenius) Thank you for your feedback,... [14:06:58] 10Beta-Cluster-Infrastructure, 10Quality-and-Test-Engineering-Team (QTE), 10User-zeljkofilipin: selenium-daily-beta(commons)-MediaWiki fails with `readonly: The wiki is currently in read-only mode.` - https://phabricator.wikimedia.org/T277044 (10zeljkofilipin) p:05Triage→03High [14:08:05] 10Beta-Cluster-Infrastructure, 10Quality-and-Test-Engineering-Team (QTE), 10User-zeljkofilipin: selenium-daily-beta(commons)-MediaWiki fails with `readonly: The wiki is currently in read-only mode.` - https://phabricator.wikimedia.org/T277044 (10zeljkofilipin) [14:08:42] 10Beta-Cluster-Infrastructure, 10Quality-and-Test-Engineering-Team (QTE), 10User-zeljkofilipin: selenium-daily-beta(commons)-MediaWiki fails with `readonly: The wiki is currently in read-only mode.` - https://phabricator.wikimedia.org/T277044 (10zeljkofilipin) [14:26:39] 10Beta-Cluster-Infrastructure, 10Quality-and-Test-Engineering-Team (QTE), 10User-zeljkofilipin: selenium-daily-beta(commons)-MediaWiki fails with `readonly: The wiki is currently in read-only mode.` - https://phabricator.wikimedia.org/T277044 (10Majavah) this is due to database problems: {T276968} [14:52:07] (03CR) 10Gehel: [C: 04-1] Add sonar scanner to discolytics (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/669770 (https://phabricator.wikimedia.org/T264877) (owner: 10ZPapierski) [14:52:52] !log delete deployment-db08 /srv/sqldata to attempt procedure in https://phabricator.wikimedia.org/T276968#6900199 [14:52:57] Urbanecm: fyi ^ [14:52:58] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [14:53:52] 10Beta-Cluster-Infrastructure, 10Quality-and-Test-Engineering-Team (QTE), 10User-zeljkofilipin: selenium-daily-beta(commons)-MediaWiki fails with `readonly: The wiki is currently in read-only mode.` - https://phabricator.wikimedia.org/T277044 (10zeljkofilipin) 
[14:53:56] 10Beta-Cluster-Infrastructure, 10DBA, 10Patch-For-Review: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (10zeljkofilipin) [14:54:27] 10Beta-Cluster-Infrastructure, 10Quality-and-Test-Engineering-Team (QTE), 10User-zeljkofilipin: selenium-daily-beta(commons)-MediaWiki fails with `readonly: The wiki is currently in read-only mode.` - https://phabricator.wikimedia.org/T277044 (10zeljkofilipin) @Majavah thanks! [14:54:47] 10Beta-Cluster-Infrastructure, 10Quality-and-Test-Engineering-Team (QTE), 10User-zeljkofilipin: selenium-daily-beta(commons)-MediaWiki fails with `readonly: The wiki is currently in read-only mode.` - https://phabricator.wikimedia.org/T277044 (10zeljkofilipin) a:05zeljkofilipin→03None [15:10:37] 10Continuous-Integration-Config, 10Analytics, 10Event-Platform: Jenkins-bot does not submit changes on passing gate-and-submit for /schemas/event/* repos - https://phabricator.wikimedia.org/T277051 (10Mholloway) [15:11:00] 10Gerrit: gerrit's sshd is incompatible with RSA pubkeys + Fedora 33 clients (and future versions of OpenSSH proper) - https://phabricator.wikimedia.org/T276486 (10CDanis) [15:12:23] 10Gerrit: gerrit's sshd is incompatible with RSA pubkeys + Fedora 33 clients (and future versions of OpenSSH proper) - https://phabricator.wikimedia.org/T276486 (10CDanis) [15:12:45] 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure): rsync “permission denied” errors in several CI builds - https://phabricator.wikimedia.org/T277050 (10Addshore) [15:14:35] 10Continuous-Integration-Config, 10Analytics, 10Event-Platform: Jenkins-bot does not submit changes on passing gate-and-submit for /schemas/event/* repos - https://phabricator.wikimedia.org/T277051 (10hashar) a:03hashar Sounds like Gerrit permissions issues. The `integration` group should be granted the `S... 
[15:17:30] 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure): rsync “permission denied” errors in several CI builds - https://phabricator.wikimedia.org/T277050 (10Lucas_Werkmeister_WMDE) This might’ve just been a random hiccup, I haven’t seen it happen again yet. [15:19:50] (03PS1) 10Hashar: Review access change [schemas] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/670360 [15:20:10] (03PS2) 10Hashar: Review access change [schemas] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/670360 (https://phabricator.wikimedia.org/T277051) [15:20:21] (03CR) 10Hashar: [V: 03+2 C: 03+2] Review access change [schemas] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/670360 (https://phabricator.wikimedia.org/T277051) (owner: 10Hashar) [15:22:49] 10Continuous-Integration-Config, 10Analytics, 10Event-Platform, 10Patch-For-Review: Jenkins-bot does not submit changes on passing gate-and-submit for /schemas/event/* repos - https://phabricator.wikimedia.org/T277051 (10hashar) I have created two new repositories for permissions purposes: * `schema` * `sc... [15:22:49] !log rsync deployment-db06:/srv/sqldata to deployment-db08:/srv/sqldata in a tmux session on deployment-db08 (T276968) [15:22:54] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [15:22:54] T276968: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 [15:23:46] 10Continuous-Integration-Config, 10Analytics, 10Event-Platform, 10Patch-For-Review: Jenkins-bot does not submit changes on passing gate-and-submit for /schemas/event/* repos - https://phabricator.wikimedia.org/T277051 (10hashar) 05Open→03Resolved Should be good now. Please reopen if that still fails! 
[15:39:40] 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure): rsync “permission denied” errors in several CI builds - https://phabricator.wikimedia.org/T277050 (10Lucas_Werkmeister_WMDE) 05Open→03Resolved a:03Lucas_Werkmeister_WMDE Still hasn’t happened again – let’s just close i... [15:54:17] !log Start mariadb on db08 (T276968) [15:54:21] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [15:54:21] T276968: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 [15:54:30] !log Start `root@deployment-db08:/opt/wmf-mariadb104/bin# ./mysql_upgrade -h 127.0.0.1` (T276968) [15:54:35] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [15:56:20] Urbanecm, Majavah: just starting my day again. how's it going? [15:56:55] marxarelli: working on restoring db08 atm, haven't touched db07 yet [15:57:05] marxarelli: i copied data to db08 successfully [15:57:37] so per https://phabricator.wikimedia.org/T276968#6900199, now we should "Connect the slave to db06" [15:57:41] how do we do it? [15:57:58] !log set deployment-db06 as readonly from mysql side T276968 [15:58:00] ah, ok. is the plan to switch db06 over to master then? [15:58:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [15:58:05] Urbanecm: I'm looking at that atm [15:58:52] marxarelli: yes, per https://phabricator.wikimedia.org/T276968#6900199 [15:59:03] also can someone file a separate ticket for mysql sockets not working for some reason [15:59:32] if the restore is successful (no errors following systemctl start mariadb) then we can record the master log pos on db06 and use `CHANGE MASTER` on db08 to connect to db06 using the repl account [15:59:50] marxarelli: no errors, and mariadb running [15:59:53] i can take over if you like, or assist [15:59:57] excellent! [16:00:10] marxarelli: can you do it then? 
(unless Majavah is already on it) [16:00:19] https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/MariaDB_Slave_instance_setup#Starting_Replication [16:00:21] yeah, sure thing [16:00:35] I'm working on replication [16:00:38] thanks for doing that! my brain was fried after looking at it all day yesterday [16:00:41] cool :) [16:01:39] Majavah: should i start copying the data to db07 as well, so we have a working copy there as well? [16:01:53] Mar 10 16:01:08 deployment-db08 mysqld[25278]: 2021-03-10 16:01:08 0 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--log-basename=#' or '--relay-log=deployment-db08-relay-bin' to avoid this problem. [16:01:56] Urbanecm: yes please [16:02:07] doing [16:02:57] that error was solved with a reset slave; [16:04:52] hmmm, why does deployment-puppetmaster04:/var/lib/git/labs/private/modules/secret/secrets/mysql/repl_password have two different passwords [16:04:58] no idea [16:06:29] !log start root@deployment-db07:/srv/sqldata.db06# rsync --progress -r deployment-db06:/srv/sqldata/ . (T276968) [16:06:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:06:33] T276968: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 [16:07:05] 10Beta-Cluster-Infrastructure, 10DBA, 10Patch-For-Review: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (10dancy) >>! In T276968#6899605, @aborrero wrote: > FYI: The root cause for the corruption could be a force-reboot force-migration that I had to per... [16:08:52] Majavah: how is it going on your side? [16:09:07] Urbanecm: still trying to figure out repl credentials [16:09:10] also, is it fine to remove db05 from db-labs.php in MW config, so MW stops sending it queries? 
[16:09:21] (it's deleted, but..still) [16:09:22] Urbanecm: there is an open mw patch to change config [16:09:32] s/mw/mw-config [16:09:57] Majavah: we can always change creds if need be to get it working [16:11:09] Majavah: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/670273 doesn't seem to be what we want, right? we're going with db06 as the new master for now [16:11:18] (we'll probably change it, but that doesn't need to happen now) [16:11:21] oops, fair point, that's from time before that [16:11:29] marxarelli: found them, now working on getting it configured [16:11:37] ack [16:12:31] !log deployment-db08 CHANGE MASTER to MASTER_USER='repl', MASTER_PASSWORD='redacted', MASTER_PORT=3306, MASTER_HOST='deployment-db06.deployment-prep.eqiad1.wikimedia.cloud', MASTER_LOG_FILE='deployment-db06-bin.000059', MASTER_LOG_POS=522469730; (T276968) [16:12:35] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:12:35] T276968: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 [16:14:13] db08 replication looks ok [16:14:16] Majavah: settings look right [16:14:18] cool [16:14:26] Urbanecm: I'll update that patch [16:14:31] Majavah: is db06 in R/O now? [16:14:36] https://www.irccloud.com/pastebin/YbsYKUTk/ [16:14:47] Urbanecm: yes, I set that to db-level ro [16:14:51] just needs a `START SLAVE` [16:14:51] great [16:15:10] marxarelli: thought I did that, now done [16:16:13] transfer to db07 is still running [16:17:16] if we get both db07 and db08 operational and reading to the same point in the binlog i think we could just switch db08 to read from db07 and retire db06 [16:18:02] for the record the "Looks like real one" is the real password [16:18:55] oh boy. 
good to know :) [16:19:01] beta is quite a mess [16:19:56] marxarelli: yes, very much yes [16:19:59] Urbanecm: patch updated [16:20:10] cool [16:21:22] Majavah: maybe we should merge it w/o db07, and add it later once it's provisioned? or maybe we should just wait for db07 to finish :D [16:21:34] 23GB out of 44 GB now [16:21:40] anything works [16:22:10] for next time, we should delete a lot of articles from simplewiki [16:23:32] yeah i noticed that. simplewiki.revision is the biggest table by far [16:24:06] "The Simple English Wikipedia had its entire database imported, and is the best place to test MediaWiki" according to https://meta.wikimedia.beta.wmflabs.org/wiki/Main_Page [16:25:34] 10Beta-Cluster-Infrastructure, 10DBA, 10Patch-For-Review: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (10dduvall) Thanks for the help, everyone. I would still like to get off of db06 if possible at the end of this process since we have to finish the b... [16:26:07] also, backups [16:26:11] 10Beta-Cluster-Infrastructure, 10DBA, 10Patch-For-Review: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (10dduvall) Forgot the `UNLOCK TABLES` on db07 :) [16:26:30] dancy: would have been nice [16:29:00] and i'll write up a runbook after we get this working [16:29:45] one of the issues i ran into yesterday was that the docs at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/MariaDB_Slave_instance_setup#On_New_Slave are out of date [16:30:35] apparently the `mariabackup --innobackupex` implementation has changed twice since that was written [16:31:30] Urbanecm: any news on the transfer? :P [16:31:34] and now you have to use `mbstream` to unpack the streamed dump but still `--stream=xbstream` on the source end. it doesn't support `--stream=tar` at all [16:32:43] 38 GB out of 44 GB Majavah [16:32:50] so close! 
[16:33:30] marxarelli: i just rsync it tbh :D [16:33:54] rsync is tried and true :) [16:34:07] wdym? [16:34:29] 41 GB out of 44 GB [16:34:31] 10Continuous-Integration-Infrastructure, 10Wikidata, 10ci-test-error (WMF-deployed Build Failure): Many WikibaseLexeme tests suddenly failing in quibble-vendor-mysql-php72-{selenium,noselenium}-docker - https://phabricator.wikimedia.org/T277061 (10Lucas_Werkmeister_WMDE) [16:34:34] sorry, an idiom [16:34:48] i mean it's a reliable tool [16:35:58] Urbanecm, Majavah: what do you think of https://phabricator.wikimedia.org/T276968#6901184 ? [16:36:34] marxarelli: lgtm to me, but maybe stop by in #wikimedia-databases if they have time to quickly look [16:36:39] k [16:38:39] 10Beta-Cluster-Infrastructure, 10DBA, 10Patch-For-Review: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (10Marostegui) @dduvall if possible I would also set `read_only=ON` on the current master (I guess db06) to be fully sure no writes are happening. If... [16:39:42] 10Continuous-Integration-Infrastructure, 10Wikidata, 10ci-test-error (WMF-deployed Build Failure): Many WikibaseLexeme tests suddenly failing in quibble-vendor-mysql-php72-{selenium,noselenium}-docker - https://phabricator.wikimedia.org/T277061 (10Lucas_Werkmeister_WMDE) Also appears to affect quibble-vendor... 
[16:41:37] 10Continuous-Integration-Config, 10Excimer, 10LuaSandbox: Improve CI for PECL packages - https://phabricator.wikimedia.org/T277063 (10Legoktm) [16:42:00] 10Continuous-Integration-Config, 10Excimer, 10LuaSandbox: Improve CI for PECL packages - https://phabricator.wikimedia.org/T277063 (10Legoktm) [16:42:02] 10Continuous-Integration-Config: CI should validate that the pecl tarball contains all the necessary files to build the extension - https://phabricator.wikimedia.org/T276417 (10Legoktm) [16:42:04] 10Continuous-Integration-Config, 10LuaSandbox: CI should run `pear package-validate` for PHP extensions with package.xml files - https://phabricator.wikimedia.org/T207686 (10Legoktm) [16:43:10] Urbanecm: are you cloning db07 from db08 or from db06? [16:43:28] db06 [16:44:21] * Majavah patiently waits for that to finish [16:44:35] transfer completed [16:44:37] starting mariadb [16:45:06] I have the change master command ready for copy pasting if you don't have it handy [16:45:43] 10Continuous-Integration-Config, 10Analytics, 10Event-Platform: Jenkins-bot does not submit changes on passing gate-and-submit for /schemas/event/* repos - https://phabricator.wikimedia.org/T277051 (10Mholloway) Looking good! Thank you, @hashar! [16:45:53] !log root@deployment-db07:/opt/wmf-mariadb104/bin# ./mysql_upgrade -h 127.0.0.1 # T276968 [16:45:57] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:45:57] T276968: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 [16:46:31] 10Beta-Cluster-Infrastructure, 10DBA, 10Patch-For-Review: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (10dduvall) From @Marostegui in IRC: "To be honest, I would do it in different steps, set db06, make sure all is fine and the slave replicates just f... 
[16:46:31] lmk when that's done, and I'll do the replication bits after that [16:46:58] Majavah: ready for replication bit [16:47:01] *bits [16:47:04] sure, doing that next [16:49:37] !log add deployment-db07 as a replica of db06 for T276968 [16:49:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:49:51] Ping me once we can merge the patch [16:50:02] !log `reset slave;` on new master deployment-db06 T276968 [16:50:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:53:18] Urbanecm, marxarelli: db[07-08] replication looks good, db06 replica was reset and still db-level read only, I think we can add the config patch to pool the new servers, then set mw and mysql read only and carefully test that it works [16:53:44] sounds good to me [16:53:49] MW is still read only [16:53:53] Merging the patch [16:54:00] let's pool the new servers first [16:55:12] Majavah: aren't they pooled already? [16:55:31] Or you mean in MW config? [16:55:33] Urbanecm: pool as in make mediawiki be aware of them, which the patch you just merged does [16:55:39] Ah, got it [16:57:14] how do we want to test replication before promoting db07 as master? [16:57:50] Majavah: let's leave beta with db06 as master for...the rest of the day, and do it later? [16:58:12] sure [17:01:17] Majavah: I'd say it works [17:01:18] 10Continuous-Integration-Infrastructure, 10Wikidata, 10Patch-For-Review, 10ci-test-error (WMF-deployed Build Failure): Many WikibaseLexeme tests suddenly failing in quibble-vendor-mysql-php72-{selenium,noselenium}-docker - https://phabricator.wikimedia.org/T277061 (10Lucas_Werkmeister_WMDE) Looking at the... [17:01:32] mwscript mysql.php --wiki=cswiki works as intended [17:01:51] with both hostnames [17:01:58] nice [17:02:06] so revert the mw-config readonly then? [17:02:12] Majavah: agreed, I'll do it [17:02:24] Majavah: can you set master to rw? 
[17:02:30] 10Continuous-Integration-Config: CI should ensure all files are listed in package.xml - https://phabricator.wikimedia.org/T277068 (10Legoktm) [17:02:44] before or after mediawiki? [17:02:48] Majavah: before [17:02:55] ie. now [17:03:14] !log make deployment-db06 read-write T276968 [17:03:19] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [17:03:20] T276968: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 [17:03:25] thanks [17:03:37] what about slaves, will replication work with ro set? [17:04:31] 10Beta-Cluster-Infrastructure: Broken puppet on deployment-mediawiki servers - https://phabricator.wikimedia.org/T277069 (10Majavah) [17:04:33] Majavah: ^^ do you happen to know^^? [17:04:38] Urbanecm: no idea, ask the dbas [17:04:55] will do [17:06:03] 18:05 no, slaves can and should remain as read-only [17:06:11] merging MW patch [17:09:46] !log set beta cluster mediawiki as read write on mw config (T276968) [17:09:49] mw is now rw [17:09:49] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [17:09:49] T276968: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 [17:09:58] yup, indeed [17:09:59] thanks Majavah [17:10:06] checking if replication looks good [17:10:07] let me crash test it [17:10:30] tried to delete my sandbox on enwiki beta [17:10:57] replication looks good at first glance [17:11:21] db 08 has the change [17:11:49] and db07 has it as well [17:11:53] i think we made it [17:11:57] Majavah: marxarelli: ^^ [17:12:02] * Majavah is happy [17:12:13] \o/ [17:12:40] closed the task [17:12:46] 10Beta-Cluster-Infrastructure, 10Quality-and-Test-Engineering-Team (QTE): Upgrade deployment-prep-db hosts to buster/MariaDB 10.4 - https://phabricator.wikimedia.org/T268628 (10Urbanecm) [17:12:51] Victory! 
[17:12:57] welcome back, beta [17:13:06] thanks a ton, Urbanecm and Majavah [17:13:06] I'll reenable the beta-update-databases-eqiad job. [17:13:08] 10Beta-Cluster-Infrastructure, 10DBA: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (10Urbanecm) 05Open→03Resolved a:05dduvall→03None This was done. Thanks everyone, especially @majavah who de-facto led this change! [17:13:13] thanks dancy [17:13:22] Majavah: Sounds like you're the new beta DBA. Congrats. [17:13:45] lol [17:13:51] 10Beta-Cluster-Infrastructure, 10DBA: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (10Marostegui) Happy it worked fine! [17:14:01] Urbanecm: what about the master switchover? make a separate task or what? [17:14:14] yeah, a follow-up task would be good [17:14:18] Majavah: yeah, make a separate task for it. Let's do it after the dust settles a bit [17:14:43] sure, doing [17:15:52] (03CR) 10DannyS712: "This change is ready for review." [integration/config] - 10https://gerrit.wikimedia.org/r/670527 (owner: 10DannyS712) [17:17:56] 10Beta-Cluster-Infrastructure: Beta cluster master switchover to deployment-db07 - https://phabricator.wikimedia.org/T277070 (10Majavah) [17:18:01] Urbanecm, marxarelli: https://phabricator.wikimedia.org/T277070 [17:18:37] Majavah: right on. i will put it on our team's TODO [17:19:01] Thanks. [17:19:09] 10Continuous-Integration-Infrastructure, 10Wikidata, 10Patch-For-Review, 10ci-test-error (WMF-deployed Build Failure): Many WikibaseLexeme tests suddenly failing in quibble-vendor-mysql-php72-{selenium,noselenium}-docker - https://phabricator.wikimedia.org/T277061 (10Lucas_Werkmeister_WMDE) The QUnit failu... 
[17:19:11] Beta-Cluster-Infrastructure, Quality-and-Test-Engineering-Team (QTE), User-zeljkofilipin: selenium-daily-beta(commons)-MediaWiki fails with `readonly: The wiki is currently in read-only mode.` - https://phabricator.wikimedia.org/T277044 (Majavah) It's read write now, however we're planning on doing a...
[17:19:17] Beta-Cluster-Infrastructure, Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)): Beta cluster master switchover to deployment-db07 - https://phabricator.wikimedia.org/T277070 (dduvall)
[17:19:24] marxarelli: sure, happy to help with that too
[17:19:38] Beta-Cluster-Infrastructure, Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)): Beta cluster master switchover to deployment-db07 - https://phabricator.wikimedia.org/T277070 (Majavah)
[17:19:43] Majavah: happy to accept your help :)
[17:19:43] Majavah: i think we can do it together tomorrow?
[17:19:53] * Majavah checks calendar
[17:20:18] (CR) Hashar: [C: +2] Whitelist Sahilgrewalhere [integration/config] - https://gerrit.wikimedia.org/r/670527 (owner: DannyS712)
[17:21:21] Continuous-Integration-Infrastructure, Wikidata, Patch-For-Review, ci-test-error (WMF-deployed Build Failure): Many WikibaseLexeme tests suddenly failing in quibble-vendor-mysql-php72-{selenium,noselenium}-docker - https://phabricator.wikimedia.org/T277061 (Lucas_Werkmeister_WMDE) > The QUnit fai...
[17:21:25] Urbanecm: sure, works for me
[17:21:52] Majavah: are you available during tomorrow (european) morning?
[17:22:13] (Merged) jenkins-bot: Whitelist Sahilgrewalhere [integration/config] - https://gerrit.wikimedia.org/r/670527 (owner: DannyS712)
[17:22:27] Urbanecm: unfortunately not before about 11 UTC
[17:23:25] Majavah: you're in CET, just like me, right?
[17:23:36] Urbanecm: no, I'm in EET
[17:23:49] Ack
[17:24:07] (CR) Hashar: [C: +2] "I had the jobs updated and the fix is confirmed. I just forgot to get the change merged." [integration/config] - https://gerrit.wikimedia.org/r/670317 (https://phabricator.wikimedia.org/T276983) (owner: Hashar)
[17:24:13] !log `rm -rf /srv/backup /srv/restore` on deployment-db07 and reenabling puppet
[17:24:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:24:29] Majavah: 1:30 UTC works for you?
[17:24:54] Urbanecm: yes, sure
[17:25:02] beta-update-databases-eqiad just ran successfully.
[17:25:17] wahoo
[17:25:22] dancy: nice
[17:25:37] Excellent
[17:25:42] (Merged) jenkins-bot: Add /cache to wikimedia-fundraising-civicrm-docker [integration/config] - https://gerrit.wikimedia.org/r/670317 (https://phabricator.wikimedia.org/T276983) (owner: Hashar)
[17:25:53] !log `rm -rf /srv/restore` on deployment-db08 and reenabling puppet
[17:25:58] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:26:41] !log `rm -rf /srv/dump` on deployment-db06 and reenabling puppet
[17:26:43] (CR) Ahmon Dancy: [C: +2] feat: add script to test Scap under train-dev [tools/scap] - https://gerrit.wikimedia.org/r/670172 (owner: Lars Wirzenius)
[17:26:43] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:26:43] Beta-Cluster-Infrastructure: Broken puppet on deployment-mediawiki servers - https://phabricator.wikimedia.org/T277069 (Majavah)
[17:29:42] Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (2020-10-01 to 2020-12-31 (Q2)), Zuul, and 3 others: Improve scheduling of CI jobs invoked by zuul - https://phabricator.wikimedia.org/T258630 (hashar) Meanwhile I have reviewed...
[17:30:33] thanks everyone Majavah Urbanecm marxarelli and dancy
[17:35:39] Continuous-Integration-Infrastructure, Wikidata, Patch-For-Review, ci-test-error (WMF-deployed Build Failure): Many WikibaseLexeme tests suddenly failing in quibble-vendor-mysql-php72-{selenium,noselenium}-docker - https://phabricator.wikimedia.org/T277061 (Lucas_Werkmeister_WMDE) I haven’t been...
[17:36:43] Majavah: i scheduled the switchover on https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=1902868&oldid=1902858, as we'll need the MW config repo.
[17:37:07] Urbanecm: thanks, can you add a comment to the task as well?
[17:37:13] on it
[17:37:49] done
[17:38:16] Beta-Cluster-Infrastructure, Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), User-Urbanecm: Beta cluster master switchover to deployment-db07 - https://phabricator.wikimedia.org/T277070 (Urbanecm) This is **scheduled** for Mar 11, 13:30 UTC. Beta will be read only during the switchover.
[17:38:27] Continuous-Integration-Config, Release-Engineering-Team (Code Health), Code-Health: Create a CI container for mediawiki/tools/dependency-analysis - https://phabricator.wikimedia.org/T227603 (hashar) We have lost the dynamic from Summer 2019, just went on this task cause I stepped on an old Phabricato...
[17:38:31] Beta-Cluster-Infrastructure, Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), User-Urbanecm: Beta cluster master switchover to deployment-db07 - https://phabricator.wikimedia.org/T277070 (Urbanecm)
[17:39:18] Beta-Cluster-Infrastructure, Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), User-Majavah, User-Urbanecm: Beta cluster master switchover to deployment-db07 - https://phabricator.wikimedia.org/T277070 (Majavah)
[17:44:18] Continuous-Integration-Infrastructure, Wikidata, Patch-For-Review, ci-test-error (WMF-deployed Build Failure): Many WikibaseLexeme tests suddenly failing in quibble-vendor-mysql-php72-{selenium,noselenium}-docker - https://phabricator.wikimedia.org/T277061 (Lucas_Werkmeister_WMDE) >>! In T277061#...
[17:45:44] Release-Engineering-Team (CI & Testing services), Cloud-VPS, cloud-services-team (Kanban): Support Cinder for CI docker workers - https://phabricator.wikimedia.org/T277078 (Andrew)
[17:48:40] Urbanecm: banner set :D
[17:49:04] I don't think we need it on beta, but thanks.
[17:49:14] Majavah: also this event is probably tech news worthy.
[17:49:21] Just as a fyi
[17:49:41] the breakage or the maintenance tomorrow?
[17:50:00] Majavah: breakage
[17:51:30] legoktm, marxarelli, hashar, Sometime soon I'd like to connect with someone about CI VMs and moving from LVM storage to Cinder. Can you suggest who would be best? (Hashar seems the most likely but timezones aren't ideal).
[17:51:41] confusingly this is a separate question from T276327 which Hashar is already on top of
[17:51:41] T276327: cloud: puppetmasters: adopt cinder volumes to store certs and git repos - https://phabricator.wikimedia.org/T276327
[17:51:55] not me :)
[17:52:16] maybe thcipriani has suggestions ^^
[17:52:37] here's an example of what I'm talking about: https://gerrit.wikimedia.org/r/c/operations/puppet/+/670524
[17:57:38] andrewbogott: Is the idea to rebuild one of the agent nodes with this config and see how it goes?
[17:57:57] yeah, and confirm that my puppet changes don't break puppet on the existing nodes (which is probably the hard part)
[17:58:33] Beta-Cluster-Infrastructure, Cloud-VPS (Debian Jessie Deprecation): Migrate deployment-prep away from Debian Jessie to Debian Stretch/Buster - https://phabricator.wikimedia.org/T218729 (Majavah)
[17:58:35] Beta-Cluster-Infrastructure, User-Majavah: Replace deployment-memc[04-05] with Buster hosts - https://phabricator.wikimedia.org/T276707 (Majavah) Open→Resolved The old VMs were deleted.
[18:02:14] Beta-Cluster-Infrastructure, User-Majavah: Replace deployment-ircd with a Buster host - https://phabricator.wikimedia.org/T277081 (Majavah)
[18:02:30] Beta-Cluster-Infrastructure, User-Majavah: Replace deployment-ircd with a Buster host - https://phabricator.wikimedia.org/T277081 (Majavah)
[18:02:37] Beta-Cluster-Infrastructure, Cloud-VPS (Debian Jessie Deprecation): Migrate deployment-prep away from Debian Jessie to Debian Stretch/Buster - https://phabricator.wikimedia.org/T218729 (Majavah)
[18:03:26] Beta-Cluster-Infrastructure, Cloud-VPS (Debian Jessie Deprecation): Migrate deployment-prep away from Debian Jessie to Debian Stretch/Buster - https://phabricator.wikimedia.org/T218729 (Majavah)
[18:05:25] !log create deployment-ircd02 for T277081
[18:05:29] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:05:29] T277081: Replace deployment-ircd with a Buster host - https://phabricator.wikimedia.org/T277081
[18:08:17] Release-Engineering-Team (Logspam), JavaScript, MediaWiki-Interface (Tables), Patch-For-Review, and 2 others: Some tables cannot be sorted (TypeError: $nextRows[i] is undefined in table sorting / Uncaught TypeError: Cannot read property 'type' of undefine... - https://phabricator.wikimedia.org/T265503
[18:08:52] Beta-Cluster-Infrastructure, Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), User-Majavah, User-Urbanecm, User-zeljkofilipin: Beta cluster master switchover to deployment-db07 - https://phabricator.wikimedia.org/T277070 (zeljkofilipin)
[18:18:46] Release-Engineering-Team (Logspam), JavaScript, MediaWiki-Interface (Tables), Patch-For-Review, and 2 others: Some tables cannot be sorted (TypeError: $nextRows[i] is undefined in table sorting / Uncaught TypeError: Cannot read property 'type' of undefine... - https://phabricator.wikimedia.org/T265503
[18:26:52] Urbanecm: added to https://meta.wikimedia.org/wiki/Tech/News/2021/11
[18:26:57] Beta-Cluster-Infrastructure, Quality-and-Test-Engineering-Team (QTE), User-zeljkofilipin: selenium-daily-beta(commons)-MediaWiki fails with `readonly: The wiki is currently in read-only mode.` - https://phabricator.wikimedia.org/T277044 (zeljkofilipin) Open→Resolved a:zeljkofilipin
[18:27:01] Beta-Cluster-Infrastructure, DBA: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (zeljkofilipin)
[18:27:16] Beta-Cluster-Infrastructure, DBA, User-notice: deployment-db05 needs replacing following disk corruption - https://phabricator.wikimedia.org/T276968 (Majavah)
[18:27:27] hello - crossposting from #wikimedia-sre. Where do .config files come from when doing a deploy-local? I am trying to figure out why a deploy is trying to pull from the old deploy server. There is almost no mention whatsoever of deploy1001 on the server but when I start the deploy the .config file is pulled from somewhere with the bad host hardcoded
[18:34:34] partial answer - the DEPLOY_HEAD file in /srv/deployment/REPONAME/deploy on the deploy server
[18:41:14] Beta-Cluster-Infrastructure, Cloud-VPS (Debian Jessie Deprecation): Migrate deployment-prep away from Debian Jessie to Debian Stretch/Buster - https://phabricator.wikimedia.org/T218729 (Andrew) Thank you for all your work on this, @Majavah !
[18:46:16] !log switch floating ip 185.15.56.34 to deployment-ircd02 T277081
[18:46:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:46:20] T277081: Replace deployment-ircd with a Buster host - https://phabricator.wikimedia.org/T277081
[19:01:49] Beta-Cluster-Infrastructure, User-Majavah: Replace deployment-ircd with a Buster host - https://phabricator.wikimedia.org/T277081 (Majavah) floating IP updated, but VMs still point to the old name: ` taavi@deployment-mwlog01:/srv/mw-log$ host irc.beta.wmflabs.org irc.beta.wmflabs.org has address 172.16...
[19:03:25] hnowlan: https://phabricator.wikimedia.org/T145373 seems to cover the history behind writing the `DEPLOY_HEAD` to `.config`
[19:04:31] * marxarelli looks further to see if one can force a fetch
[19:05:07] ah, looks like `--refresh-config` might help
[19:05:12] https://phabricator.wikimedia.org/rMSCAe6769d6c6d051fdf77d26ecab113a13202c59863
[19:05:20] hnowlan: ^
[19:08:20] although i don't see how/when `--refresh-config` gets passed to `deploy-local`. thcipriani do you happen to know?
[19:35:31] (PS1) Ahmon Dancy: Demote a _runcmd log message to debug level [tools/scap] - https://gerrit.wikimedia.org/r/670564
[19:36:52] !log shutdown deployment-ircd T277081
[19:36:57] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[19:36:57] T277081: Replace deployment-ircd with a Buster host - https://phabricator.wikimedia.org/T277081
[19:37:34] marxarelli: IIRC refresh config is its own stage of deployment for scap3 (I might not remember correctly though)
[19:37:44] Beta-Cluster-Infrastructure, User-Majavah: Replace deployment-ircd with a Buster host - https://phabricator.wikimedia.org/T277081 (Majavah) deployment-ircd02 is now working on Buster and from a very quick look it seems to be working properly
[19:39:32] Beta-Cluster-Infrastructure: Beta SWIFT seems to be broken - https://phabricator.wikimedia.org/T276179 (Tgr) >>! In T276179#6885108, @Majavah wrote: > I changed it to point to `deployment-ms-fe03`. I'm not sure if it helped, but FileOperation.log has now HTTP 404s instead of 503s. It's all 503s again. `$wm...
[19:41:28] Beta-Cluster-Infrastructure, Cloud-VPS (Debian Jessie Deprecation): Migrate deployment-prep away from Debian Jessie to Debian Stretch/Buster - https://phabricator.wikimedia.org/T218729 (Majavah)
[19:59:23] thcipriani: Krinkle https://phabricator.wikimedia.org/T277094
[19:59:29] and patch is https://gerrit.wikimedia.org/r/670569
[19:59:46] should be merged into new branch before rolling the train forward
[20:00:28] gave a +2
[20:04:12] Jdlrobson any other patches blocking the train I can review before I leave?
[20:09:56] DannyS712: not that i know of :)
[20:10:16] thcipriani: can you take care of the rest? https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/670529
[20:10:25] okay then bye
[20:11:03] * thcipriani looks
[20:13:25] thanks DannyS712 :) enjoy rest of your day!
[20:13:34] Jdlrobson: brennen is on it in -operations, FYI
[20:17:35] too many channels, but yeah, slinging that out, just waiting on zuul
[20:17:42] sweet
[20:17:49] any testing we should do on that?
[20:30:25] (CR) Dduvall: PipelineRunner: allowedCredentials (4 comments) [integration/pipelinelib] - https://gerrit.wikimedia.org/r/668245 (https://phabricator.wikimedia.org/T269902) (owner: Jeena Huneidi)
[20:35:42] Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), Patch-For-Review, Release, Train Deployments, User-brennen: 1.36.0-wmf.34 deployment blockers - https://phabricator.wikimedia.org/T274938 (brennen)
[20:51:54] Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (2020-10-01 to 2020-12-31 (Q2)), Zuul, and 3 others: Improve scheduling of CI jobs invoked by zuul - https://phabricator.wikimedia.org/T258630 (hashar) Upstream patch got merged...
[21:12:27] (PS5) Jeena Huneidi: PipelineRunner: allowedCredentials [integration/pipelinelib] - https://gerrit.wikimedia.org/r/668245 (https://phabricator.wikimedia.org/T269902)
[21:13:17] (PS6) Jeena Huneidi: PipelineRunner: allowedCredentials [integration/pipelinelib] - https://gerrit.wikimedia.org/r/668245 (https://phabricator.wikimedia.org/T269902)
[21:13:48] (PS7) Jeena Huneidi: PipelineRunner: allowedCredentials [integration/pipelinelib] - https://gerrit.wikimedia.org/r/668245 (https://phabricator.wikimedia.org/T269902)
[21:14:29] (CR) Jeena Huneidi: PipelineRunner: allowedCredentials (4 comments) [integration/pipelinelib] - https://gerrit.wikimedia.org/r/668245 (https://phabricator.wikimedia.org/T269902) (owner: Jeena Huneidi)
[21:15:56] Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), Patch-For-Review, Release, Train Deployments, User-brennen: 1.36.0-wmf.34 deployment blockers - https://phabricator.wikimedia.org/T274938 (brennen) First patch for T277094 didn't work out. @Jdlrobson continues to investigate; hol...
[21:51:21] Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), Scap, User-brennen: Applying security patches should be robust and also give some useful output - https://phabricator.wikimedia.org/T269153 (sbassett) >>! In T269153#6896577, @LarsWirze...
[21:52:31] (PS8) Jeena Huneidi: PipelineRunner: allowedCredentials [integration/pipelinelib] - https://gerrit.wikimedia.org/r/668245 (https://phabricator.wikimedia.org/T269902)
[21:54:18] Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), MW-on-K8s, serviceops, Patch-For-Review: Missing docker iptables nat rules for releases hosts - https://phabricator.wikimedia.org/T276869 (Dzahn) ` [releases1002:~] $ sudo iptables -L | grep DOCKER DOCKER-ISOLATION all -- anywhere...
[21:55:15] Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), MW-on-K8s, serviceops, Patch-For-Review: Missing docker iptables nat rules for releases hosts - https://phabricator.wikimedia.org/T276869 (Dzahn) >>! In T276869#6894650, @Legoktm wrote: > Including `profile::docker::builder` would be...
[21:55:59] (PS9) Jeena Huneidi: PipelineRunner: allowedCredentials [integration/pipelinelib] - https://gerrit.wikimedia.org/r/668245 (https://phabricator.wikimedia.org/T269902)
[21:57:48] Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), MW-on-K8s, serviceops, Patch-For-Review: Missing docker iptables nat rules for releases hosts - https://phabricator.wikimedia.org/T276869 (Dzahn) @dduvall Good to resolve?
[21:57:56] Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), MW-on-K8s, serviceops, Patch-For-Review: Missing docker iptables nat rules for releases hosts - https://phabricator.wikimedia.org/T276869 (Dzahn) a:Dzahn
[22:01:19] Release-Engineering-Team-TODO, DNS, SRE, Traffic, and 4 others: DNS for GitLab - https://phabricator.wikimedia.org/T276170 (Dzahn) a:Dzahn
[22:02:32] Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), GitLab (Initialization), User-brennen: Remove Speed & Function blockers for GitLab work - https://phabricator.wikimedia.org/T274458 (Dzahn)
[22:03:08] Release-Engineering-Team-TODO, DNS, SRE, Traffic, and 4 others: DNS for GitLab - https://phabricator.wikimedia.org/T276170 (Dzahn) Open→Resolved done! ` [authdns1001:~] $ host gitlab.wikimedia.org gitlab.wikimedia.org is an alias for gitlab1001.wikimedia.org. gitlab1001.wikimedia.org has...
[22:03:36] Release-Engineering-Team-TODO, DNS, SRE, Traffic, and 4 others: DNS for GitLab - https://phabricator.wikimedia.org/T276170 (Dzahn) @Sergey.Trofimovsky.SF See above, the gitlab.wikimedia.org name now points to the VM. Keep in mind it's both IPv4 and IPv6.
[22:06:36] James_F: would you like me to amend for replace-in-place?
[22:12:33] (PS1) Zoranzoki21: Add MassMessage as dependency of the MassMessageEmail extension [integration/config] - https://gerrit.wikimedia.org/r/670601
[22:30:06] Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (2020-10-01 to 2020-12-31 (Q2)), Zuul, and 3 others: Improve scheduling of CI jobs invoked by zuul - https://phabricator.wikimedia.org/T258630 (dancy) Yay!
[22:30:25] MediaWiki-Releasing, MediaWiki-Internationalization, translatewiki.net, MW-1.35-release: Restart i18n backports in MW releases - https://phabricator.wikimedia.org/T277104 (Reedy)
[22:32:53] legoktm: Oh, sure, please.
[22:36:49] MediaWiki-Releasing, MediaWiki-Internationalization, translatewiki.net, MW-1.35-release: Restart i18n backports in MW releases - https://phabricator.wikimedia.org/T277104 (Legoktm)
[22:37:19] Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), MW-on-K8s, serviceops: Missing docker iptables nat rules for releases hosts - https://phabricator.wikimedia.org/T276869 (dduvall) Open→Resolved Yes! Thanks so much for the fix. I've verified that traffic is now being properly rou...
[22:37:32] Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), SRE, Traffic, GitLab (Initialization), and 2 others: open firewall ports on gitlab1001.wikimedia.org (was: Port map of how Gitlab is accessed) - https://phabricator.wikimedia.org/T276144 (Dzahn) >>! In T276144#6899820, @jbond wrote: >...
[22:52:24] Phabricator: Spaces request for Technical-Program-Management - https://phabricator.wikimedia.org/T277107 (MBinder_WMF)
[22:54:26] James_F: double check my changes look good? I adjusted the test too
[22:59:23] well Roan got it
[23:06:06] Release-Engineering-Team, MW-on-K8s, serviceops: Containers on releases hosts cannot update apt cache from non-WMF sources - https://phabricator.wikimedia.org/T277109 (dduvall)
[23:06:12] Yeah I was just about to make the same change
[23:06:22] So I decided to approve yours instead
[23:09:42] do you the alerts here when Icinga talks about releases servers or is it just -operations
[23:10:33] mutante: I think you missed a verb out
[23:10:41] or at least a word :P
[23:31:57] Release-Engineering-Team (Code Health), MediaWiki-extensions-FlaggedRevs, Code-Stewardship-Reviews: Code stewardship review: FlaggedRevs - https://phabricator.wikimedia.org/T185664 (Ladsgroup) Trying to debug {T233561} and {T275322}, no success after an hour. This extension is basically re-implementi...
[23:35:40] Phabricator (Upstream), Upstream: After adding an action in Chrome on phone, it cannot be removed (i.e. there is no cross on the right) - https://phabricator.wikimedia.org/T272788 (IN)
[23:39:09] Phabricator (Upstream), Upstream: After adding an action in Chrome on phone, it cannot be removed (i.e. there is no cross on the right) - https://phabricator.wikimedia.org/T272788 (IN) {F34150386} It seems that this problem only occurs when the screen resolution is too small. I used my other device with...
[23:47:18] Release-Engineering-Team (Code Health), MediaWiki-extensions-FlaggedRevs, Code-Stewardship-Reviews: Code stewardship review: FlaggedRevs - https://phabricator.wikimedia.org/T185664 (Jrbranaa) Sorry for the delay in response on this task. The Code Stewardship Review process has definitely run into so...