[03:28:07] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<33.33%)
[04:17:38] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (201907), Release, Train Deployments: 1.34.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T220739 (tstarling)
[04:51:55] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (201907), Release, Train Deployments: 1.34.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T220739 (tstarling)
[07:03:03] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[08:39:12] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, Scap, serviceops, and 3 others: Enhance MediaWiki deployments for support of php7.x - https://phabricator.wikimedia.org/T224857 (Joe) @thcipriani do you need more information from us? Any idea when work on scap will be...
[10:32:44] Release-Engineering-Team, Wikimedia-Portals, Jenkins: wikimedia-portals-build job failing with "node: command not found" - https://phabricator.wikimedia.org/T228639 (Jdrewniak)
[10:49:12] !log ladsgroup@deployment-deploy01:/srv/mediawiki-staging/php-master/extensions$ sudo rm -rf Wikidata
[10:49:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:49:35] kill build, it should use the wikibase and other extensions directly
[12:35:48] Continuous-Integration-Config, MediaWiki-extensions-Scribunto, Wikidata, Patch-For-Review, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞): [Task] Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050 (Ladsgroup) This somehow happened in the last weekend for wiki...
[12:37:08] * apergos raises an eyebrow at that rm
[12:40:17] shopping for anyone who can +2 this... https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/524760/ (see task: https://phabricator.wikimedia.org/T228614)
[12:46:22] apergos: Are you looking for actual review or rubber stamping? I understand it well enough to see that this might be a problem and we should merge yr patch.
[12:47:06] it doesn't need review from scratch (that already happened in master) but strictly rubber stamping is a bit um
[12:47:25] a bit um. if you know what I mean
[12:47:28] I may have exaggerated ;-)
[12:48:01] More importantly, should this be merged during a SWAT window?
[12:48:35] it can either go during the 'morning swat' (9 pm for me) or I can see if james et al are willing to push it out sooner when they show up
[12:48:42] (not sure we have other regular deployers around)
[12:49:12] I'd prefer sooner rather than later for obvious reasons. but my drop dead must is tonight before I go to bed
[12:49:18] Okay--I have deployment rights but that would require asking for a window...
[12:50:13] so I'd say, don't worry about it, thanks for the offer, let's see what the sf crowd says when they show up
[12:51:01] ok, good luck at timezones
[12:57:33] ty!
[14:49:51] Release-Engineering-Team, Wikimedia-Portals, Jenkins: wikimedia-portals-build job failing with "node: command not found" - https://phabricator.wikimedia.org/T228639 (debt) p:Triage→High Hi @hashar - would you be able to help with this error? It's preventing the wikipedia.org portal from being...
[14:54:42] Phabricator, Developer-Advocacy, Developer-Wishlist (2017), Goal: Consolidate the many tech events calendars in Phabricator's calendar - https://phabricator.wikimedia.org/T1035 (Qgil) Yes, agreed.
[15:03:02] Project-Admins: Requesting new Phabricator tag "Observability-Goal" - https://phabricator.wikimedia.org/T228673 (herron)
[15:12:36] James_F: I saw that! (and thanks much)... should I grab a window during the 'morning swat' or should I poke you/$someone in -operations and get it sent around sooner? I would prefer sooner but it is y'all's call
[15:12:49] apergos: Just poke me in these cases, it's fine. :-)
[15:12:58] cool!
[15:13:19] I would have merged it half an hour ago, but the train was being re-rolled.
[15:13:33] ah ha
[15:22:36] Release-Engineering-Team (Pipeline), Release Pipeline, serviceops: Self-service Deployment Pipeline - https://phabricator.wikimedia.org/T228676 (Jdforrester-WMF)
[15:23:13] Release-Engineering-Team (Pipeline), Operations, Release Pipeline, serviceops, Goal: Self-service Deployment Pipeline - https://phabricator.wikimedia.org/T228676 (akosiaris) p:Triage→Normal
[15:23:30] apergos: Deployed.
[15:24:20] oh, all the way around! thank you
[15:24:30] I was expecting to do the mwdebug drill etc
[15:24:36] We're a full service happiness team. ;-)
[15:24:44] :-) :-)
[15:24:46] Yeah, it was quick enough to confirm.
[15:24:59] And I can't kick off a dump to test myself. ;-)
[15:25:04] have checked my code on mwdebug against the bad page anyways :-)
[15:25:11] (it's good now)
[15:25:14] Excellent.
[15:25:18] should try a special:export now...
[15:25:55] (Next, I'm deploying a back-port to fix tomorrow's i18n build after the partial switch to extension.json in Wikibase.)
[15:27:36] (looks good)
[15:27:41] oh my
[15:27:47] good luck!
[15:28:08] Psh, it's fine.
[15:35:27] Release-Engineering-Team: make-wmf-branch should fail as soon as the first command fails - https://phabricator.wikimedia.org/T228658 (MarcoAurelio)
[15:45:59] Release-Engineering-Team (Kanban), Patch-For-Review, Release, Train Deployments: 1.34.0-wmf.11 deployment blockers - https://phabricator.wikimedia.org/T220736 (CDanis)
[15:46:01] Release-Engineering-Team (Kanban), MW-1.34-notes (1.34.0-wmf.10; 2019-06-18), Release, Train Deployments: 1.34.0-wmf.10 deployment blockers - https://phabricator.wikimedia.org/T220735 (CDanis)
[16:01:43] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (201907), Release, Train Deployments: 1.34.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T220739 (LarsWirzenius) Open→Resolved All wikis are now wmf.14. Nothing seems to have exploded so far....
[16:26:47] Beta-Cluster-Infrastructure, Release-Engineering-Team (Other / Uncategorized), Mail, MediaWiki-Email, Operations: [betacluster] Cannot confirm email address - confirmation never received - https://phabricator.wikimedia.org/T227714 (JTannerWMF) p:Triage→High
[16:38:09] I have mw+2 but I cannot "unWIP" a change. I can't remember if that was intended?
[16:43:50] Only project owners && admins can unmark wips. There is a new wip right in gerrit (though i think that's in gerrit 3.0+)
[16:44:05] mafk ^
[16:44:11] paladox
[16:44:14] oops
[16:44:15] thanks
[16:44:43] Right. mediawiki.git owners are Administrators
[16:44:55] so no owners as that group is empty - lol
[16:45:36] Beta-Cluster-Infrastructure, Release-Engineering-Team (Other / Uncategorized), Release-Engineering-Team-TODO (201907), User-greg: Request access to deployment-prep - https://phabricator.wikimedia.org/T228021 (greg) Open→Resolved Alright, you should be good to go.
[16:46:24] Project-Admins: Requesting new Phabricator tag "Observability-Goal" - https://phabricator.wikimedia.org/T228673 (Aklapper) Any description understandable for average humans, please? :) Which type of project tag? See https://www.mediawiki.org/wiki/Phabricator/Creating_and_renaming_projects Is this somehow re...
[16:46:25] Admins are the owner of the group.
[16:46:35] *project
[16:46:35] Yup
[16:46:55] yeah, can't see members since you're not a group owner
[16:47:01] But only `gerrit2` is in the group
[16:47:26] Beta-Cluster-Infrastructure, Release-Engineering-Team (Other / Uncategorized), Mail, MediaWiki-Email, Operations: [betacluster] Cannot confirm email address - confirmation never received - https://phabricator.wikimedia.org/T227714 (herron) @greg sure, I'm back today from being out of the offi...
[16:47:26] there are a bunch of ldap groups in there as well
[16:47:41] I'm talking about https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/434115/ fwiw
[16:47:51] thcipriani yup I know
[16:47:58] where "a bunch" == one now :)
[16:48:11] no point in +2ing if it's WIP
[16:48:15] CI won't merge it
[16:48:17] mafk: need me to start review on that one?
[16:48:33] i.e., mark not-WIP?
[16:48:37] thcipriani maybe only Start Review. I don't want to +2
[16:48:51] People is asking for an additional review by Amir
[16:49:09] mafk: done
[16:49:13] chachi
[16:49:16] thanks :)
[16:49:19] yw :)
[16:49:44] Project beta-scap-eqiad build #258960: FAILURE in 4.6 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/258960/
[16:50:59] Continuous-Integration-Config, Release-Engineering-Team (Unit & Int & System Tooling), Release-Engineering-Team-TODO, MediaWiki-Core-Testing, and 4 others: Reduce runtime of MW shared gate Jenkins jobs to 5 min - https://phabricator.wikimedia.org/T225730 (Krinkle)
[16:54:25] Project beta-scap-eqiad build #258961: STILL FAILING in 3.7 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/258961/
[16:56:13] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (201907), Release, Train Deployments: 1.34.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T220740 (Jdforrester-WMF)
[16:57:19] Train unblocked. For now.
[17:03:17] Ominous.
[17:03:45] * James_F grins.
[17:09:37] Gerrit, Gerrit-Privilege-Requests, Wikimedia-IEG-grant-review: Give access to l10n-bot to wikimedia/iegreview repository - https://phabricator.wikimedia.org/T228490 (MarcoAurelio) Apparently this is not working :( @hashar @thcipriani Any idea? Cfr. https://gerrit.wikimedia.org/r/#/c/wikimedia/iegrev...
[17:10:33] (CR) MarcoAurelio: "> Thank you :]" [wikimedia/iegreview] (refs/meta/config) - https://gerrit.wikimedia.org/r/524493 (https://phabricator.wikimedia.org/T228490) (owner: MarcoAurelio)
[17:12:43] Yippee, build fixed!
[17:12:44] Project beta-scap-eqiad build #258962: FIXED in 8 min 20 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/258962/
[17:13:51] thcipriani: et. al. we're suffering a hardware failure so I'm needing to move a lot of deployment-prep again. I'm doing it all in a lump, right now, so it shouldn't be offline for all that long.
[17:13:54] sorry about the extra downtime
[17:14:01] andrewbogott: ack
[17:14:27] I also need to move some CI workers but I'll depool/repool them so it shouldn't cause much fuss
[17:14:43] andrewbogott: sorry for your troubles, good luck, and thanks for the heads up
[17:15:19] This is a server that we have declared 'fixed' about 4 times now — each time it works great in testing and the buckles as soon as we put real VMs on it :(
[17:18:26] A lot of this latest round of juggling is in service of speeding up evacuations and elimination spofs so hopefully this kind of thing will be less serious in the future
[17:19:23] Beta-Cluster-Infrastructure, Release-Engineering-Team (Other / Uncategorized), Mail, MediaWiki-Email, Operations: [betacluster] Cannot confirm email address - confirmation never received - https://phabricator.wikimedia.org/T227714 (herron) I'm not having luck reproducing this with my own non-...
[17:20:06] Project beta-update-databases-eqiad build #35432: FAILURE in 5 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/35432/
[17:26:28] Project beta-scap-eqiad build #258964: FAILURE in 2 min 5 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/258964/
[17:27:40] PROBLEM - SSH on deployment-jobrunner03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:28:49] silly question - when are we going to do branch cut for wmf.15 ?
[17:31:04] raynor: Tuesday before the group0 deploy
[17:31:32] PROBLEM - Host deployment-jobrunner03 is DOWN: CRITICAL - Host Unreachable (172.16.4.98)
[17:31:35] greg-g, thx
[17:31:45] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (201907), Scap, serviceops, and 3 others: Enhance MediaWiki deployments for support of php7.x - https://phabricator.wikimedia.org/T224857 (thcipriani) >>! In T224857#5352559, @Joe wrote: > @thcipriani do you need more i...
[17:36:38] Project beta-scap-eqiad build #258965: STILL FAILING in 2 min 15 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/258965/
[17:37:28] PROBLEM - Host integration-slave-docker-1041 is DOWN: CRITICAL - Host Unreachable (172.16.1.36)
[17:39:19] RECOVERY - Host deployment-jobrunner03 is UP: PING OK - Packet loss = 0%, RTA = 0.63 ms
[17:44:44] PROBLEM - Host deployment-logstash2 is DOWN: CRITICAL - Host Unreachable (172.16.5.22)
[17:46:36] Yippee, build fixed!
[17:46:36] Project beta-scap-eqiad build #258966: FIXED in 2 min 10 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/258966/
[17:46:59] RECOVERY - Host integration-slave-docker-1041 is UP: PING OK - Packet loss = 0%, RTA = 0.72 ms
[17:48:02] PROBLEM - Host integration-slave-docker-1040 is DOWN: CRITICAL - Host Unreachable (172.16.3.86)
[17:50:08] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 57.14% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[17:58:08] RECOVERY - Host integration-slave-docker-1040 is UP: PING OK - Packet loss = 0%, RTA = 0.50 ms
[17:59:46] RECOVERY - Host deployment-logstash2 is UP: PING OK - Packet loss = 0%, RTA = 0.78 ms
[18:06:32] PROBLEM - Host deployment-ores01 is DOWN: CRITICAL - Host Unreachable (172.16.4.95)
[18:12:11] RECOVERY - Host deployment-ores01 is UP: PING OK - Packet loss = 0%, RTA = 4.51 ms
[18:16:41] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, serviceops-radar: Gerrit http threads stuck behind sendemail thread - https://phabricator.wikimedia.org/T224448 (thcipriani) Happened again today: https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTkvM...
[18:19:22] PROBLEM - Host deployment-urldownloader02 is DOWN: CRITICAL - Host Unreachable (172.16.4.11)
[18:21:24] Yippee, build fixed!
[18:21:24] Project beta-update-databases-eqiad build #35433: FIXED in 1 min 23 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/35433/
[18:22:34] I think I'm done breaking deployment-prep for now
[18:24:21] RECOVERY - Host deployment-urldownloader02 is UP: PING OK - Packet loss = 0%, RTA = 0.63 ms
[18:28:10] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[18:29:12] raynor, I'm doing the branch cutting tomorrow and will probably start doing it around 10:00 UTC