[00:04:18] !log Restarting release jenkins for upgrade.
[00:04:19] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[00:06:51] (That's me.)
[00:07:10] thanks for upgrades:)
[00:42:38] PROBLEM - Host deployment-dumps-puppetmaster02 is DOWN: CRITICAL - Host Unreachable (172.16.4.101)
[02:22:56] PROBLEM - App Server Main HTTP Response on deployment-mediawiki-07 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:27:05] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:27:15] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:40:28] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[02:41:55] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 92380 bytes in 1.060 second response time
[02:42:06] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 51680 bytes in 1.161 second response time
[02:42:48] RECOVERY - App Server Main HTTP Response on deployment-mediawiki-07 is OK: HTTP OK: HTTP/1.1 200 OK - 91946 bytes in 0.966 second response time
[04:13:37] (PS1) KartikMistry: Add apertium-anaphora and apertium-recursive to CI [integration/config] - https://gerrit.wikimedia.org/r/578706 (https://phabricator.wikimedia.org/T234181)
[04:31:49] Continuous-Integration-Config, Release-Engineering-Team (Unit & Int & System Tooling), Release-Engineering-Team-TODO, MediaWiki-Core-Testing, and 5 others: Reduce runtime of MW shared gate Jenkins jobs to 5 min - https://phabricator.wikimedia.org/T225730 (Krinkle)
[06:44:52] looks like CI is stuck?
[07:20:16] Project-Admins: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706 (nshahquinn-wmf) >>! In T706#5957478, @Aklapper wrote: > @Zoranzoki21, @nshahquinn-wmf: Hi, I have added you both. (Usual default disclaimer: Please follow [guidelines](https://www.me...
[07:28:31] Project-Admins, wmfdata-python: Create a wmfdata-python project - https://phabricator.wikimedia.org/T247060 (nshahquinn-wmf)
[08:29:12] Project-Admins, Security Preview, Security Readiness Reviews, Security-Team, and 2 others: combine security readiness review and security preview boards with third tag - https://phabricator.wikimedia.org/T247326 (Aklapper)
[08:38:11] Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), Operations, SRE-tools, and 5 others: Integrate automated DNS snippets into CI - https://phabricator.wikimedia.org/T243362 (hashar)
[08:50:08] (CR) Hashar: "The netbox git export would have to be refreshed in the entry point. It is probably as easy as just pulling from it :)" (3 comments) [integration/config] - https://gerrit.wikimedia.org/r/568546 (https://phabricator.wikimedia.org/T243362) (owner: CRusnov)
[08:53:45] (CR) Hashar: "Sorry Cas, I have missed your previous question:" [integration/config] - https://gerrit.wikimedia.org/r/568546 (https://phabricator.wikimedia.org/T243362) (owner: CRusnov)
[08:54:05] Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), Operations, SRE-tools, and 5 others: Integrate automated DNS snippets into CI - https://phabricator.wikimedia.org/T243362 (hashar) p:Triage→Medium
[09:09:11] Project-Admins: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706 (Aklapper) Correct! :) That's also linked from the Phabricator front page.
[09:59:17] (PS1) Hashar: Compress MediaWiki Junit XML files [integration/config] - https://gerrit.wikimedia.org/r/578880
[10:17:03] Gerrit, Social-Tools: Gerrit group creation request: Create group for Social-Tools - https://phabricator.wikimedia.org/T154078 (DannyS712) @ashley if this is still desired (per above comment), should it still be stalled?
[11:18:18] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[11:22:05] Gerrit, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Wikimedia-production-error (Shared Build Failure): Jenkins job failing intermittently due to Gerrit HTTP 502 errors when cloning repos - https://phabricator.wikimedia.org/T246763 (Lucas_Werkmeister_WMDE) I think I’ll just update this co...
[11:32:38] (PS1) Ema: atskafka: build with PBUILDER_USENETWORK [integration/config] - https://gerrit.wikimedia.org/r/578924 (https://phabricator.wikimedia.org/T237993)
[12:01:05] Release-Engineering-Team (Pipeline), Analytics, Analytics-Kanban, Release Pipeline, and 2 others: Migrate EventStreams to k8s deployment pipeline - https://phabricator.wikimedia.org/T238658 (akosiaris) @ottomata, eventstreams has been switched to kubernetes and TLS. The following 2 graphs show...
[12:17:58] is Zuul having issues again? no gate-and-submit started for https://gerrit.wikimedia.org/r/578336
[12:20:34] and 'recheck' has no effect on changes that were stuck before https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaWikiChat/+/578884
[12:20:51] looking at the dashboard, I suspect it hasn’t started any new builds in a while
[12:21:00] and only gate-and-submit still has a few that haven’t finished yet
[12:32:00] Release-Engineering-Team (Pipeline), Analytics, Analytics-Kanban, Release Pipeline, and 2 others: Migrate EventStreams to k8s deployment pipeline - https://phabricator.wikimedia.org/T238658 (akosiaris) I 've bumped CPU limits for eventstreams itself by 25%. The cause was that inexplicable CPU thr...
[12:32:24] (for the record/logs, Zuul discussion seems to be happening more in -operations than -releng at the moment)
[12:48:02] Release-Engineering-Team-TODO, Scap, MediaWiki-Internationalization, Performance-Team, Patch-For-Review: Use static php array files for l10n cache at WMF (instead of CDB) - https://phabricator.wikimedia.org/T99740 (Ladsgroup) >>! In T99740#5934663, @Krinkle wrote: >> @thcipriani mentioned thi...
[12:59:35] (PS1) Thcipriani: Test Zuul [blubber] - https://gerrit.wikimedia.org/r/578947
[13:02:42] (CR) Thcipriani: "recheck" [blubber] - https://gerrit.wikimedia.org/r/578947 (owner: Thcipriani)
[13:12:08] Continuous-Integration-Infrastructure, Zuul, Patch-For-Review, Upstream: zuul status page has double underline in Firefox due to abbr styles - https://phabricator.wikimedia.org/T109747 (Lucas_Werkmeister_WMDE) Open→Resolved This seems to have been resolved at some point, underlines are si...
[13:21:53] Beta-Cluster-Infrastructure, Core Platform Team, Parsoid, User-Ryasmeen: Parsoid/RESTbase seems to be unavailable in Beta - https://phabricator.wikimedia.org/T246833 (Pchelolo)
[13:32:45] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[13:33:19] hi releng folks, is zuul still really backlogged? https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/578656 no gate-and-submit after +2
[13:37:39] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[13:43:57] (Abandoned) Thcipriani: Test Zuul [blubber] - https://gerrit.wikimedia.org/r/578947 (owner: Thcipriani)
[13:46:44] Release-Engineering-Team, MediaWiki-extensions-FlaggedRevs, Regression: FlaggedRevs: Automatic user promotion stopped working on some wikis on June 24, 2019 - https://phabricator.wikimedia.org/T237191 (Wickie37) >>! In T237191#5910130, @Zache wrote: > AND (frp_revertedEdits>frp_totalContentEdits) <...
[13:51:15] Continuous-Integration-Infrastructure, Operations, Traffic: debian-glue-backports not enabling backports on buster - https://phabricator.wikimedia.org/T247316 (ema) Open→Resolved a:ema >>! In T247316#5956625, @gerritbot wrote: > Change 578543 **merged** by Ema: > [operations/puppet@produc...
[14:19:00] MediaWiki-Codesniffer, Patch-For-Review: Avoid assignment in return statements - https://phabricator.wikimedia.org/T170332 (Nikerabbit) I expected to see some sort of justification in this task why this kind of code is bad.
[14:37:18] Release-Engineering-Team-TODO, Scap, MediaWiki-Internationalization, Performance-Team, Patch-For-Review: Use static php array files for l10n cache at WMF (instead of CDB) - https://phabricator.wikimedia.org/T99740 (AlexisJazz) >>! In T99740#5959996, @Ladsgroup wrote: >>>! In T99740#5934663, @...
[14:49:21] thcipriani so when gerrit throws "Cannot open ReviewDb", it causes zuul to stall?
[14:56:11] Project-Admins: Create a dedicated project tag for #Small-Wiki-Tool-Kits - https://phabricator.wikimedia.org/T247418 (Aklapper) p:Triage→Medium
[15:00:19] paladox: zuul makes a bad assumption about calls to gerrit somewhere in a thread. When that call hangs or fails (for whatever reason, including the "cannot open reviewdb" error) it causes zuul to hang waiting on that thread to release a lock (I *think* -- hasha has investigated more deeply than I).
[15:01:05] Ah ok
[15:15:40] Beta-Cluster-Infrastructure, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Parsoid, Patch-For-Review: Replace deployment-mediawiki-parsoid10 with a "purer" deployment-parsoid11 box - https://phabricator.wikimedia.org/T246854 (ssastry) This box has Parsoid/JS enabled. The puppet package...
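The failure mode described at 15:00:19 (one thread holds a lock while a remote call hangs, so everything else that needs the lock stalls) can be illustrated with a minimal Python sketch. This is not Zuul's code; `slow_remote_call`, `worker_holding_lock`, `try_schedule` and `demo` are hypothetical names, and the sleep merely stands in for a Gerrit call that hangs.

```python
import threading
import time

lock = threading.Lock()
lock_taken = threading.Event()


def slow_remote_call(delay):
    """Hypothetical stand-in for a remote call that hangs for `delay` seconds."""
    time.sleep(delay)


def worker_holding_lock(delay):
    # The lock stays held for the whole duration of the remote call,
    # which is the assumption the chat message calls out as risky.
    with lock:
        lock_taken.set()
        slow_remote_call(delay)


def try_schedule(timeout):
    """Return True if the shared lock could be taken within `timeout` seconds."""
    acquired = lock.acquire(timeout=timeout)
    if acquired:
        lock.release()
    return acquired


def demo(call_delay=0.5, wait_timeout=0.1):
    worker = threading.Thread(target=worker_holding_lock, args=(call_delay,))
    worker.start()
    lock_taken.wait()  # make sure the worker really owns the lock first
    blocked = not try_schedule(wait_timeout)  # stalls while the call hangs
    worker.join()
    recovered = try_schedule(wait_timeout)  # works again once the call returns
    return blocked, recovered
```

Waiting with a timeout, as `try_schedule` does, at least lets the caller notice the stall instead of blocking forever; whether that would suit Zuul's scheduler is not something the log establishes.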
[15:18:42] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[15:56:51] (CR) Jforrester: [C: +2] atskafka: build with PBUILDER_USENETWORK [integration/config] - https://gerrit.wikimedia.org/r/578924 (https://phabricator.wikimedia.org/T237993) (owner: Ema)
[15:58:26] (Merged) jenkins-bot: atskafka: build with PBUILDER_USENETWORK [integration/config] - https://gerrit.wikimedia.org/r/578924 (https://phabricator.wikimedia.org/T237993) (owner: Ema)
[16:01:16] thcipriani https://gerrit-review.googlesource.com/c/gerrit-monitoring/+/258543
[16:06:40] Beta-Cluster-Infrastructure, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Parsoid, Patch-For-Review: Replace deployment-mediawiki-parsoid10 with a "purer" deployment-parsoid11 box - https://phabricator.wikimedia.org/T246854 (Jdforrester-WMF) >>! In T246854#5960462, @ssastry wrote: > T...
[16:07:24] oh noooo
[16:07:30] I spy with my little eye another giant Vector chain in Zuul…
[16:08:57] that's a ton of vector changes :P
[16:09:09] Yeah, I might just de-list the user from CI.
[16:09:51] How did they even push that many?
[16:09:56] has anyone contacted them already?
[16:09:58] Cause git review goes lolno by default
[16:10:05] And requires some effort to do it otherwise
[16:10:06] Lucas_WMDE: Many, many times.
[16:10:13] Reedy: Indeed. And yet, they persisted.
[16:10:13] ok ><
[16:10:22] James_F: I'll support de-whitelisting them
[16:10:35] And no re-adding until they listen/respond
[16:10:35] Beta-Cluster-Infrastructure, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Parsoid, Patch-For-Review: Replace deployment-mediawiki-parsoid10 with a "purer" deployment-parsoid11 box - https://phabricator.wikimedia.org/T246854 (ssastry) I see. I suppose when the production puppet package...
[16:12:50] Reedy i wonder too
[16:13:04] paladox: It's not difficult, you can do it with git push, and do it in batches...
[16:13:12] I've done it before for stacks of security issues
[16:13:15] oh
[16:13:57] (PS1) Reedy: Remove AronManning from CI whitelist [integration/config] - https://gerrit.wikimedia.org/r/578983
[16:14:15] (CR) Paladox: [C: +1] Remove AronManning from CI whitelist [integration/config] - https://gerrit.wikimedia.org/r/578983 (owner: Reedy)
[16:14:22] You can do something like...
[16:14:31] `git push origin HEAD~20:refs/for/master` then `git push origin HEAD~10:refs/for/master`
[16:14:32] etc
[16:14:39] James_F: ^ Shall we jfdi?
[16:15:03] heh
[16:15:05] Beta-Cluster-Infrastructure, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Parsoid, Patch-For-Review: Replace deployment-mediawiki-parsoid10 with a "purer" deployment-parsoid11 box - https://phabricator.wikimedia.org/T246854 (Jdforrester-WMF) >>! In T246854#5960652, @ssastry wrote: > I...
[16:15:29] Reedy: OK.
[16:15:33] lol, jerkins has v-1'd somewhere in his stack
[16:15:35] So the rest are gonna fail too
[16:15:43] (CR) Reedy: [C: +2] Remove AronManning from CI whitelist [integration/config] - https://gerrit.wikimedia.org/r/578983 (owner: Reedy)
[16:15:45] But taking ~12 hours of CI time.
[16:16:09] Restarting zuul and rechecking jobs is less painful than waiting for it all
[16:16:27] … yeah. :-(
[16:17:17] (Merged) jenkins-bot: Remove AronManning from CI whitelist [integration/config] - https://gerrit.wikimedia.org/r/578983 (owner: Reedy)
[16:17:51] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/578983
[16:17:52] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:22:15] Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Security-Team: Refine Seakeeper proposal for Security/SRE review - https://phabricator.wikimedia.org/T243436 (dduvall)
[16:26:38] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[16:26:58] Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Security-Team: Refine Seakeeper proposal for Security/SRE review - https://phabricator.wikimedia.org/T243436 (dduvall) See the task description for a draft (and very WIP) rewrite of the Seakeepe...
[16:32:07] Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Research available third-party Argo and Kubernetes cloud providers - https://phabricator.wikimedia.org/T244384 (dduvall)
[16:36:36] Deployments, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Write script to apply security patches - https://phabricator.wikimedia.org/T247075 (LarsWirzenius) Apparently there is a scap plugin for this already. I'll be looking at that and addi...
[16:37:38] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Scap, Python3-Porting: Port scap to Python 3 - https://phabricator.wikimedia.org/T246025 (LarsWirzenius) I'm working on a test environment in which to run the test suite. It turns out that the scap Debian package is built in jessie, but the sca...
[16:38:32] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Scap: Add an integration smoke test to scap - https://phabricator.wikimedia.org/T245614 (LarsWirzenius) Open→Resolved This change has been merged and the resulting CI post-merge failure has been addressed. Closing task as finished.
[16:48:30] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Scap: Make scap release with --canary-wait-time and integration test changes - https://phabricator.wikimedia.org/T246455 (LarsWirzenius) p:Medium→High
[16:49:09] (PS2) Jforrester: Compress MediaWiki Junit XML files [integration/config] - https://gerrit.wikimedia.org/r/578880 (owner: Hashar)
[16:50:28] James_F: Reedy can we add rate limiting to gerrit? Also does https://gerrit.wikimedia.org/r/c/integration/config/+/578983 mean Aron cannot push to gerrit now?
[16:50:41] He can push to his hearts content
[16:50:46] CI won't do anything for him though
[16:50:53] Other people can recheck his patches fine
[16:51:21] Jdlrobson: By default, git review won't let you push more than 10 patches in one go anyway
[16:51:37] so how did he get round that?!
[16:51:52] Using git push directly, you can do what you want
[16:51:56] o_O
[16:51:56] Just have to do it in batches
[16:52:06] Jdlrobson: Indeed.
[16:52:09] have we got a spam policy?
[16:52:10] We've had the same problem with security releases before when we've had more than 10 patches per branch
[16:52:20] i feel this user would benefit from a temporary ban from spamming gerrit
[16:52:28] *for
[16:52:29] Reedy: Git review now does 20, BTW, up from 10.
[16:52:35] orly?
[16:52:44] Yeah, new in 1.27 I think.
[16:52:48] Jdlrobson: Well, that's what removing the CI whitelist does
[16:52:56] He's not clogging up the shared resources at least
[16:53:04] but he can still post patches to my review queue right?
[16:53:16] If your personal queue doesn't exclude him, yes.
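The batching trick mentioned at 16:14:31 and again at 16:51:52 (pushing `HEAD~20`, then `HEAD~10`, and so on) follows a simple pattern. The Python sketch below only computes the refspecs and runs no git command; `batch_refspecs` is a hypothetical helper, and it assumes a linear stack of `total_commits` unmerged commits on top of the target branch.

```python
def batch_refspecs(total_commits, batch_size=10, branch="master"):
    """Return refspecs that push a linear commit stack oldest-first in batches.

    Each refspec names the newest commit of one batch, so pushing them in
    order uploads at most `batch_size` new commits per push.
    """
    if total_commits < 0 or batch_size < 1:
        raise ValueError("total_commits must be >= 0 and batch_size >= 1")
    specs = []
    pushed = 0
    while pushed < total_commits:
        pushed = min(pushed + batch_size, total_commits)
        offset = total_commits - pushed  # commits still left above this batch
        ref = "HEAD" if offset == 0 else f"HEAD~{offset}"
        specs.append(f"{ref}:refs/for/{branch}")
    return specs


if __name__ == "__main__":
    for spec in batch_refspecs(30):
        print("git push origin", spec)
```

For a stack of 30 commits this yields the two pushes quoted in the chat followed by a final `HEAD:refs/for/master`. Any server-side limit on new changes per push, which the chat suggests may exist for untrusted users, applies regardless of how the refspecs are built.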
[16:53:25] I think we have the limit set at 10?
[16:53:32] For anyone outside of the trusted group
[16:53:50] Oh, that might be it.
[16:54:05] I know gerrit does reject it too, yeah
[16:54:06] I thought it was new in a new version of git review, but it could be that.
[16:55:37] yeh
[16:57:08] heh, that user has uploaded over 50 changes for vector in the last few days
[16:58:09] Release-Engineering-Team (Other / Uncategorized), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Releng March priorities vs resources Chart - https://phabricator.wikimedia.org/T242237 (thcipriani) {F31676710} Updated post team meeting with actual blockers.
[16:58:12] He’s done a lot
[16:58:18] I just don't know how to deal with this user. I don't want to discourage them, but they are creating a strain on my team's reviewing abilities.
[16:58:39] And there's no way i can keep on top of 50+ patches
[16:58:43] Jdlrobson: Do you understand what he's actually doing?
[16:58:52] Beta-Cluster-Infrastructure, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Parsoid, Patch-For-Review: Replace deployment-mediawiki-parsoid10 with a "purer" deployment-parsoid11 box - https://phabricator.wikimedia.org/T246854 (cscott) >>! In T246854#5960666, @Jdforrester-WMF wrote: >>>!...
[16:58:54] the content of the patches?
[16:58:54] You’d be best being polite and sending him a message
[16:58:56] As we don't really understand https://gerrit.wikimedia.org/r/c/mediawiki/skins/Vector/+/578890
[16:59:11] "The individual patches are necessary for demonstration: the whole chain is a POC and example for the iterative implementation of the current DI state. "
[16:59:11] Not really at a glance
[16:59:19] He’s quite nice despite his history on enwp
[16:59:21] I've said numerous times please write tasks before pushing patches
[16:59:43] but the patches keep coming.
[17:00:11] and i'm trying to give other advice - such as focus on 2-3 things at a time because the user gets frustrated that nobody is reviewing their stuff
[17:00:21] Gerrit, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Wikimedia-production-error (Shared Build Failure): Jenkins job failing intermittently due to Gerrit HTTP 502 errors when cloning repos - https://phabricator.wikimedia.org/T246763 (thcipriani) a:thcipriani
[17:00:22] but at the same time when you post 50+ patches it's hard to know what actually needs review
[17:00:31] Ah, I’ll have a word later as well
[17:01:11] and then there is the rebasing.. im getting about 20+ email notifications every morning. I've had to start unsubscribing from certain patches. I really want to help this user, but the conversation gets fragmented over gerrit and phab
[17:01:18] Jdlrobson i can only imagine how big your dashboard is :P
[17:01:40] I have patches not touched since 2013 on mine
[17:01:41] if they were on IRC it would be useful
[17:01:43] * Reedy shrugs
[17:01:56] i'm sure Aron didn't understand/mean to clog CI
[17:02:02] They have been before
[17:02:03] but needs to get the feedback somewhere
[17:02:06] AronM
[17:02:12] Indeed
[17:02:18] That's why his message is kinda confusing
[17:02:20] Aron likely is now confused why CI is not running
[17:02:28] It sounds like he only needs 2 or 3 patches to be pushed
[17:02:30] I’ll ask him to come on
[17:02:41] But for some reason, also thinks he needs to split them up like that
[17:03:04] i think they are used to a gitlab type workflow
[17:07:41] Jdlrobson: per Reedy earlier, if/when you do review his patched, if you comment "recheck" from your account, CI will run, so that's a way to control the rate. Then if it fails you can let them review that first before doing more CR.
[17:08:15] also thanks for clarifying this isn't (yet) done in close collab with your team. I was confused as well.
[17:08:30] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Scap, User-brennen: Investigate duration of scap sync for 1.35.0-wmf.23 train deploy - https://phabricator.wikimedia.org/T247426 (brennen)
[17:08:52] of course, then rebases of the lot ontop are just going to upset CI if done in large batches..
[17:14:39] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Consider what paging who should get for what in RelEng - https://phabricator.wikimedia.org/T247427 (Jdforrester-WMF)
[17:19:36] (PS4) 20after4: Scap3 deploy repo for zuul [integration/zuul/deploy] - https://gerrit.wikimedia.org/r/577846 (https://phabricator.wikimedia.org/T215458)
[17:20:39] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Consider what paging who should get for what in RelEng/EngProd? - https://phabricator.wikimedia.org/T247427 (Jdforrester-WMF)
[17:21:10] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), observability: Consider what paging who should get for what in RelEng/EngProd? - https://phabricator.wikimedia.org/T247427 (Jdforrester-WMF)
[17:21:53] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Consider what paging who should get for what in RelEng/EngProd? - https://phabricator.wikimedia.org/T247427 (Jdforrester-WMF)
[17:24:12] Phabricator, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), User-brennen: Learn Phabricator update / deployment workflow - https://phabricator.wikimedia.org/T244628 (brennen) Open→Resolved This is an ongoing process, but we've got...
[17:30:35] !log Soft-restarting Jenkins to drop unused plugin.
[17:30:36] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:05:15] Project mwcore-phpunit-coverage-master build #536: ABORTED in 3 hr 5 min: https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/536/
[18:07:08] eek
[18:17:06] (CR) Hashar: [C: +2] Add apertium-anaphora and apertium-recursive to CI [integration/config] - https://gerrit.wikimedia.org/r/578706 (https://phabricator.wikimedia.org/T234181) (owner: KartikMistry)
[18:18:09] (Merged) jenkins-bot: Add apertium-anaphora and apertium-recursive to CI [integration/config] - https://gerrit.wikimedia.org/r/578706 (https://phabricator.wikimedia.org/T234181) (owner: KartikMistry)
[18:20:52] hashar: ?
[18:23:37] (PS3) Hashar: Compress MediaWiki Junit XML files [integration/config] - https://gerrit.wikimedia.org/r/578880
[18:26:10] hashar: You over-wrote my audit. :-(
[18:26:27] James_F: on that content translation thing?
[18:26:30] oh
[18:26:34] sorry
[18:26:35] No, on the compress XML one.
[18:26:39] I was about to deploy it.
[18:26:50] I could not remember whether I had send that patch in Gerrit so I have just git-review -R it
[18:26:52] Got distracted by needing to restart jenkins.
[18:26:58] * James_F laughs.
[18:27:29] while preparing for the contint1001 migration I found out mediawiki junit files are super large
[18:27:47] we can probably even just delete them given the junit plugin processes them
[18:27:56] but compressed, that should be fine
[18:27:58] * James_F nods.
[18:28:35] next would be some fresnel jobs that keeps some large trace files
[18:28:41] gotta dig into it a bit ;]
[18:29:20] (CR) Hashar: [C: +2] "Deployed :)" [integration/config] - https://gerrit.wikimedia.org/r/578706 (https://phabricator.wikimedia.org/T234181) (owner: KartikMistry)
[18:40:41] (CR) Hashar: "Better :)" (1 comment) [integration/zuul/deploy] - https://gerrit.wikimedia.org/r/577846 (https://phabricator.wikimedia.org/T215458) (owner: 20after4)
[18:45:36] (PS3) Jforrester: Revert "layout: [skin-quibble] Disable PHP74 in gate for now" [integration/config] - https://gerrit.wikimedia.org/r/578150 (https://phabricator.wikimedia.org/T247214) (owner: Reedy)
[18:45:54] (CR) Jforrester: [C: +2] "Let's go." [integration/config] - https://gerrit.wikimedia.org/r/578150 (https://phabricator.wikimedia.org/T247214) (owner: Reedy)
[18:46:52] (Merged) jenkins-bot: Revert "layout: [skin-quibble] Disable PHP74 in gate for now" [integration/config] - https://gerrit.wikimedia.org/r/578150 (https://phabricator.wikimedia.org/T247214) (owner: Reedy)
[18:47:31] !log Zuul: Re-enabling PHP74 for all skins T247214
[18:47:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:47:33] T247214: Re-enable PHP74 in gate for skin-quibble - https://phabricator.wikimedia.org/T247214
[18:55:23] Reedy: Do you know of any extensions that aren't PHP74-compatible?
[18:55:38] I'm pondering whether to just throw the switch for everything and wait for bug reports.,
[18:55:56] Continuous-Integration-Config, Release-Engineering-Team, MediaWiki-General, PHP 7.4 support, Patch-For-Review: Re-enable PHP74 in gate for skin-quibble - https://phabricator.wikimedia.org/T247214 (Jdforrester-WMF) Open→Resolved a:Jdforrester-WMF
[18:56:01] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), MediaWiki-General, MW-1.34-notes (1.34.0-wmf.23; 2019-09-17), MW-1.35-notes (1.35.0-wmf.10; 2019-12-10), and 3 others: Make MediaWiki core compatible with PHP 7.4 - https://phabricator.wikimedia.org/T233012 (Jdforrester-WMF)
[18:56:19] (PS4) Jforrester: layout: [MediaWiki] Add PHP74 tests as voting for master [integration/config] - https://gerrit.wikimedia.org/r/554326 (https://phabricator.wikimedia.org/T233012)
[19:20:34] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Release, Train Deployments, User-brennen: CPT want the wmf.23 train blocked for group1 - https://phabricator.wikimedia.org/T247446 (Jdforrester-WMF)
[19:26:45] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Release, Train Deployments, User-brennen: CPT want the wmf.23 train blocked for group1 - https://phabricator.wikimedia.org/T247446 (brennen) Capturing some `#wikimedia-operations` IRC context: ` 13:16 brennen: Sorry, currently in...
[19:29:41] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Release, Train Deployments, User-brennen: CPT want the wmf.23 train blocked for group1 - https://phabricator.wikimedia.org/T247446 (Jdforrester-WMF) p:Triage→Unbreak!
[19:36:15] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Release, Train Deployments, User-brennen: CPT want the wmf.23 train blocked for group1 - https://phabricator.wikimedia.org/T247446 (brennen) Of course I said "14:00 UTC services window" when I meant 14:00 **local**, because timezones are ha...
[19:56:44] Krinkle: Good news! I've found a bug in RL where people passed in a directory, ResourceLoaderFileModule would silently allow it. This breaks in PHP74. Fix: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/579039
[20:00:18] nice catch
[20:00:20] * Krinkle joins meeting
[20:01:01] thcipriani around? :)
[20:01:53] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Release, Train Deployments, User-brennen: CPT want the wmf.23 train blocked for group1 - https://phabricator.wikimedia.org/T247446 (DannyS712) >>! In T247446#5961465, @brennen wrote: > Of course I said "14:00 UTC services window" when I mea...
[20:21:43] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Release, Train Deployments, User-brennen: CPT want the wmf.23 train blocked for group1 - https://phabricator.wikimedia.org/T247446 (brennen) Open→Resolved a:brennen Proceeding.
[20:21:45] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Release, Train Deployments, User-brennen: 1.35.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T233871 (brennen)
[21:08:19] Gerrit, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Wikimedia-production-error (Shared Build Failure): Jenkins job failing intermittently due to Gerrit HTTP 502 errors when cloning repos - https://phabricator.wikimedia.org/T246763 (thcipriani) >>! In T246763#5959785, @Lucas_Werkmeister_W...
[21:12:35] You can now edit the commit msg as a inline edit in PG - https://gerrit-review.googlesource.com/c/gerrit/+/258492
[21:18:56] Release-Engineering-Team (Pipeline), Analytics, Analytics-Kanban, Release Pipeline, and 2 others: Migrate EventStreams to k8s deployment pipeline - https://phabricator.wikimedia.org/T238658 (Ottomata) I've added some debug logging in the eqiad canary pod and got a clue. ` [2020-03-11T20:51:37.42...
[21:20:04] paladox: nice!
[21:25:21] :)
[21:25:54] James_F: Do you know what's going on with https://phabricator.wikimedia.org/T198716 and https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/450885 ?
[21:26:09] What went wrong?
[21:28:11] Jdlrobson: Not sure. Also, wrong channel. :-)
[21:28:45] do you want me to continue this in operations? i was just wary of noise
[21:28:54] Sure, but please continue on the task.
[21:29:42] I'll look at it later.
[21:36:00] Release-Engineering-Team (Kanban), MediaWiki-SWAT-deployments, User-greg, User-zeljkofilipin: Proposal: Effective immediately, disallow multi-sync patch deployment - https://phabricator.wikimedia.org/T187761 (Krinkle)
[21:43:42] Gerrit, Release-Engineering-Team: Delete gerrit repository wikimedia/security/automated-scanning - https://phabricator.wikimedia.org/T247468 (sbassett)
[21:44:35] Gerrit, Release-Engineering-Team: Delete gerrit repository wikimedia/security/automated-scanning - https://phabricator.wikimedia.org/T247468 (sbassett)
[21:45:00] Gerrit, Release-Engineering-Team: Delete gerrit repository wikimedia/security/automated-scanning - https://phabricator.wikimedia.org/T247468 (sbassett)
[21:46:03] Gerrit, Release-Engineering-Team: Delete gerrit repository wikimedia/security/automated-scanning - https://phabricator.wikimedia.org/T247468 (sbassett)
[21:46:33] Gerrit, Release-Engineering-Team: Delete gerrit repository wikimedia/security/automated-scanning - https://phabricator.wikimedia.org/T247468 (Paladox) Yes this can be deleted, though i rather we waited to do that till we upgrade to 2.16. Only because there's a bug which causes spam (fixed with https://gi...
[22:13:33] (CR) Jforrester: [C: +2] layout: [MediaWiki] Add PHP74 tests as voting for master [integration/config] - https://gerrit.wikimedia.org/r/554326 (https://phabricator.wikimedia.org/T233012) (owner: Jforrester)
[22:14:38] (Merged) jenkins-bot: layout: [MediaWiki] Add PHP74 tests as voting for master [integration/config] - https://gerrit.wikimedia.org/r/554326 (https://phabricator.wikimedia.org/T233012) (owner: Jforrester)
[22:23:52] !log Zuul: [MediaWiki] Add PHP74 tests as voting for master T233012
[22:23:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:23:55] T233012: Make MediaWiki core compatible with PHP 7.4 - https://phabricator.wikimedia.org/T233012
[22:24:27] whee
[22:24:39] It's gonna certainly be the easiest way to find out
[22:24:47] If the code is exercised by tests ;p
[22:25:18] Reedy: It passed an experimental job.
[22:25:26] Reedy: The problem is extensions. I've not enabled that yet.
[22:25:31] Wikibase fails.
[22:27:06] lol
[22:27:23] Specifically, it registers a directory as a styles file in its RL manifest.
[22:27:41] Pre PHP74, file_get_contents() just returned false for non-files; now it throws.
[22:27:47] lolol
[22:28:05] Fix is "simple".
[22:34:00] PROBLEM - Free space - all mounts on deployment-snapshot01 is CRITICAL: CRITICAL: deployment-prep.deployment-snapshot01.diskspace._data.byte_percentfree (No valid datapoints found)deployment-prep.deployment-snapshot01.diskspace.root.byte_percentfree (<10.00%)
[22:35:59] again
[22:36:18] Reedy: Fix is in fact https://gerrit.wikimedia.org/r/c/mediawiki/core/+/579039 plus getting it working enough to actually tell me which bit of Wikibase is broken.
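The class of bug discussed at 22:27:23 (a manifest entry that points at a directory instead of a file, which only surfaces once the read starts failing loudly) can be caught by validating entries before reading them. The sketch below is Python rather than the PHP involved, and it is not the fix in the linked change; `find_non_file_entries`, `demo` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def find_non_file_entries(base_dir, entries):
    """Return the entries that do not resolve to a regular file under base_dir."""
    base = Path(base_dir)
    return [entry for entry in entries if not (base / entry).is_file()]


def demo():
    # Build a throwaway layout: one real stylesheet plus one directory
    # that a manifest mistakenly lists as if it were a stylesheet.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.css").write_text("body { margin: 0; }")
        (root / "styles").mkdir()
        manifest = ["a.css", "styles", "missing.css"]
        return find_non_file_entries(root, manifest)
```

Reporting every offending entry in one pass, rather than failing on the first bad read, would also address the complaint in the log about not being told which part of Wikibase is broken.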
[22:37:48] (PS1) Jforrester: layout: [extensions] Add PHP74 tests as voting for master only [integration/config] - https://gerrit.wikimedia.org/r/579076 (https://phabricator.wikimedia.org/T233012)
[22:37:52] /dev/vda3 20G 15G 4.4G 77% /
[22:37:54] It lies
[22:38:56] Reedy: Maybe the clean-up script just ran?
[22:39:01] RECOVERY - Free space - all mounts on deployment-snapshot01 is OK: OK: deployment-prep.deployment-snapshot01.diskspace._data.byte_percentfree (No valid datapoints found)
[22:39:06] lol
[22:39:14] Perfect.
[22:40:04] "no valid data" = it's fine --Shinken :p
[22:44:36] mutante: Sounds about right.
[22:47:25] so tedious, I had a cron job over there to just flush out all of the l10n dir
[22:47:33] why was that not enough?
[22:47:53] maybe i should write down all the settings and recreate the instance from scratch >_<
[22:51:04] Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Order of magnitude for New CI hosting budget - https://phabricator.wikimedia.org/T247320 (dduvall) p:Triage→High
[22:58:15] Reedy: Want to Approve https://github.com/wikimedia/less.php/pull/26 so I can do a release? :-)
[22:59:06] It's funny, people say GH is easy
[22:59:19] It took a little while for me to see that actually reviewing was on another tab
[23:00:45] Reedy: It's not easy (indeed, there are lots of things I hate about GH), it's just familiar.
[23:01:08] did you manage to approve, Reedy ?
[23:01:15] When I found it
[23:01:21] well, I closed the GH tab already
[23:06:03] * James_F waits, impatiently, for packagist to wake up and see the new tag.
[23:08:37] Project beta-scap-eqiad build #291396: FAILURE in 4 min 5 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/291396/
[23:11:41] James_F: If you have the creds you can go on and hit the update button
[23:11:57] I see 3.0.0 on https://packagist.org/packages/wikimedia/less.php
[23:12:14] Reedy: Aha, thanks.
[23:12:27] Beta-Cluster-Infrastructure, Jade, MediaWiki-ContentHandler, User-DannyS712: Beta cluster: The content model 'JadeJudgment' is not registered - https://phabricator.wikimedia.org/T247476 (DannyS712)
[23:12:53] Beta-Cluster-Infrastructure, Jade, MediaWiki-ContentHandler, User-DannyS712: Beta cluster: The content model 'JadeJudgment' is not registered - https://phabricator.wikimedia.org/T247476 (DannyS712)
[23:13:36] PROBLEM - Host integration-puppetmaster01 is DOWN: CRITICAL - Host Unreachable (172.16.3.17)
[23:13:48] Project beta-scap-eqiad build #291397: STILL FAILING in 4 min 5 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/291397/
[23:14:50] Beta-Cluster-Infrastructure, Jade, MediaWiki-ContentHandler, User-DannyS712: Beta cluster: The content model 'JadeJudgment' is not registered - https://phabricator.wikimedia.org/T247476 (DannyS712)
[23:15:59] Beta-Cluster-Infrastructure, Jade, MediaWiki-ContentHandler, User-DannyS712: Beta cluster: The content model 'JadeJudgment' is not registered - https://phabricator.wikimedia.org/T247476 (DannyS712) Version info: Core at: 1.35.0-alpha (e569e34) Jade at: 0.0.1 (9c7183d)
[23:19:23] Project beta-scap-eqiad build #291398: STILL FAILING in 4 min 4 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/291398/
[23:21:09] >23:19:17 23:19:17 sudo -u mwdeploy -n -- /usr/bin/rsync -l deployment-deploy01.deployment-prep.eqiad.wmflabs::common/wikiversions*.{json,php} /srv/mediawiki on deployment-mediawiki-parsoid11.deployment-prep.eqiad.wmflabs returned [255]: ssh: Could not resolve hostname deployment-mediawiki-parsoid11.deployment-prep.eqiad.wmflabs: Name or service not known
[23:23:43] Project beta-scap-eqiad build #291399: STILL FAILING in 4 min 1 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/291399/
[23:24:42] I had not heard of this name or service either
[23:24:49] deployment-mediawiki-parsoid11 is wrong.
[23:24:52] It should be deployment-parsoid11
[23:25:14] Defined in https://gerrit.wikimedia.org/g/operations/puppet/+/863bc33e188665bc147175f922e4825eab5add5d/hieradata/cloud/eqiad1/deployment-prep/common.yaml
[23:25:19] Tsk cscott. :-)
[23:25:27] * James_F writes a fix.
[23:25:29] * bd808 was worried that the dns servers for cloud vps fell over again
[23:25:48] bd808: I mean, they might have, but this isn't proof.
[23:26:06] fair :)
[23:27:09] Release-Engineering-Team-TODO, Wikimedia-General-or-Unknown, Wikimedia-Site-requests, MW-1.35-notes (1.35.0-wmf.22; 2020-03-03), Patch-For-Review: Try to make wmf-config/wgConf's per-wiki configuration cache redundant - https://phabricator.wikimedia.org/T169821 (Krinkle) >>! First profile - o...
[23:27:34] we had a lot of errors like that yesterday and the day before. a missing feature in OpenStack's integration with PDNS left one of our 2 authoritative servers with empty zone files after a rebuild
[23:27:47] but we think it is fixed for now
[23:27:48] mutante: Could you sling out https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/579085 ? :-)
[23:29:01] Beta-Cluster-Infrastructure, Jade, MediaWiki-ContentHandler, Patch-For-Review, User-DannyS712: Beta cluster: The content model 'JadeJudgment' is not registered - https://phabricator.wikimedia.org/T247476 (DannyS712) {T199834} renamed the content handler, add the old one for backwards compatib...
[23:29:13] Project beta-scap-eqiad build #291400: STILL FAILING in 3 min 59 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/291400/
[23:32:45] Beta-Cluster-Infrastructure, Jade, MediaWiki-ContentHandler, Patch-For-Review, User-DannyS712: Beta cluster: The content model 'JadeJudgment' is not registered - https://phabricator.wikimedia.org/T247476 (Reedy) Was something done to clean up prod enwiki? Or is your example just new enough it...
[23:34:59] Beta-Cluster-Infrastructure, Jade, MediaWiki-ContentHandler, Patch-For-Review, User-DannyS712: Beta cluster: The content model 'JadeJudgment' is not registered - https://phabricator.wikimedia.org/T247476 (DannyS712) >>! In T247476#5962336, @Reedy wrote: > Was something done to clean up prod e...
[23:36:13] Beta-Cluster-Infrastructure, Jade, MediaWiki-ContentHandler, Patch-For-Review, User-DannyS712: Beta cluster: The content model 'JadeJudgment' is not registered - https://phabricator.wikimedia.org/T247476 (ACraze) Hi @DannyS712, thanks for the bug report. It looks like these pages that fail un...
[23:37:52] James_F: yea, though the old one was apparently called "-mediawiki-" and that could theoretically mean the prefix-puppet stuff does not apply the same way
[23:37:53] Beta-Cluster-Infrastructure, Jade, MediaWiki-ContentHandler, Patch-For-Review, User-DannyS712: Beta cluster: The content model 'JadeJudgment' is not registered - https://phabricator.wikimedia.org/T247476 (DannyS712) >>! In T247476#5962355, @ACraze wrote: > Hi @DannyS712, thanks for the bug re...
[23:38:34] Beta-Cluster-Infrastructure, Core Platform Team, Parsoid, User-Ryasmeen: Parsoid/RESTbase seems to be unavailable in Beta - https://phabricator.wikimedia.org/T246833 (Jdforrester-WMF) Open→Resolved a:Jdforrester-WMF→cscott A lot of the final push to get this fixed in Restbase and...
[23:38:41] mutante: Yeah, but it works right now, so I'm fine with the name.
[23:38:45] James_F: done
[23:38:54] mutante: When it breaks in future we can fix it yet another time. :-)
[23:38:59] Thank you!
[23:39:05] yep, np
[23:40:17] Project beta-scap-eqiad build #291401: STILL FAILING in 5 min 40 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/291401/
[23:40:51] Beta-Cluster-Infrastructure, Core Platform Team, Parsoid, User-Ryasmeen: Parsoid/RESTbase seems to be unavailable in Beta - https://phabricator.wikimedia.org/T246833 (Ryasmeen) Thanks @cscott and @Dzahn!
[23:48:48] Project beta-scap-eqiad build #291402: STILL FAILING in 4 min 15 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/291402/
[23:52:56] Project beta-scap-eqiad build #291403: STILL FAILING in 4 min 4 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/291403/
[23:58:36] Project beta-scap-eqiad build #291404: STILL FAILING in 4 min 2 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/291404/
[23:58:51] Beta-Cluster-Infrastructure, Core Platform Team, Parsoid, User-Ryasmeen: Parsoid/RESTbase seems to be unavailable in Beta - https://phabricator.wikimedia.org/T246833 (cscott)