[00:35:11] 10Continuous-Integration-Infrastructure: Setup CI build stats reporting tool to produce ongoing reports on capacity and utilization - https://phabricator.wikimedia.org/T376830#10227386 (10Krinkle) See also {T255701}. [00:52:22] (03PS3) 10Ebomani: Updating Patch Demo plugin to return legacy/new URL as needed [software/gerrit] (deploy/wmf/stable-3.10) - 10https://gerrit.wikimedia.org/r/1079624 (https://phabricator.wikimedia.org/T374954) [00:55:00] (03CR) 10Ebomani: "Hello Antoine, here are the changes to the plugin to address the missing PatchDemo links issue (https://phabricator.wikimedia.org/T374954)" [software/gerrit] (deploy/wmf/stable-3.10) - 10https://gerrit.wikimedia.org/r/1079624 (https://phabricator.wikimedia.org/T374954) (owner: 10Ebomani) [03:15:37] FIRING: DatasourceError: Queue (Jenkins jobs + Zuul functions) alert - https://grafana.wikimedia.org/alerting/grafana/iS0FSjJ4z/view - https://wikitech.wikimedia.org/wiki/Monitoring/DatasourceError - https://alerts.wikimedia.org/?q=alertname%3DDatasourceError [03:20:37] RESOLVED: DatasourceError: Queue (Jenkins jobs + Zuul functions) alert - https://grafana.wikimedia.org/alerting/grafana/iS0FSjJ4z/view - https://wikitech.wikimedia.org/wiki/Monitoring/DatasourceError - https://alerts.wikimedia.org/?q=alertname%3DDatasourceError [05:33:38] 10Phabricator: Rename my account to Ratekreel - https://phabricator.wikimedia.org/T377173 (10Baggaet) 03NEW [05:34:33] 10Phabricator: Rename my account to Ratekreel - https://phabricator.wikimedia.org/T377173#10227556 (10Baggaet) p:05Triage→03Low [07:43:10] 10Gerrit (Gerrit 3.10): Upgrade to Gerrit 3.10.2 - https://phabricator.wikimedia.org/T373897#10227709 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=7b6654f9-afeb-4009-abb2-952905cb7c01) set by jelto@cumin1002 for 1:00:00 on 3 host(s) and their services with reason: Gerrit 3.10.2 update ` ge... 
[07:49:14] 10Gerrit (Gerrit 3.10): Upgrade to Gerrit 3.10.2 - https://phabricator.wikimedia.org/T373897#10227732 (10hashar) 05Open→03Resolved a:03hashar I have upgraded Gerrit on all three hosts `gerrit1003.wikimedia.org`, `gerrit2002.wikimedia.org` and `gerrit2003.wikimedia.org`. [07:57:01] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Seen), 06cloud-services-team, 10Cloud-VPS, 10ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org - https://phabricator.wikimedia.org/T374830#10227770 (10hashar) > fatal:... [08:10:06] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Seen), 06cloud-services-team, 10Cloud-VPS, 10ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org - https://phabricator.wikimedia.org/T374830#10227787 (10hashar) I went t... [08:21:12] 10Gerrit, 06Release-Engineering-Team, 13Patch-For-Review, 07Upstream: Enable visual image differencein Gerrit (was Install gerrit image-diff plugin) - https://phabricator.wikimedia.org/T341291#10227812 (10hashar) a:05hashar→03None The patches I wrote got reverted by upstream. I was upgrading Resemble.... [08:27:09] 10Gerrit (Gerrit 3.10): Configure Gerrit to use conflictStyle diff3 - https://phabricator.wikimedia.org/T359821#10227845 (10hashar) I have upgraded our Gerrit to 3.10.2 which does include //[[ https://gerrit-review.googlesource.com/c/gerrit/+/431417 | 431417 - Add support for using diff3 for rebasing and cherry-... 
[08:31:20] I've just done a "before" test for the 19 extensions mentioned in https://phabricator.wikimedia.org/T50217 (parallel testing), and have all greens except for a couple of selenium jobs (which don't have anything to do with PHPUnit parallel testing) [08:31:27] 06Project-Admins, 07Tracking-Neverending: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706#10227868 (10Gehel) Hey! Could you please add @BTullis to Project Admins, so that he can create milestones in the #data-platform-sre project? He is a taking ove... [08:31:47] my plan is to go ahead and merge the patch from kostajh , then deploy zuul and re-run the tests for all 19 extensions [08:31:56] sound good hashar ? [08:32:35] codders: +1 :) [08:32:58] given I will have lunch early today [08:32:58] codders: let's go! [08:32:58] nice. then here goes! [08:33:05] but you should be all set to revert if need be [08:33:57] (03CR) 10Arthur taylor: [C:03+2] "Have those checks lined up now. Will merge this then redeploy Zuul and re-run the tests (T377176)" [integration/config] - 10https://gerrit.wikimedia.org/r/1076148 (https://phabricator.wikimedia.org/T50217) (owner: 10Kosta Harlan) [08:34:47] yup. ready for that if we need to [08:35:00] and you can always do a progressive deploy :] [08:35:13] aka one repo after each other, but that is long and tedious :/ [08:35:42] (03Merged) 10jenkins-bot: zuul: Re-enable parallel PHPUnit for 7.4 jobs [integration/config] - 10https://gerrit.wikimedia.org/r/1076148 (https://phabricator.wikimedia.org/T50217) (owner: 10Kosta Harlan) [08:35:46] yeah. I thought about that. I think I will try this one-shot, and if we end up rolling back then I'll do the one-by-one [08:36:32] there is also a /usr/local/bin/zuul-test-repo script on the contint machine [08:36:55] what does that do? 
[08:36:59] that enqueue the latest change of a given repo in Zuul [08:37:18] the same way it would if one comments `recheck` or existing changes [08:37:24] but I have never used it :D [08:37:32] ah. okay. well, maybe not this morning then :) [08:37:35] hehe [08:39:55] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/1076148 (T50217, T377176) [08:39:59] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [08:39:59] T50217: Speed up MediaWiki PHPUnit build by running integration tests in parallel - https://phabricator.wikimedia.org/T50217 [08:39:59] T377176: Re-enable parallel PHPUnit for 7.4 jobs - https://phabricator.wikimedia.org/T377176 [08:42:15] okay. let's see what we get [08:47:48] SUCCESS and errors !! [08:48:46] :) [08:49:47] which errors do you see? So far I only see selenium errors [08:50:02] hashar: https://grafana.wmcloud.org/d/0g9N-7pVz/cloud-vps-project-board?orgId=1&var-project=integration&var-instance=All&from=now-1h&to=now&viewPanel=7 is OK or concerning? [08:50:26] heh. I did launch a lot of tests [08:50:58] we said anything about 200 was probably grinding CPUs rather than speeding anything up, right? [08:53:25] now we're back in normal territory [08:53:47] you'd need to check the load solely for the integration-agent-docker and against the number of CPUs [08:54:15] the project wide aggregate does not give much info as to whether it is saturated [08:55:27] the machine waits if `(Load / num CPU) > 1` [08:55:55] but there are also other sources of waits than just waiting for CPU. IO (disk/network) are another source [08:57:23] ah /proc/pressure/cpu and https://docs.kernel.org/accounting/psi.html [08:57:41] those are probably collected via Prometheus and, if so, graphable [08:58:22] (there are also io and memory data) [08:58:37] interesting! [08:59:38] k. 
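The load check and the pressure files mentioned above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a tool that exists on the CI agents: the function names `load_per_cpu` and `parse_psi` are hypothetical, and the parser assumes the `some`/`full` line layout described in the kernel PSI documentation linked above.

```python
import os
from pathlib import Path


def load_per_cpu(load_1min: float, num_cpus: int) -> float:
    """Return the 1-minute load average divided by the number of CPUs."""
    if num_cpus <= 0:
        raise ValueError("num_cpus must be positive")
    return load_1min / num_cpus


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse the contents of a /proc/pressure/{cpu,io,memory} file.

    Each line is expected to look like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    result: dict[str, dict[str, float]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


def main() -> None:
    """Print load per CPU and, where readable, the PSI figures."""
    try:
        ratio = load_per_cpu(os.getloadavg()[0], os.cpu_count() or 1)
        print(f"load per CPU: {ratio:.2f} (above 1 suggests tasks are waiting)")
    except OSError:
        print("load average not available on this system")
    for resource in ("cpu", "io", "memory"):
        path = Path("/proc/pressure") / resource
        try:
            print(resource, parse_psi(path.read_text()))
        except OSError:
            # PSI is Linux-only and may be disabled in the running kernel.
            print(resource, "pressure data not available")


if __name__ == "__main__":
    main()
```

Run on a single agent rather than against the project-wide aggregate, the ratio covers only CPU queueing, while the `io` and `memory` pressure files cover the other sources of waiting mentioned above.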
I got an error on WikibaseLexeme now, which might be a real problem, but also something WMDE can look at today [08:59:41] https://phabricator.wikimedia.org/P69904 [09:00:55] 10Beta-Cluster-Infrastructure, 06Growth-Team: Timeout for MediaWiki jobs in beta is lower than in production - https://phabricator.wikimedia.org/T377180 (10Urbanecm_WMF) 03NEW [09:01:53] doesn't look so bad, if those are %s [09:01:56] and from one of the host, it is mostly busy due to Cypress (from WikibaseLexeme apparently) + ffmpeg [09:02:04] add chromium of course :] [09:02:19] codders: Wikibase has also a failure, from a CentralAuth test [09:03:12] yup. looks the same as the Lexeme failure [09:03:27] probably something we did there recently. But I'll let WMDE know and we can take a look [09:04:10] growth experiments has a failure on a non-voting job, but that seems unrelated [09:11:50] besides those Wikibase issues, it seems to be green across the board (which is a bit ironic, since the WMDE repos were running in parallel for months without issue). [09:12:17] @hashar ah. yeah. We might have enabled parallel cypress testing for WikibaseLexeme :) [09:24:55] The growth experiments phpbench job is unrelated I think. [09:30:26] :-] [10:40:25] 06Project-Admins, 07Tracking-Neverending: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706#10228460 (10Ladsgroup) >>! In T706#10227867, @Gehel wrote: > Hey! Could you please add @BTullis to Project Admins, so that he can create milestones in the #dat... [11:12:50] 10Phabricator: Rename my account to Ratekreel - https://phabricator.wikimedia.org/T377173#10228534 (10Aklapper) 05Open→03Resolved a:03Aklapper Done! [11:39:26] codders: how's it looking? [11:39:49] really good. Have a patch for the Wikibase / WikibaseLexeme issue and that's been merged [11:40:01] just waiting for the two DNMs to go green [11:40:14] nice [11:43:30] greens across the board! 
and so far no other reports of issues [11:54:59] 06Release-Engineering-Team, 10MediaWiki-Core-Tests, 10wmde-wikidata-tech, 07Developer Productivity, and 2 others: Create a daily job running core + extension PHPUnit tests serially - https://phabricator.wikimedia.org/T372618#10228655 (10ArthurTaylor) a:03ArthurTaylor [12:19:43] I'm heading off for today - it's looking pretty quiet, but if it blows up I guess we know where the revert button is [12:35:54] (03CR) 10Hashar: "Excellent!!" [software/gerrit] (deploy/wmf/stable-3.10) - 10https://gerrit.wikimedia.org/r/1079624 (https://phabricator.wikimedia.org/T374954) (owner: 10Ebomani) [12:42:51] (03CR) 10Hashar: "WMDE is using the Quibble for reproducing CI failures and end up doing debugging. It looks like T319495 mass removed XDebug after we migra" [integration/config] - 10https://gerrit.wikimedia.org/r/1079250 (https://phabricator.wikimedia.org/T319495) (owner: 10Arthur taylor) [13:04:12] (03PS3) 10Hashar: jjb: change tox-publish jobs to use teardown [integration/config] - 10https://gerrit.wikimedia.org/r/1078444 [13:04:21] (03CR) 10Hashar: [C:03+2] "Tested on https://gerrit.wikimedia.org/r/c/pywikibot/core/+/1080031" [integration/config] - 10https://gerrit.wikimedia.org/r/1078444 (owner: 10Hashar) [13:05:36] (03Merged) 10jenkins-bot: jjb: change tox-publish jobs to use teardown [integration/config] - 10https://gerrit.wikimedia.org/r/1078444 (owner: 10Hashar) [13:09:34] 06Project-Admins, 07Tracking-Neverending: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706#10228953 (10BTullis) Access confirmed. 
Many thanks @Ladsgroup [13:16:14] (03PS5) 10Hashar: jjb: reorder publishers to use "teardown" publisher [integration/config] - 10https://gerrit.wikimedia.org/r/1075862 [13:16:14] (03CR) 10Hashar: [C:03+2] "Jobs updated:" [integration/config] - 10https://gerrit.wikimedia.org/r/1075862 (owner: 10Hashar) [13:18:10] (03Merged) 10jenkins-bot: jjb: reorder publishers to use "teardown" publisher [integration/config] - 10https://gerrit.wikimedia.org/r/1075862 (owner: 10Hashar) [13:36:10] (03Abandoned) 10Hashar: Codex-PHP: Remove the postmerge tasks for now [integration/config] - 10https://gerrit.wikimedia.org/r/1078493 (https://phabricator.wikimedia.org/T373940) (owner: 10Eric Gardner) [13:48:16] OH DISK FULL [13:48:18] of course [13:48:23] out of inodes! not space :) [13:49:31] and I have misread some graph this morning https://grafana.wmcloud.org/d/0g9N-7pVz/cloud-vps-project-board?orgId=1&var-project=integration&var-instance=All&viewPanel=512 [13:49:36] which shows the disk FREE percentage [13:49:43] when I had read it as a USAGE percentage [13:49:49] and though that we ran out of inodes [13:49:52] when we have plenty [13:49:53] .. [15:21:56] codders: just filed T377234 which looks suspicious [15:21:57] T377234: AutoLoaderStructureTest::testAutoloadOrder - autoload.php does not match output of generateLocalAutoload.php script - https://phabricator.wikimedia.org/T377234 [15:27:10] kostajh: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1077357/16/autoload.php [15:27:12] The MW core patch is wrong [15:27:39] ah, I missed that. 
Thanks Reedy [15:27:39] 10Continuous-Integration-Config, 10MediaWiki-Vendor, 07Composer, 10MW-1.43-notes (1.43.0-wmf.28; 2024-10-22), and 3 others: Upgrade composer to 2.8.x - https://phabricator.wikimedia.org/T376409#10229814 (10Jdforrester-WMF) [15:28:04] yay dependency trees :D [15:28:19] :) [15:28:36] T377197 does seem like a real problem though [15:28:49] "real" = caused by the phpunit parallel execution [15:28:53] T377197: SpecialCentralAuthTest fails when run in a suite with AccountCreationDetailsLookupTest - https://phabricator.wikimedia.org/T377197 [15:35:41] Chances are CI is going to be complaining and DB updates failing while we land a few mw core and vendor patches [15:50:04] that's an interesting one, bookmarked for later reading [16:20:01] Project beta-update-databases-eqiad build #79647: 04FAILURE in 0.47 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/79647/ [16:23:10] 10Continuous-Integration-Infrastructure: Setup CI build stats reporting tool to produce ongoing reports on capacity and utilization - https://phabricator.wikimedia.org/T376830#10230143 (10bd808) https://jenkins-build-stats.toolforge.org/ is now setup as a redirect to https://tools-static.wmflabs.org/jenkins-buil... [16:23:43] 06Release-Engineering-Team, 10Scap: scap deploy --init fails if the deployment server is not primary for mediawiki deployments - https://phabricator.wikimedia.org/T376995#10230148 (10jnuche) a:03jnuche [17:18:08] 10Deployments, 10Release-Engineering-Team (Priority Backlog 📥), 07Kubernetes, 13Patch-For-Review: Build and publish multiple MediaWiki production images for a given set of PHP versions - https://phabricator.wikimedia.org/T370934#10230469 (10Scott_French) No issues encountered during testing after updating... 
[17:25:47] 10Release-Engineering-Team (Priority Backlog 📥), 10Scap, 10Codex, 06Design-System-Team, 07Epic: [Spike/Timebox] DST Support for Codex-ifying SpiderPig UI - https://phabricator.wikimedia.org/T376932#10230499 (10bmartinezcalvo) [17:26:03] 10Release-Engineering-Team (Priority Backlog 📥), 10Scap, 10Codex, 06Design-System-Team, 07Epic: [Spike/Timebox] DST Support for Codex-ifying SpiderPig UI - https://phabricator.wikimedia.org/T376932#10230501 (10CCiufo-WMF) p:05Triage→03Medium [17:31:57] Project beta-update-databases-eqiad build #79648: 04STILL FAILING in 11 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/79648/ [17:32:26] yes we know [17:48:53] 10Phabricator, 10Tool-ldap: https://ldap.toolforge.org/ integration assumes that `cn` and `uid` are equivalent - https://phabricator.wikimedia.org/T376769#10230611 (10Legoktm) >>! In T376769#10223261, @matmarex wrote: > Hmm, they’re not even 404s, but 500s for me. Both the space and the ‘ń’ seem to cause that.... [18:09:31] 10Release-Engineering-Team (Priority Backlog 📥), 10Scap, 10Codex, 10Design-System-Team (DST-Sprint-34 (2024-10-15 to 2024-10-25)), 07Epic: [Spike/Timebox] DST Support for Codex-ifying SpiderPig UI - https://phabricator.wikimedia.org/T376932#10230689 (10bmartinezcalvo) [18:32:19] Yippee, build fixed! [18:32:19] Project beta-update-databases-eqiad build #79649: 09FIXED in 12 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/79649/ [19:06:29] (03PS1) 10Zoranzoki21: Zuul: Archive the StickyTOC extension [integration/config] - 10https://gerrit.wikimedia.org/r/1080362 (https://phabricator.wikimedia.org/T374778) [20:28:05] 10Phabricator, 10Tool-ldap: https://ldap.toolforge.org/ integration assumes that `cn` and `uid` are equivalent - https://phabricator.wikimedia.org/T376769#10231278 (10bd808) >>! In T376769#10230611, @Legoktm wrote: > I'll spend a bit of time figuring out actual LDAP query escaping instead of using a too-restri... 
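On the LDAP query escaping raised in T376769 above, here is a minimal sketch in Python of escaping a value before placing it in a search filter, following the rules in RFC 4515. It is an illustration only: the function name `escape_ldap_filter_value` is hypothetical, and nothing here is taken from the code behind ldap.toolforge.org, whose implementation is not shown in this log.

```python
# Characters that RFC 4515 requires to be escaped inside a filter value,
# each mapped to a backslash followed by its two-digit hex code.
_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_ldap_filter_value(value: str) -> str:
    """Escape a string for use as an assertion value in an LDAP filter.

    Characters are mapped one at a time, so the backslashes introduced
    by an escape are never escaped a second time.
    """
    return "".join(_FILTER_ESCAPES.get(char, char) for char in value)


if __name__ == "__main__":
    # Hypothetical usage: build a filter on `cn` from untrusted input.
    name = "Example (test)*"
    print(f"(cn={escape_ldap_filter_value(name)})")
```

Spaces and non-ASCII letters such as ‘ń’ are not in the escape table, so they pass through unchanged. LDAP client libraries usually ship an equivalent helper, which is preferable to a hand-written one when available.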
[20:59:04] 10Release-Engineering-Team (Priority Backlog 📥), 10Scap, 10Codex, 10Design-System-Team (DST-Sprint-34 (2024-10-15 to 2024-10-25)), 07Epic: [Spike/Timebox] DST Support for Codex-ifying SpiderPig UI - https://phabricator.wikimedia.org/T376932#10231463 (10egardner) a:03egardner [21:59:30] (03CR) 10Jforrester: [C:03+2] Zuul: Archive the StickyTOC extension [integration/config] - 10https://gerrit.wikimedia.org/r/1080362 (https://phabricator.wikimedia.org/T374778) (owner: 10Zoranzoki21) [22:01:05] (03Merged) 10jenkins-bot: Zuul: Archive the StickyTOC extension [integration/config] - 10https://gerrit.wikimedia.org/r/1080362 (https://phabricator.wikimedia.org/T374778) (owner: 10Zoranzoki21) [22:01:59] !log Zuul: Archive the StickyTOC extension, for T374778 [22:02:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [22:02:02] T374778: Archive the StickyTOC extension - https://phabricator.wikimedia.org/T374778 [22:15:34] 10MediaWiki-Releasing, 07Security: Consider using a single MediaWiki releases key instead of individual keys - https://phabricator.wikimedia.org/T181019#10231767 (10Reedy) [23:34:27] 06Project-Admins: umbrella project for Wikimedia Ukraine - https://phabricator.wikimedia.org/T374503#10231961 (10Base) @Aklapper , added the new project to the task. I am utterly confused by this whole change though. So it seems that the old tasks are in the old project. The open ones are in the new one. Th...