[06:15:27] Release-Engineering-Team, DBA, MediaWiki-Core-Revision-backend, MediaWiki-Engineering, Wikimedia-production-error: MediaWiki\Revision\RevisionAccessException: Failed to load data blob from {address} for revision {revision}. If this problem pers... - https://phabricator.wikimedia.org/T373668#10109229
[08:15:26] Release-Engineering-Team, DBA, MediaWiki-Core-Revision-backend, MediaWiki-Engineering, Wikimedia-production-error: MediaWiki\Revision\RevisionAccessException: Failed to load data blob from {address} for revision {revision}. If this problem pers... - https://phabricator.wikimedia.org/T373668#10109411
[08:58:47] (update) jnuche: make-release: Use the same credentials for Gerrit API and Git operations [repos/releng/release] - https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/111 (https://phabricator.wikimedia.org/T373441) (owner: dduvall)
[09:57:37] (open) hashar: jenkins-rel: do not set keepUndefinedParameters [repos/releng/jenkins-deploy] - https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/73 (https://phabricator.wikimedia.org/T133737)
[10:01:55] memories of when it took me a full day to change a `continue` statement to a `break` one inside a while loop https://phabricator.wikimedia.org/T271683#6737850
[10:02:24] and a month to have it deployed to prod :D
[10:07:54] FIRING: Queue (Jenkins jobs + Zuul functions) alert: - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[10:12:48] Continuous-Integration-Infrastructure, Jenkins, Patch-For-Review: Verify Jenkins Gearman plugin works with Java 17 - https://phabricator.wikimedia.org/T373351#10109757 (hashar) I have positively tested the Gearman plugin under Java 17. I have started the C language Gearman server (`sudo apt install...
[10:13:00] Continuous-Integration-Infrastructure, Jenkins, Patch-For-Review: Verify Jenkins Gearman plugin works with Java 17 - https://phabricator.wikimedia.org/T373351#10109759 (hashar) I am keeping this open since I'd like to polish up: * gearman-java to be buildable under Java 17 * The Gearman plugin https:...
[10:17:58] PROBLEM - Work requests waiting in Zuul Gearman server on contint1002 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [400.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[10:25:54] Zuul does not appear to be removing the old versions of patches I've rebased from the queue.
[10:26:26] I uploaded a batch of patches (around 20), and then rebased them all to the master branch
[10:27:07] All of the old versions still appear in the test queue
[10:27:54] RESOLVED: Queue (Jenkins jobs + Zuul functions) alert: - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[10:27:55] But nothing seems to be processing too
[10:35:55] hashar: Is there a way to remove the old versions of the patches manually?
[10:36:05] I can't seem to abandon the jobs unless they have started.
[10:37:30] Dreamy_Jazz: let me check
[10:38:00] ah yeah you essentially broke CI
[10:38:06] :(
[10:38:30] I am not sure how you have sent 20 patches since we should have a limit much lower than that set in Gerrit
[10:38:33] but maybe that is broken
[10:38:38] anyway
[10:38:38] The limit was 20
[10:38:57] I guess someone changed it, it used to be lower
[10:38:58] anyway
[10:39:12] CI is busy processing your patches, that can be seen in the graphs at the bottom of https://integration.wikimedia.org/zuul/
[10:39:21] notably the graph "Gearman job queue"
[10:39:38] which once clicked sends you to https://grafana.wikimedia.org/d/000000322/zuul-gearman?viewPanel=10&from=now-24h&to=now&orgId=1
[10:39:40] It's just nothing in the test queue is appearing as starting
[10:39:50] yes that is normal
[10:39:56] I was watching that graph slowly drop
[10:39:59] CI is busy preparing the merge commits
[10:40:51] the workaround is to split your series of patches :)
[10:41:02] Yeah...
[10:41:15] or make bigger commits maybe
[10:41:41] but most probably, I would just break the series, it is too large
[10:41:56] I was creating commits from my uncommitted local changes
[10:42:06] So I had just gone commit followed by commit
[10:42:07] yeah I imagine :]
[10:43:00] and if you add small test coverages to the various maintenance scripts, maybe they can be grouped in a single commit
[10:44:28] Is Gearman still processing the old versions of the patches?
[10:45:17] I had thought that CI de-duplicates the jobs, removing the old ones on rebases etc.
[10:47:56] hi, gerrit won't run any CI checks for this patch https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/1069987
[10:48:09] Yep. My fault
[10:48:20] I uploaded a load of patches in one go
[10:48:52] Caused Gearman to have a lot to do: https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&from=now-6h&to=now
[10:50:18] is there anything we can do other than wait?
[10:50:34] I can't abandon the test jobs because I need them to start to abandon them
[10:51:46] Apparently https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Debugging says that being able to drop jobs from gearman is currently broken
[10:56:49] Dreamy_Jazz: you can split the huge series
[10:56:59] cherry-pick changes against master, which can be done from the UI
[10:57:25] AFAIK the series is split already
[10:57:29] that should drop the changes from the CI/Zuul queue and dispose the enqueued jobs
[10:57:45] I did that soon after I uploaded them
[10:58:35] Which is why I am unsure why they still appear there
[10:58:45] Because the version 1 is old
[10:58:52] And version 2 is already waiting in the queue too
[11:02:29] yeah it will catch up eventually
[11:03:12] Looks like gate-and-submit has started again
[11:03:45] hmm
[11:04:13] the zuul-merger on contint2002 hasn't processed anything since 2024-06-04 19:36:57,363
[11:04:50] That probably won't be caused by me :D
[11:04:58] Active: active (running) since Tue 2024-06-04 19:34:44 UTC; 2 months 28 days ago
[11:05:11] -rw-r--r-- 1 zuul zuul 4910999 Jun 4 19:36 merger-debug.log
[11:05:54] !log Restarted zuul-merger on contint2002: it had not been processing events since June 4th!!!
[11:05:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:06:01] that would make CI twice faster
[11:06:27] Thanks.
[11:07:06] why was it stalled?
I have no clue
[11:15:28] Jobs for the version 1s of the changes are starting
[11:15:56] And are not being aborted
[11:16:32] I can manually abort them to speed up the queue
[11:18:39] there is something sketchy I think
[11:18:42] 132 ::ffff:208.80.153.39 Zuul Merger : merger:merge merger:update
[11:18:51] that keeps showing and disappearing
[11:18:57] so I suspect the connection does not work
[11:19:06] or at least it is not kept established which is problematic
[11:19:11] or I misunderstand something
[11:19:16] I am going to have lunch
[11:20:28] Thanks for the help in resolving this
[11:23:28] above is a red herring, that is the output being buffered and cutting the few last lines
[11:23:32] yet another bug to file grrr
[11:25:30] Phabricator, Release-Engineering-Team (Priority Backlog 📥): Update to Phorge upstream 2024.35 release - https://phabricator.wikimedia.org/T370266#10109882 (Aklapper)
[11:26:42] Phabricator, Release-Engineering-Team (Priority Backlog 📥): Update to Phorge upstream 2024.35 release - https://phabricator.wikimedia.org/T370266#10109886 (Aklapper)
[11:28:02] Phabricator, Release-Engineering-Team (Priority Backlog 📥): Update to Phorge upstream 2024.35 release - https://phabricator.wikimedia.org/T370266#10109899 (Aklapper) Stalled→Open p:Low→Medium
[11:53:11] RECOVERY - Work requests waiting in Zuul Gearman server on contint1002 is OK: OK: Less than 100.00% above the threshold [200.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[12:07:20] Dreamy_Jazz: ^ resolved !:)
[12:07:30] :D
[12:22:56] Continuous-Integration-Infrastructure, Data-Platform-SRE, Patch-For-Review, Upstream: Archiva release repository yields 404 for release repository breaking builds - https://phabricator.wikimedia.org/T373352#10110086 (hashar) I would need `org.wikimedia.gearman:gearman-java` version `0.10` to be p...
[12:34:59]
[12:56:54] Continuous-Integration-Infrastructure, Data-Platform-SRE, Patch-For-Review, Upstream: Archiva release repository yields 404 for release repository breaking builds - https://phabricator.wikimedia.org/T373352#10110235 (Gehel) Bundle has been uploaded to Maven Central and is available: https://repo....
[13:39:22] GitLab (Integrations), Release-Engineering-Team (Radar), collaboration-services, Infrastructure-Foundations, serviceops: Container image reports in debmonitor are broken - https://phabricator.wikimedia.org/T348876#10110426 (MoritzMuehlenhoff)
[13:40:41] Continuous-Integration-Infrastructure, Data-Platform-SRE, Patch-For-Review, Upstream: Archiva release repository yields 404 for release repository breaking builds - https://phabricator.wikimedia.org/T373352#10110415 (hashar) Open→Resolved a:hashar That solved the issue I was encou...
[13:55:28] (CR) Kosta Harlan: "I think we would just not enable parallel PHPUnit on the release branches." [integration/quibble] - https://gerrit.wikimedia.org/r/1039669 (https://phabricator.wikimedia.org/T365976) (owner: Arthur taylor)
[17:19:30] Phabricator, Release-Engineering-Team (Priority Backlog 📥): Update to Phorge upstream 2024.35 release - https://phabricator.wikimedia.org/T370266#10111069 (Aklapper) DB upgrade may take a while: ` MariaDB [phabricator_file]> SELECT COUNT(*) FROM file; +----------+ | COUNT(*) | +----------+ | 547777 | +...
[17:20:01] Project beta-update-databases-eqiad build #78616: FAILURE in 0.46 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/78616/
[17:26:33] Composer.lock for parsoid
[17:54:44] things are broken because https://gerrit.wikimedia.org/r/c/mediawiki/vendor/+/1070051 was merged without https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1070048
[17:55:04] so CI is failing everywhere complaining about composer.lock problems
[18:20:01] Project beta-update-databases-eqiad build #78617: STILL FAILING in 0.55 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/78617/
[18:53:59] Release-Engineering-Team (Priority Backlog 📥), wikimedia.biterg.io: Sort out how to pull data (affiliations etc) from Bitergia DB via SortingHat API to find needed data updates - https://phabricator.wikimedia.org/T360762#10111177 (Aklapper) //Should move into separate tasks, however still posting to expl...
[19:32:24] Yippee, build fixed!
[19:32:24] Project beta-update-databases-eqiad build #78618: FIXED in 12 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/78618/
[22:51:00] (open) jhuneidi: patchdemo: make and chown /var/www/.cache dir [repos/releng/dev-images] - https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/65 (https://phabricator.wikimedia.org/T373721)
[22:51:48] (update) jhuneidi: patchdemo: make and chown /var/www/.cache dir [repos/releng/dev-images] - https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/65 (https://phabricator.wikimedia.org/T373721)