[00:03:06] thcipriani: There's `./utils/docker-updates --apply` to make the jjb step faster.
[00:03:07] (PS1) Thcipriani: Upgrade Quibble jobs to 1.18.1 [integration/config] - https://gerrit.wikimedia.org/r/1296700
[00:03:12] Ha, oh well.
[00:04:54] (PS2) Thcipriani: Upgrade Quibble jobs to 1.18.1 [integration/config] - https://gerrit.wikimedia.org/r/1296700 (https://phabricator.wikimedia.org/T420865)
[00:05:36] James_F: thanks :) fighting with many scripts because my python is too new
[00:05:44] Oh no.
[00:06:00] Multiple pythons make things fun.
[00:06:28] never too many pythons
[00:06:37] (PS3) Jforrester: jjb: Upgrade Quibble jobs to 1.18.1 [integration/config] - https://gerrit.wikimedia.org/r/1296700 (https://phabricator.wikimedia.org/T420865) (owner: Thcipriani)
[00:06:40] (CR) Jforrester: [C:+1] jjb: Upgrade Quibble jobs to 1.18.1 [integration/config] - https://gerrit.wikimedia.org/r/1296700 (https://phabricator.wikimedia.org/T420865) (owner: Thcipriani)
[00:08:05] overall, this process has been pretty streamlined since the last time I attempted it, it's pretty nice
[00:08:27] Yeah. Not quite one-click-release, but much simpler (and documented!).
[00:36:38] (CR) Thcipriani: [C:+2] "Deployed" [integration/config] - https://gerrit.wikimedia.org/r/1296700 (https://phabricator.wikimedia.org/T420865) (owner: Thcipriani)
[00:38:19] (Merged) jenkins-bot: jjb: Upgrade Quibble jobs to 1.18.1 [integration/config] - https://gerrit.wikimedia.org/r/1296700 (https://phabricator.wikimedia.org/T420865) (owner: Thcipriani)
[00:43:09] Continuous-Integration-Infrastructure, Gerrit, Release-Engineering-Team, collaboration-services, and 2 others: Fetches from Gerrit aborted due to: GnuTLS recv error (-54): Error in the pull function - https://phabricator.wikimedia.org/T420865#11979716 (thcipriani) >>! In T420865#11979145, @Dzahn...
[00:58:15] (CR) Thcipriani: [C:+2] release: Start new version cycle [integration/quibble] - https://gerrit.wikimedia.org/r/1296694 (owner: Thcipriani)
[01:14:21] (Merged) jenkins-bot: release: Start new version cycle [integration/quibble] - https://gerrit.wikimedia.org/r/1296694 (owner: Thcipriani)
[05:32:18] thcipriani: James_F: well done! :-)
[06:07:42] Continuous-Integration-Infrastructure, ci-test-error: Error: Cannot find module 'node:fs' - https://phabricator.wikimedia.org/T332386#11979940 (hashar)
[06:32:54] Gerrit, collaboration-services, Infrastructure-Foundations, Patch-For-Review, Puppet: Change puppet-merge git origin to use gerrit.discovery.wmnet instead of gerrit.wikimedia.org - https://phabricator.wikimedia.org/T420184#11979958 (ABran-WMF) good idea @Dzahn I pinged @MoritzMuehlenhoff on...
[06:43:39] (Abandoned) Phedenskog: jjb: don't save castor cache after mwext-codehealth-master-non-voting [integration/config] - https://gerrit.wikimedia.org/r/1294992 (https://phabricator.wikimedia.org/T427471) (owner: Phedenskog)
[06:43:54] (open) kineticpelagic: feat(linter): Improve linting by validating examples in nested schemas (T424002) [repos/ci-tools/wikimedia-spectral-ruleset] - https://gitlab.wikimedia.org/repos/ci-tools/wikimedia-spectral-ruleset/-/merge_requests/11
[07:25:00] GitLab (Project Migration), Release-Engineering-Team (Priority Backlog 📥), collaboration-services, Fundraising analytics stack, and 2 others: Move wikimedia/fundraising/analytics from Gerrit to Gitlab - https://phabricator.wikimedia.org/T391404#11980026 (ABran-WMF)
[07:39:08] (PS1) Bartosz Wójtowicz: inference-services: Add CI stage for cope-b-a4b model. [integration/config] - https://gerrit.wikimedia.org/r/1296950
[07:58:29] (CR) Ozge: inference-services: Add LLM generated editing suggestions CI/CD pipelines. [integration/config] - https://gerrit.wikimedia.org/r/1296559 (https://phabricator.wikimedia.org/T427794) (owner: Ozge)
[08:22:45] (CR) Hashar: [C:+2] "I have deployed the jobs:" [integration/config] - https://gerrit.wikimedia.org/r/1296559 (https://phabricator.wikimedia.org/T427794) (owner: Ozge)
[08:25:40] (Merged) jenkins-bot: inference-services: Add LLM generated editing suggestions CI/CD pipelines. [integration/config] - https://gerrit.wikimedia.org/r/1296559 (https://phabricator.wikimedia.org/T427794) (owner: Ozge)
[08:26:39] !log Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1296559 "inference-services: Add LLM generated editing suggestions CI/CD pipelines." # T427794
[08:26:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[08:26:41] T427794: Editing Suggestions - api - https://phabricator.wikimedia.org/T427794
[08:41:45] (CR) Kevin Bazira: inference-services: Add CI stage for cope-b-a4b model. (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/1296950 (owner: Bartosz Wójtowicz)
[08:48:37] (PS2) Bartosz Wójtowicz: inference-services: Add CI stage for cope-b-a4b model. [integration/config] - https://gerrit.wikimedia.org/r/1296950
[08:49:07] (CR) Bartosz Wójtowicz: inference-services: Add CI stage for cope-b-a4b model. (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/1296950 (owner: Bartosz Wójtowicz)
[08:56:20] (CR) Kevin Bazira: [C:+1] inference-services: Add CI stage for cope-b-a4b model. [integration/config] - https://gerrit.wikimedia.org/r/1296950 (owner: Bartosz Wójtowicz)
[09:02:59] (PS1) Gkyziridis: inference-services: Add liftwing-openapi-server CI/CD pipelines. [integration/config] - https://gerrit.wikimedia.org/r/1297070 (https://phabricator.wikimedia.org/T427902)
[09:08:27] Release-Engineering-Team (Doing 😎), Catalyst (Luka Ijo Pimeja Jan), Essential-Work: Remove "Backend" column from wikis table - https://phabricator.wikimedia.org/T426744#11980451 (jnuche)
[09:23:28] Release-Engineering-Team (Doing 😎), Catalyst (Luka Ijo Pimeja Jan), Essential-Work: Use a more recent Helm version to deploy to prod - https://phabricator.wikimedia.org/T400083#11980517 (jnuche)
[09:28:12] Project-Admins, Tracking-Neverending: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706#11980530 (aputhin) Hello! I'm the EM of the Tools Platform team, responsible for maintaining and evolving #toolforge. I'd like to be added to #acl_project-ad...
[09:34:50] Project-Admins, Tracking-Neverending: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706#11980556 (Ladsgroup) >>! In T706#11980530, @aputhin wrote: > Hello! I'm the EM of the Tools Platform team, responsible for maintaining and evolving #toolforg...
[09:47:21] (CR) Hashar: [C:+2] "INFO:jenkins_jobs.builder:Number of jobs generated: 2" [integration/config] - https://gerrit.wikimedia.org/r/1296950 (owner: Bartosz Wójtowicz)
[09:49:43] (Merged) jenkins-bot: inference-services: Add CI stage for cope-b-a4b model. [integration/config] - https://gerrit.wikimedia.org/r/1296950 (owner: Bartosz Wójtowicz)
[09:51:42] (CR) Hashar: [C:+2] "Deployed!" [integration/config] - https://gerrit.wikimedia.org/r/1296950 (owner: Bartosz Wójtowicz)
[09:51:56] (CR) Hashar: [C:+2] "INFO:jenkins_jobs.builder:Number of jobs generated: 4" [integration/config] - https://gerrit.wikimedia.org/r/1297070 (https://phabricator.wikimedia.org/T427902) (owner: Gkyziridis)
[09:55:24] (Merged) jenkins-bot: inference-services: Add liftwing-openapi-server CI/CD pipelines. [integration/config] - https://gerrit.wikimedia.org/r/1297070 (https://phabricator.wikimedia.org/T427902) (owner: Gkyziridis)
[11:36:29] (open) jnuche: definitions/helm/blubber.yaml: add missing package [repos/releng/ci-images] - https://gitlab.wikimedia.org/repos/releng/ci-images/-/merge_requests/3 (https://phabricator.wikimedia.org/T400083)
[11:37:36] (approved) jnuche: definitions/helm/blubber.yaml: add missing package [repos/releng/ci-images] - https://gitlab.wikimedia.org/repos/releng/ci-images/-/merge_requests/3 (https://phabricator.wikimedia.org/T400083)
[11:38:16] (merge) jnuche: definitions/helm/blubber.yaml: add missing package [repos/releng/ci-images] - https://gitlab.wikimedia.org/repos/releng/ci-images/-/merge_requests/3 (https://phabricator.wikimedia.org/T400083)
[12:25:40] Release-Engineering-Team (Doing 😎), Catalyst (Luka Ijo Pimeja Jan), Essential-Work, Patch-For-Review: Use a more recent Helm version to deploy to prod - https://phabricator.wikimedia.org/T400083#11981151 (jnuche) Open→Resolved Both Patchdemo and CAPI are now using the image + deployin...
[12:39:05] Beta-Cluster-Infrastructure: haproxy in Beta cluster has invalid config - https://phabricator.wikimedia.org/T428052 (Urbanecm_WMF) NEW
[12:42:15] Beta-Cluster-Infrastructure: haproxy in Beta cluster has invalid config - https://phabricator.wikimedia.org/T428052#11981214 (Urbanecm_WMF) Pushed this on beta as a stopgap: ` root@deployment-puppetserver-1:/srv/git/operations/puppet# git diff HEAD~ HEAD diff --git a/modules/profile/templates/cache/haproxy....
[12:47:15] (PS1) Jforrester: Zuul: Add Rae 5e as a trusted user [integration/config] - https://gerrit.wikimedia.org/r/1297123
[12:50:34] (CR) Jforrester: [C:+2] Zuul: Add Rae 5e as a trusted user [integration/config] - https://gerrit.wikimedia.org/r/1297123 (owner: Jforrester)
[12:54:17] (Merged) jenkins-bot: Zuul: Add Rae 5e as a trusted user [integration/config] - https://gerrit.wikimedia.org/r/1297123 (owner: Jforrester)
[12:54:57] !log Zuul: Add Rae 5e as a trusted user
[12:54:58] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:56:42] Beta-Cluster-Infrastructure, SRE, Traffic: haproxy in Beta cluster has invalid config - https://phabricator.wikimedia.org/T428052#11981250 (Urbanecm_WMF)
[13:42:17] jnuche: Thanks! That looks to have fixed it.
[13:44:25] James_F: yep, I think this may become a recurring issue, not just for WL. We should start thinking about exposing these kind of issues to the client, so it becomes obvious from the get-go what's going on
[13:44:30] I'll bring it up with the Catalyst team
[13:45:19] * James_F nods.
[13:49:16] phabricator is having a bad time. already known?
[13:50:26] i'm getting various errors about MySQL being unavailable. i can share screenshots if it helps
[13:50:27] MatmaRex: it's being DDoS'd, SRE is aware and looking into it
[13:50:32] ty
[13:52:20] MatmaRex: how's it working now?
[13:54:48] Release-Engineering-Team (Doing 😎), Abstract Wikipedia team, Catalyst (Luka Ijo Pimeja Jan), Essential-Work, Patch-For-Review: Python evaluator is being OOMKilled in wikilambda-catalyst-end-to-end jobs - https://phabricator.wikimedia.org/T428031#11981382 (jnuche)
[14:16:59] (merge) jnuche: Create trixie image for fundraising ML services [repos/releng/dev-images] - https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/107 (owner: ejegg)
[14:19:47] !log Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/107
[14:19:48] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:35:53] oh hey, that ML image got published
[14:36:05] thanks!
[15:28:29] PROBLEM - jenkins_service_running on contint1002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args .*/bin/java .*-jar /usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins
[15:32:37] PROBLEM - jenkins_service_running on contint1003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args .*/bin/java .*-jar /usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins
[15:32:56] jhathaway: I am switching the puppet-diffs instances to Java 21, we are currently upgrading Jenkins to Java 21
[15:33:10] I did the hiera change via Horizon
[15:33:16] thanks hashar
[15:33:33] the puppet server doesn't care about the java version does it? :]
[15:34:42] we will see, but I don't think so
[15:34:57] puppetserver itself is a java program, but we don't use that for puppet-diffs, to my knowledge
[15:35:03] oh
[15:35:17] well the puppetserver is not attached as an agent to Jenkins
[15:35:45] i should have put you in the loop ahead of time sorry :\
[15:35:52] * hashar runs puppet on pcc-db1002
[15:35:57] I think it should be fine
[15:36:05] no worries at all
[15:36:33] well I ran puppet and the pcc-db1002 is still having java 17
[15:36:39] I guess the hiera change does not apply to it
[15:36:46] the other instances pcc-workerXXX are now having java 21
[15:36:54] so I think we are set
[15:37:29] RECOVERY - jenkins_service_running on contint1002 is OK: PROCS OK: 1 process with regex args .*/bin/java .*-jar /usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins
[15:45:29] great thanks hashar
[15:47:37] RECOVERY - jenkins_service_running on contint1003 is OK: PROCS OK: 1 process with regex args .*/bin/java .*-jar /usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins
[15:52:29] PROBLEM - jenkins_service_running on contint1002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args .*/bin/java .*-jar /usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins
[16:04:29] RECOVERY - jenkins_service_running on contint1002 is OK: PROCS OK: 1 process with regex args .*/bin/java .*-jar /usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins
[16:07:29] PROBLEM - jenkins_service_running on contint1002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args .*/bin/java .*-jar /usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins
[16:19:29] Project mediawiki-core-doxygen build #20988: FAILURE in 32 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/20988/
[16:33:24] Is jenkins having a hard time at the moment?
[16:35:20] It looks like the apache2 service has failed on contint1002 due to an SSL-related syntax error.
[16:35:29] RECOVERY - jenkins_service_running on contint1002 is OK: PROCS OK: 1 process with regex args .*/bin/java .*-jar /usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins
[16:37:26] btullis: mutante was doing a changeover, which I think is being reverted
[16:37:28] Working now, thanks.
[16:42:16] we had to revert just now
[16:42:24] something did not work as planned
[16:42:31] gave up and back to before now
[16:51:46] Project mediawiki-core-doxygen build #20989: STILL FAILING in 32 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/20989/
[16:51:46] Project Cognate-phpmetrics build #62: FAILURE in 26 min: https://integration.wikimedia.org/ci/job/Cognate-phpmetrics/62/
[16:53:39] Beta-Cluster-Infrastructure, SRE, Traffic: Beta cluster haproxy does not support `warn-blocked-traffic-after` keyword - https://phabricator.wikimedia.org/T428052#11982409 (ssingh) >>! In T428052#11982183, @bd808 wrote: > The Beta Cluster cache nodes are Debian Bullseye running HAProxy version 2.8.18-...
[16:56:26] Continuous-Integration-Infrastructure, Release-Engineering-Team, collaboration-services, Patch-For-Review: setup 2 contint machines for jenkins - https://phabricator.wikimedia.org/T418521#11982428 (hashar) The message we got after having Apache mod_proxy changed to point to the https discovery.wm...
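The flapping `jenkins_service_running` alerts above report how many processes match the regex `.*/bin/java .*-jar /usr/share/java/jenkins.war`. As a rough illustration of what that check counts, here is a minimal Python sketch. It assumes (not confirmed by this log) that the check behaves like the Nagios/Icinga `check_procs` plugin, matching an extended regex against each process's space-joined argument list, and that zero matches means CRITICAL; the function names and sample command lines are hypothetical.

```python
import re

# Regex copied verbatim from the alert text. The unescaped "." in
# "jenkins.war" matches any character, which is harmless here.
JENKINS_ARGS_RE = re.compile(r".*/bin/java .*-jar /usr/share/java/jenkins.war")


def count_jenkins_procs(cmdlines):
    """Count command lines (args joined by spaces) that match the pattern.

    Hypothetical helper: assumes the monitoring check searches each
    process's full argument string, like check_procs' regex-args option.
    """
    return sum(1 for cmd in cmdlines if JENKINS_ARGS_RE.search(cmd))


def check_state(count):
    """Render a status line shaped like the alerts above.

    Assumed thresholds: 0 matches is CRITICAL, anything else is OK.
    """
    noun = "process" if count == 1 else "processes"
    status = "CRITICAL" if count == 0 else "OK"
    return f"PROCS {status}: {count} {noun} with regex args {JENKINS_ARGS_RE.pattern}"


if __name__ == "__main__":
    # Hypothetical sample process list, for illustration only.
    samples = [
        "/usr/lib/jvm/java-21-openjdk-amd64/bin/java -Djava.awt.headless=true "
        "-jar /usr/share/java/jenkins.war --httpPort=8080",
        "/usr/sbin/apache2 -k start",
    ]
    print(check_state(count_jenkins_procs(samples)))  # one Jenkins process: OK
    print(check_state(0))  # Jenkins stopped: CRITICAL
```

On Linux the argument strings could be built from `/proc/<pid>/cmdline` by replacing its NUL separators with spaces; that step is left out so the sketch stays testable without a live process table.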
[17:11:06] Project mediawiki-core-doxygen build #20990: STILL FAILING in 19 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/20990/
[17:21:50] mutante: thcipriani: the CI Jenkins is working
[17:22:03] having the controller with java 17 while agents are on Java 21 does work
[17:22:14] as long as groovy/java object serialization is not involved
[17:22:21] I think
[17:22:40] the issues we keep seeing when we upgrade Java are for the PipelineLib jobs
[17:22:51] and they need the exact same java version or things explode
[17:23:10] so essentially things are happy now
[17:43:39] Project mediawiki-core-doxygen build #20991: STILL FAILING in 32 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/20991/
[17:48:18] * hashar checks
[17:48:45] ehe that one is broken
[17:48:48] https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/
[17:49:03] last build that shows up is 20988
[17:49:32] some id is off by one
[17:49:49] https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/20989/console
[17:49:50] ...
[17:51:39] OH it is running on contint1003
[17:51:55] mutante: we would need to stop jenkins on contint1003 and ensure puppet does not bring it back
[17:52:34] in -operations: ¡log contint1003: sudo puppet agent --disable "Prevent Jenkins from coming back"
[17:53:27] cause the Jenkins on the new host has the config which includes all the jobs that run on a timer
[17:53:37] PROBLEM - jenkins_service_running on contint1003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args .*/bin/java .*-jar /usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins
[18:03:00] ^ me I have stopped it to prevent timed jobs from running on contint1003 given Jenkins is on contint1002
[18:03:52] Yippee, build fixed!
[18:03:52] Project mediawiki-core-doxygen build #20989: FIXED in 14 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/20989/
[19:07:20] Continuous-Integration-Infrastructure, Release-Engineering-Team, collaboration-services, Patch-For-Review: setup 2 contint machines for jenkins - https://phabricator.wikimedia.org/T418521#11982876 (Dzahn) New patch uploaded. This adds `SSLProxyEngine on` to turn on proxy-ing over https and also c...
[20:57:31] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Patch-For-Review: Migrate Quibble images from bullseye to bookworm - https://phabricator.wikimedia.org/T362705#11983136 (Jdforrester-WMF)
[21:29:20] (PS8) Jforrester: jjb: [wikilambda-catalyst-end-to-end-daily] Email on daily failure/unstable only [integration/config] - https://gerrit.wikimedia.org/r/1294369 (https://phabricator.wikimedia.org/T424411) (owner: Vaughn Walters)
[21:29:34] (CR) Jforrester: [C:+2] "Deployed." [integration/config] - https://gerrit.wikimedia.org/r/1294369 (https://phabricator.wikimedia.org/T424411) (owner: Vaughn Walters)
[21:30:48] (CR) Jforrester: [C:+2] jjb: [catalyst-daily-TwoColConflict] Add TwoColConflict job [integration/config] - https://gerrit.wikimedia.org/r/1294428 (https://phabricator.wikimedia.org/T427012) (owner: Vaughn Walters)
[21:31:12] (Merged) jenkins-bot: jjb: [wikilambda-catalyst-end-to-end-daily] Email on daily failure/unstable only [integration/config] - https://gerrit.wikimedia.org/r/1294369 (https://phabricator.wikimedia.org/T424411) (owner: Vaughn Walters)
[21:33:18] (Merged) jenkins-bot: jjb: [catalyst-daily-TwoColConflict] Add TwoColConflict job [integration/config] - https://gerrit.wikimedia.org/r/1294428 (https://phabricator.wikimedia.org/T427012) (owner: Vaughn Walters)
[21:38:01] Yippee, build fixed!
[21:38:02] Project mwcore-phpunit-coverage-master build #5168: FIXED in 38 min: https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/5168/
[23:02:03] tomorrow's train log triage (non eu edition) conflicts with staff meeting, oughta get cancelled I guess?
[23:11:50] (CR) Rae: "Thank you very much!" [integration/config] - https://gerrit.wikimedia.org/r/1297123 (owner: Jforrester)