[00:10:50] Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), GitLab: Remove Speed & Function blockers for GitLab work - https://phabricator.wikimedia.org/T274458 (thcipriani)
[00:16:26] (CR) DannyS712: Add a new StaticClosureSniff (2 comments) [tools/codesniffer] - https://gerrit.wikimedia.org/r/662110 (https://phabricator.wikimedia.org/T274038) (owner: Umherirrender)
[00:23:47] (CR) Krinkle: [C: +2] internal: Use static closures [tools/codesniffer] - https://gerrit.wikimedia.org/r/663321 (owner: Umherirrender)
[00:25:46] (Merged) jenkins-bot: internal: Use static closures [tools/codesniffer] - https://gerrit.wikimedia.org/r/663321 (owner: Umherirrender)
[00:29:52] Release-Engineering-Team, Scap, MediaWiki-Internationalization, Performance-Team, Patch-For-Review: Use static php array files for l10n cache at WMF (instead of CDB) - https://phabricator.wikimedia.org/T99740 (Krinkle) a:Krinkle
[00:44:19] (PS1) Nikki Nikkhoui: Remove publish step [integration/config] - https://gerrit.wikimedia.org/r/663339 (https://phabricator.wikimedia.org/T273225)
[01:00:55] (CR) DannyS712: Remove publish step (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/663339 (https://phabricator.wikimedia.org/T273225) (owner: Nikki Nikkhoui)
[01:02:43] (PS2) Nikki Nikkhoui: Remove image-suggestion-api publish step [integration/config] - https://gerrit.wikimedia.org/r/663339 (https://phabricator.wikimedia.org/T273225)
[01:02:53] (CR) Nikki Nikkhoui: Remove image-suggestion-api publish step (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/663339 (https://phabricator.wikimedia.org/T273225) (owner: Nikki Nikkhoui)
[01:08:13] Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)): Regenerate a gpg key for Antoine and add to releng-secrets - https://phabricator.wikimedia.org/T274277 (thcipriani) Open→Resolved Did a keysigning with Antoine 2021-02-10 and
re-added to releng-secrets repo! Should still have a keysigning...
[01:09:39] (CR) Reedy: [C: +2] Remove image-suggestion-api publish step [integration/config] - https://gerrit.wikimedia.org/r/663339 (https://phabricator.wikimedia.org/T273225) (owner: Nikki Nikkhoui)
[01:11:25] (Merged) jenkins-bot: Remove image-suggestion-api publish step [integration/config] - https://gerrit.wikimedia.org/r/663339 (https://phabricator.wikimedia.org/T273225) (owner: Nikki Nikkhoui)
[01:22:06] (PS1) BryanDavis: python: upgrade pip before installing requirements [blubber] - https://gerrit.wikimedia.org/r/663348 (https://phabricator.wikimedia.org/T274435)
[01:24:22] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/663339
[01:24:24] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[01:34:09] (PS1) Zoranzoki21: Archive the GoogleAppEngine extension [integration/config] - https://gerrit.wikimedia.org/r/663352 (https://phabricator.wikimedia.org/T274069)
[01:37:36] (PS1) Zoranzoki21: Disable running selenium tests for the Acrolinx extension [integration/config] - https://gerrit.wikimedia.org/r/663353
[01:38:39] (CR) jerkins-bot: [V: -1] Disable running selenium tests for the Acrolinx extension [integration/config] - https://gerrit.wikimedia.org/r/663353 (owner: Zoranzoki21)
[01:39:18] (PS2) Zoranzoki21: Disable running selenium tests for the Acrolinx extension [integration/config] - https://gerrit.wikimedia.org/r/663353
[01:40:26] (CR) jerkins-bot: [V: -1] Disable running selenium tests for the Acrolinx extension [integration/config] - https://gerrit.wikimedia.org/r/663353 (owner: Zoranzoki21)
[01:47:11] (PS3) Zoranzoki21: Disable running selenium tests for the Acrolinx extension [integration/config] - https://gerrit.wikimedia.org/r/663353
[01:55:33] how long does it take jenkins to update the beta cluster after something merges?
[01:58:20] DannyS712: The scap job runs every 10 minutes for code changes, and is triggered for config changes.
[01:58:33] DannyS712: So worst-case, if things are working, is about 15 mins.
[02:07:58] I assume "beta-code-update-eqiad" is the job to focus on?
[02:08:16] is it possible to manually poke?
[02:09:06] hmm, actually testing the code that I was hoping would be deployed, it seems to have been deployed and is running, so perhaps Special:Version is cached and out of date?
[02:16:14] Probably
[02:16:16] There's bugs about it
[03:54:17] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, SRE, serviceops, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Dzahn) **As of today all jobrunners/videoscalers across eqiad and codfw are all 100...
[03:54:59] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, SRE, serviceops, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Dzahn)
[04:20:22] Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), Release, Train Deployments: 1.36.0-wmf.31 deployment blockers - https://phabricator.wikimedia.org/T271345 (Krinkle) Potential scap trap, warrants testing in Beta and group0: (NOTE) See
(CR) Yaron Koren: [C: +1] Disable running selenium tests for the Acrolinx extension [integration/config] - https://gerrit.wikimedia.org/r/663353 (owner: Zoranzoki21)
[05:04:00] phan, phan-taint-check-plugin: Audit WM maintained libraries for lack of phan - https://phabricator.wikimedia.org/T274475 (Reedy)
[06:33:36] Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), Release, Train Deployments: 1.36.0-wmf.30 deployment blockers - https://phabricator.wikimedia.org/T271344 (Legoktm)
[08:11:15] Phabricator: Make sure anti-vandalism features are up to snuff - https://phabricator.wikimedia.org/T84
(Aklapper)
[08:55:19] (PS1) Hashar: dockerfiles: upgrade images to Quibble 0.0.46 [integration/config] - https://gerrit.wikimedia.org/r/663526
[08:58:21] (PS2) Hashar: dockerfiles: upgrade images to Quibble 0.0.46 [integration/config] - https://gerrit.wikimedia.org/r/663526
[09:00:10] Project mwcore-phpunit-coverage-master build #1210: STILL FAILING in 6 hr 0 min: https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/1210/
[09:01:28] (PS3) Hashar: dockerfiles: upgrade images to Quibble 0.0.46 [integration/config] - https://gerrit.wikimedia.org/r/663526
[09:04:42] (CR) Hashar: [C: +2] "Quibble 0.0.46 is long overdue, I have prioritized it over the switch to Buster cause it seems easier to conduct and has been waiting for " [integration/config] - https://gerrit.wikimedia.org/r/663526 (owner: Hashar)
[09:06:12] (Merged) jenkins-bot: dockerfiles: upgrade images to Quibble 0.0.46 [integration/config] - https://gerrit.wikimedia.org/r/663526 (owner: Hashar)
[09:07:41] !log Building Quibble 0.0.46 Docker images on contint1001 (it is faster than contint2001)
[09:07:43] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:16:09] Release-Engineering-Team (Pipeline), Add-Link, Growth-Structured-Tasks, Growth-Team (Current Sprint), Patch-For-Review: Add Link engineering: Deployment Pipeline setup - https://phabricator.wikimedia.org/T265893 (kostajh)
[09:20:03] Project beta-update-databases-eqiad build #48256: FAILURE in 2.5 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/48256/
[09:30:38] apergos: hi, usually you should not force merge a change ( https://gerrit.wikimedia.org/r/c/mediawiki/vendor/+/663520 ) :D
[09:31:13] Beta-Cluster-Infrastructure: deployment-prep db upgrade fails: wikimedia/shellbox: 1.0.3 installed, 1.0.2 required - https://phabricator.wikimedia.org/T274492 (hashar)
[09:31:42] anyway that beta-update-databases-eqiad failure is
transient. Will be solved once https://gerrit.wikimedia.org/r/c/mediawiki/core/+/663521 is merged
[09:31:43] ah, sorry; it said "ready to submit" or some such, so I assumed it was not auto-submitting
[09:31:57] yeah that is a glitch
[09:32:04] that's my first experience with the vendor repo, noted for next time!
[09:32:05] we should really remove the submit right :D
[09:32:13] +2 ( :-P :-D )
[09:32:21] it is a bit messy yeah
[09:32:28] i think the rule is to update vendor first
[09:32:43] then have a mediawiki/core that does the composer.json update with a depends-on the vendor change
[09:32:49] which is fine there
[09:32:56] just broke beta cause we are in a bad state
[09:32:57] right, we have that thanks to lego
[09:33:01] ah grrrr
[09:33:11] but that will be solved as soon as the mediawiki/core change gets merged and pushed to beta
[09:33:16] anyway filed as https://phabricator.wikimedia.org/T274492
[09:33:20] for history purposes ;)
[09:33:29] so in other words the second patch will force autosubmit and merge of the first one when it goes?
[09:33:32] I guess one day we will revisit vendor.git entirely
[09:33:40] so the idea is
[09:33:42] prolly a good idea...
one day :-D
[09:33:51] all changes made to mediawiki/core vendor / extensions etc
[09:34:00] when they get a +2 they are all chained in Zuul
[09:34:10] subscribed
[09:34:32] so by +2 ing the vendor change (which bumps shellbox) then +2ing the mediawiki/core one
[09:34:52] the jobs for the mediawiki/core changes get realized with the vendor change incorporated in AS IF it already had been merged
[09:35:12] and thus when testing the mediawiki/core change it will have vendor with shell box bumped to 1.0.3 and the jobs will pass fine
[09:35:50] but yeah there is some state in which we have mediawiki/core requesting shellbox 1.0.2 while vendor already has 1.0.3
[09:36:00] it is not ideal
[09:36:01] meh
[09:36:13] one day maybe we will phase out vendor.git and dependencies snapshotting
[09:36:25] but really, we haven't found any better way to snapshot dependencies in an auditable way
[09:36:25] I wish we had a nice way to handle dependencies
[09:36:35] said every developer on every complex project...
[09:36:43] (and I think I introduced that vendor.git snapshot thing)
[09:36:49] then
[09:36:55] there are systems that address those
[09:37:01] and make it possible to reproduce a build entirely
[09:37:07] pretty sure Bazel offers that
[09:37:16] huh
[09:37:25] reproducibility would be pretty grand
[09:37:49] but also some assurance that the packages are good and not having some bug or malware
[09:37:52] Nix definitely ( https://en.wikipedia.org/wiki/Nix_package_manager )
[09:37:56] down to the system libraries
[09:38:20] so you can snapshot your app with all of its dependencies including other software / system libs etc
[09:38:29] aka mw 1.36 + redis X + libc Y
[09:38:55] and this will not go out and fetch from some other random place right?
[09:38:57] and rebuild a package of your app with just a bump of redis X to X+1 to address a security issue but leave everything else snapshotted
[09:38:59] but I digress
[09:39:02] it assumes you have provided local of everything?
[09:39:06] that does fetch from random place
[09:39:08] yeah ok but it's a good digression
[09:39:09] out of the interweb
[09:39:15] mmmmm ugh no
[09:39:18] sadness
[09:39:19] but there are checksums to validate those
[09:39:24] so you can repro the same state
[09:39:25] mmmmmmmmmmmaybe
[09:39:30] something like that ;]
[09:39:35] if you have provided the checksums then
[09:39:40] yeah well
[09:39:42] if the checksums also come from random place, then nope
[09:39:42] that is a requirement
[09:39:44] else it bails out
[09:39:46] (iirc)
[09:40:01] ok, potentially interesting
[09:40:02] so now
[09:40:09] unlike say docker build
[09:40:17] back to the matter at hand, next time I should +2 the vendor, +2 the core, (which I did)
[09:40:23] and then what?
[09:40:32] which blindly rebuilds everything from scratch carrying any state changes that happened since last build unless you find a way to freeze everything
[09:40:35] but I am ranting at this point
[09:40:44] so what you did was fine
[09:40:45] yeah docker build plus apt-get update in those docker files usually (I have done this)
[09:40:56] except you should usually not force merge a change and let ci handle it for us ;)
[09:41:06] so I should then wait for all the tests to pass on both
[09:41:07] the rest (depends-on etc) is fine
[09:41:10] then submit submit?
[09:41:20] CI does the submit for us
[09:41:26] ok
[09:41:31] unlike on operations/puppet for which developers do the submit manually
[09:41:37] so just +2 +2 and the rest is auto.
[09:41:44] on the rest of the fleet, especially mediawiki repositories, CI/Zuul manages the submit for us
[09:41:50] yes we do and I have the dumps ones (same) and the mwbzutils one (same)
[09:41:55] which guarantees that jobs are always passing
[09:41:59] yeah
[09:42:03] ok great, I knew about that for core and extensions but only there
[09:42:07] there is also some oddity
[09:42:18] such that we do not run all jobs when a change is proposed
[09:42:28] oh so some run only at +2?
[09:42:31] for speed I guess
[09:42:48] in case you wind up with several versions of the patch, that is reasonable
[09:42:57] yeah
[09:43:11] so for mediawiki/core , for sure we always run the php7.2 tests
[09:43:19] cause that is what wikimedia runs
[09:43:22] I would hope so :-D
[09:43:33] and we only run php 7.4 or php 8.0 tests when someone +2 a change
[09:43:52] should I ask how the php8 compatibility is coming along?
[09:43:59] cause if on a new change the php7.2 fail, we dont care about the other php flavor results
[09:44:04] for real
[09:44:07] php 8 compat I have no idea
[09:44:20] ok, I think it's not a rush anyways
[09:44:21] historically the work to support a new php version goes all on good will of developers
[09:44:32] that's a lot of extensions to check and convert
[09:44:42] with the usual known folks caring about it (james, maxem, kunal, etc)
[09:44:48] :-)
[09:45:08] and I guess at some point we add a php 8 job to all extensions and trigger it
[09:45:14] or maybe introduce it as a non voting job
[09:45:23] catch up with issues
[09:45:27] and then it gets promoted
[09:45:33] and hope the majority don't need work
[09:45:38] at which point we will enforce all repos to support 8.0
[09:45:42] mass conversions like that can be very tedious
[09:45:51] yeah it is
[09:46:07] going to upgrade all those jobs to use a new version of Quibble the mediawiki test runner
[09:46:26] and next week switch the test images from Stretch to Buster which WILL break stuff here and there
[09:46:29] is quibble a thing that can be run from the command line?
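Aside on the job gating described in the 09:42 to 09:44 messages: the policy, as stated there, is that php7.2 tests run on every proposed change while php 7.4 and php 8.0 tests run only once someone gives +2. A minimal sketch of that rule follows. The job names and the `jobs_for` function are hypothetical and are not taken from the real Zuul configuration, which the log does not show.

```python
from typing import List

# Hypothetical job names; the actual Zuul layout is not shown in the log.
ALWAYS_RUN = ["php7.2"]           # run whenever a change is proposed
GATE_ONLY = ["php7.4", "php8.0"]  # run only after a change gets +2


def jobs_for(event: str) -> List[str]:
    """Return the PHP test flavours to run for a Gerrit event.

    event is "patchset" (a change was proposed or updated) or
    "approved" (someone gave the change a +2).
    """
    if event == "patchset":
        return list(ALWAYS_RUN)
    if event == "approved":
        return ALWAYS_RUN + GATE_ONLY
    raise ValueError(f"unknown event: {event!r}")


print(jobs_for("patchset"))
print(jobs_for("approved"))
```

The split keeps feedback on new patchsets fast: if php7.2 already fails, the other flavours' results would not matter, which is the reasoning given in the log.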
[09:46:30] fun times
[09:46:34] yeah
[09:46:37] huh
[09:46:42] I should give tours of quibble really
[09:46:45] I wonder if I should look into it $someday
[09:46:50] YES YOU SHOULD
[09:47:14] the idea is that instead of reading some doc as to how one can test a change made to Echo or whatever
[09:47:22] how to install, how to run a specific test, prep needed before running a test, how to run a batch of tests, etc
[09:47:25] and figure out how to get the dependencies from composer + composer merge plugin
[09:47:28] or how to spawn mysql
[09:47:30] one can just:
[09:48:14] quibble --db mysql --packages-source=composer mediawiki/Echo
[09:48:23] and if one runs quibble on a laptop is that going to clobber the local db install or create a bunch of new dbs or should one set up some ahead of time etc
[09:48:35] and it will clone all the repos, spawn a db and populate it, install the deps and run every single test suite we have
[09:48:39] which takes a while
[09:48:47] na it spawns a one off db
[09:48:50] if you already have some clones will it update them?
[09:48:54] either in a temporary sqlite db
[09:49:02] does it put the db on a different port?
[09:49:05] oh sqlite
[09:49:06] or by creating a mysql db from scratch and spawning a mysql daemon
[09:49:09] yeah
[09:49:11] ok
[09:49:18] but
[09:49:20] I would want to use the mysql version of that
[09:49:27] I completely failed to market it actively
[09:49:34] apparently you did yes
[09:49:47] don't make me have to learn all about it and then write you a presentation
[09:50:06] like i did with the dbas for 'dbs for sre folks' :-P
[09:50:13] !log Successfully built Docker images for Quibble 0.0.46
[09:50:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:50:46] if you link me to 'everything there is to know about quibble' though on wikitech (I guess there is some such?)
I will read it and poke around over time
[09:50:53] there is some quick start doc at https://doc.wikimedia.org/quibble/
[09:51:15] should definitely get more step by step tutorials
[09:51:21] or example usage for some common use case
[09:51:22] ok these all imply a docker image
[09:51:23] and
[09:51:32] if one wanted to run it directly is there anything for that?
[09:51:42] there are a bunch of glitches that make it not so user friendly :/
[09:51:59] all software sucks, as does all hardware
[09:52:06] it's fine
[09:52:59] pip3 install --user git+https://gerrit.wikimedia.org/r/p/integration/quibble.git@"0.0.46"#egg=quibble
[09:53:09] the core patch merged btw
[09:53:17] mkdir workspace && cd workspace && ~/.local/bin/quibble
[09:53:19] so I guess in 10 minutes it will be on beta and things will be better
[09:53:20] I hope
[09:53:25] which will clone bunch of repos, install stuff and run tests
[09:53:34] should take roughly 15 minutes iirc
[09:53:53] ok, that's some useful info too
[09:54:01] meanwhile
[09:54:05] it is snowing like hell here
[09:54:11] which happens once every 3 or 4 years!
[09:54:24] and yeah beta is updating the git repositories
[09:54:49] wow! snowball fight!
[09:55:00] it's almost 18C here and sunny :-P
[09:56:06] https://en.wikipedia.beta.wmflabs.org/wiki/Special:Version shows shellbox 1.0.3
[09:56:08] lucky ariel!
[09:57:10] Beta-Cluster-Infrastructure: deployment-prep db upgrade fails: wikimedia/shellbox: 1.0.3 installed, 1.0.2 required - https://phabricator.wikimedia.org/T274492 (hashar) Open→Resolved Self solved magically. The beta-update-database is now running so that was really just a transient failure.
[09:57:14] https://integration.wikimedia.org/ci/view/Beta/job/beta-update-databases-eqiad/48257/console !
[09:57:16] solved ;]
[09:57:38] Yippee, build fixed!
[09:57:38] Project beta-update-databases-eqiad build #48257: FIXED in 1 min 15 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/48257/
[10:01:38] woo hoo!
[10:01:51] ah btw who should come to these train log triage meetings?
[10:53:48] Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), Release, Train Deployments: 1.36.0-wmf.30 deployment blockers - https://phabricator.wikimedia.org/T271344 (Legoktm)
[11:12:07] Diffusion, Gerrit, Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), Wikimedia-GitHub: Create and maintain a list of organization repos that are maintained on Gerrit, GitHub, and Diffusion - https://phabricator.wikimedia.org/T237470 (Aklapper) @thcipriani: That list is helpful, thank...
[11:59:30] (PS1) Hashar: jjb: update integration-quibble jobs to 0.0.46 [integration/config] - https://gerrit.wikimedia.org/r/663556
[12:00:10] (CR) Hashar: [C: +2] "deployed" [integration/config] - https://gerrit.wikimedia.org/r/663556 (owner: Hashar)
[12:01:45] (Merged) jenkins-bot: jjb: update integration-quibble jobs to 0.0.46 [integration/config] - https://gerrit.wikimedia.org/r/663556 (owner: Hashar)
[12:02:32] (PS1) Hashar: Jenkins job validation (DO NOT SUBMIT) [integration/quibble] - https://gerrit.wikimedia.org/r/663557
[12:03:19] (CR) Hashar: "That should test the releng/quibble* images which have 0.0.46. We install from source, but at least that would test the environment provi" [integration/quibble] - https://gerrit.wikimedia.org/r/663557 (owner: Hashar)
[12:55:55] (CR) Hashar: "check experimental" [integration/quibble] - https://gerrit.wikimedia.org/r/663557 (owner: Hashar)
[13:37:49] Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), Release, Train Deployments: 1.36.0-wmf.30 deployment blockers - https://phabricator.wikimedia.org/T271344 (Majavah) {T274526} seems to be a new issue in this train, not sure if a blocker.
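Aside on the shellbox failure from the 09:30 to 09:57 exchange: beta broke because the vendor repository already carried wikimedia/shellbox 1.0.3 while mediawiki/core still asked for 1.0.2. The sketch below illustrates that kind of check. It assumes exact version pins only, and `find_mismatches` is a made-up helper rather than the check MediaWiki actually runs. The message format simply mirrors the title of T274492.

```python
from typing import Dict, List


def find_mismatches(required: Dict[str, str], installed: Dict[str, str]) -> List[str]:
    """Compare exact version pins against installed versions.

    Both arguments map package name -> version string. Only exact pins are
    handled; real composer constraints (^, ~, ranges) are out of scope.
    """
    problems = []
    for package, wanted in sorted(required.items()):
        have = installed.get(package)
        if have is None:
            problems.append(f"{package}: not installed, {wanted} required")
        elif have != wanted:
            problems.append(f"{package}: {have} installed, {wanted} required")
    return problems


# The transient state from the log: vendor already bumped, core not yet.
core_requires = {"wikimedia/shellbox": "1.0.2"}
vendor_has = {"wikimedia/shellbox": "1.0.3"}
for line in find_mismatches(core_requires, vendor_has):
    print(line)
```

Once the core change with the composer.json bump merged, both sides agreed on 1.0.3 and the check would come back empty, which matches the job recovering on its own.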
[14:22:31] (CR) Lars Wirzenius: "I've not encountered makerelease2 before. I don't know what it does. How has this change been tested?" (1 comment) [tools/release] - https://gerrit.wikimedia.org/r/657930 (https://phabricator.wikimedia.org/T272760) (owner: Reedy)
[14:50:06] (Abandoned) Hashar: Jenkins job validation (DO NOT SUBMIT) [integration/quibble] - https://gerrit.wikimedia.org/r/663557 (owner: Hashar)
[14:51:18] (PS3) David Caro: ci-common: Add a bypass for the ci-src-setup script [integration/config] - https://gerrit.wikimedia.org/r/663202 (https://phabricator.wikimedia.org/T274347)
[14:57:53] Release-Engineering-Team (Local Dev), dev-images, docker-pkg, serviceops, User-brennen: docker-pkg: "certificate verify failed: unable to get local issuer certificate" for docker-registry.discovery.wmnet when publishing dev-images from contint2001 - https://phabricator.wikimedia.org/T274306 (J...
[16:07:41] (PS1) Hashar: jjb: Quibble 0.0.46 for fundraising jobs [integration/config] - https://gerrit.wikimedia.org/r/663601
[16:08:29] (CR) Hashar: [C: +2] "Deployed" [integration/config] - https://gerrit.wikimedia.org/r/663601 (owner: Hashar)
[16:09:58] (Merged) jenkins-bot: jjb: Quibble 0.0.46 for fundraising jobs [integration/config] - https://gerrit.wikimedia.org/r/663601 (owner: Hashar)
[16:11:05] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, SRE, serviceops, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[16:11:55] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, SRE, serviceops, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[16:13:26] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, SRE, serviceops, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[16:14:43] Phabricator: Error thrown when trying to view previous git blame in phabricator - https://phabricator.wikimedia.org/T274559 (TerraCodes)
[16:16:29] Phabricator: Error thrown when trying to view previous git blame in phabricator - https://phabricator.wikimedia.org/T274559 (TerraCodes)
[16:24:37] Diffusion, Phabricator: Error trying to view previous git blame ("Skip past this commit" button): "reset() expects parameter 1 to be array, null given" - https://phabricator.wikimedia.org/T274559 (Aklapper)
[16:41:31] (CR) Reedy: "> Patch Set 8:" (1 comment) [tools/release] - https://gerrit.wikimedia.org/r/657930 (https://phabricator.wikimedia.org/T272760) (owner: Reedy)
[16:42:48] (PS1) Hashar: jjb: update phan jobs to Quibble 0.0.46 [integration/config] - https://gerrit.wikimedia.org/r/663610
[16:46:58] (PS1) Hashar: jjb: update coverage jobs to Quibble 0.0.46 [integration/config] - https://gerrit.wikimedia.org/r/663611
[16:47:40] (CR) Lars Wirzenius: "> Did you read the README?
;)" [tools/release] - https://gerrit.wikimedia.org/r/657930 (https://phabricator.wikimedia.org/T272760) (owner: Reedy)
[16:48:36] (PS1) Hashar: jjb: update fresnel jobs to Quibble 0.0.46 [integration/config] - https://gerrit.wikimedia.org/r/663612
[16:56:24] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, SRE, serviceops, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1376.eqiad.wmnet'] ` an...
[17:02:24] hey all: @hashar @greg-g @James_F i have training today and i'd like to see the back of https://phabricator.wikimedia.org/T274210 before i go there. I'm not sure whether https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/663263 will work but do we want to try it (it's beta cluster only) and see?
[17:02:49] it's my understanding we can do beta cluster only changes out of backport windows
[17:02:57] otherwise i dont think this is going to happen this week
[17:03:15] and it's my understanding that it's important.
[17:06:27] Jdlrobson: beta cluster-only changes can happen ad hoc, yeah
[17:06:39] and yeah, this is Important(TM) ;)
[17:07:01] i can't backport but if someone can i think im qualified to verify if they work :)
[17:07:29] s/backport/
[17:10:34] (PS2) Jforrester: Zuul: [mediawiki/extensions/GoogleAppEngine] Archive the extension [integration/config] - https://gerrit.wikimedia.org/r/663352 (https://phabricator.wikimedia.org/T274069) (owner: Zoranzoki21)
[17:10:43] (PS3) Jforrester: Zuul: [mediawiki/extensions/GoogleAppEngine] Archive the extension [integration/config] - https://gerrit.wikimedia.org/r/663352 (https://phabricator.wikimedia.org/T274069) (owner: Zoranzoki21)
[17:10:56] (CR) Jforrester: [C: +2] Zuul: [mediawiki/extensions/GoogleAppEngine] Archive the extension [integration/config] - https://gerrit.wikimedia.org/r/663352 (https://phabricator.wikimedia.org/T274069) (owner: Zoranzoki21)
[17:12:00] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, SRE, serviceops, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1375.eqiad.wmnet'] ` an...
[17:12:15] (Merged) jenkins-bot: Zuul: [mediawiki/extensions/GoogleAppEngine] Archive the extension [integration/config] - https://gerrit.wikimedia.org/r/663352 (https://phabricator.wikimedia.org/T274069) (owner: Zoranzoki21)
[17:12:33] Jdlrobson: yeah beta cluster only changes are fine to be merged any time. Though make sure to update production as well (pull , scap sync-file) to avoid surprising the next deployer
[17:13:03] hashar: can you be that deployer (i can't do deploys)?
[17:13:10] Jdlrobson: for logo changes we have a few people doing most of the logo updates, probably want to add them as reviewers
[17:13:20] in meeting right now
[17:13:30] else add it to backport window
[17:13:35] hashar: okay.
I'm going to be unavailable during the backport window
[17:13:35] or poke folks in #wikimedia-operations i guess
[17:13:44] but all tech dpt is in the same meeting so
[17:13:53] so i'll need someone to do that, otherwise this will have to wait until monday :/
[17:14:01] guess you can add it with a note stating you lack deploy rights
[17:14:05] and i guess it will just be done
[17:14:11] which I don't think greg-g will like hehe
[17:14:13] else add us as reviewer and we will just do it i guess
[17:14:13] !log Zuul: [mediawiki/extensions/GoogleAppEngine] Archive the extension T274069
[17:14:16] okay ill leave it in the backport window with instructions
[17:14:17] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:14:18] T274069: Archive the GoogleAppEngine extension - https://phabricator.wikimedia.org/T274069
[17:14:26] but probably need a review from logo tech savvy folks
[17:16:00] i think that's me lol? :)
[17:16:09] the logo bits fine
[17:16:20] the bit im not 100% sure about is how the -labs.php config file works
[17:16:27] and whether that will clobber all production config values
[17:16:35] s/clobber/replace
[17:16:39] (PS4) Jforrester: Disable running selenium tests for the Acrolinx extension [integration/config] - https://gerrit.wikimedia.org/r/663353 (owner: Zoranzoki21)
[17:16:57] (PS5) Jforrester: Zuul: [mediawiki/extensions/Acrolinx] Disable running selenium tests [integration/config] - https://gerrit.wikimedia.org/r/663353 (owner: Zoranzoki21)
[17:17:14] Jdlrobson: the -labs.php file should never be loaded in production context
[17:17:29] I don't know how it works currently
[17:17:45] but the idea is that if wgRealm == labs then it looks for a -labs.php file and if it exists load it
[17:17:56] but on production wgRealm is not labs, so the -labs.php files are never included
[17:18:00] so essentially a noop
[17:19:02] Continuous-Integration-Config, cloud-services-team (Kanban): labs/toollabs fails
debian-glue-unstable for lintian errors caused by the config - https://phabricator.wikimedia.org/T273896 (aborrero) p:Triage→Medium
[17:19:25] hashar: yep i know
[17:19:40] but it also inherits from production .php which is why all the logos are wrong
[17:19:47] anyway i added to deployment window with a note i wont be there
[17:19:59] if someone can be there who understands this issue @greg-g that would make 100% sure this goes out
[17:24:53] Continuous-Integration-Config, Release-Engineering-Team, Patch-For-Review: Add a bypass for the ci-src-setup script - https://phabricator.wikimedia.org/T274347 (dcaro) p:Triage→Medium
[17:30:18] (CR) Jforrester: [C: +2] Zuul: [mediawiki/extensions/Acrolinx] Disable running selenium tests [integration/config] - https://gerrit.wikimedia.org/r/663353 (owner: Zoranzoki21)
[17:31:45] (Merged) jenkins-bot: Zuul: [mediawiki/extensions/Acrolinx] Disable running selenium tests [integration/config] - https://gerrit.wikimedia.org/r/663353 (owner: Zoranzoki21)
[17:35:27] (CR) Ahmon Dancy: [C: +1] "recheck" [tools/scap] - https://gerrit.wikimedia.org/r/654410 (owner: Hashar)
[17:35:43] (CR) Hashar: "Repasting my comment from https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Acrolinx/+/657108 :" [integration/config] - https://gerrit.wikimedia.org/r/663353 (owner: Zoranzoki21)
[17:36:16] !log Zuul: [mediawiki/extensions/Acrolinx] Disable running selenium tests
[17:36:18] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:37:17] (PS1) Lars Wirzenius: make tests fail under Python3 to verify CI catches it [tools/scap] - https://gerrit.wikimedia.org/r/663617
[17:37:21] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, SRE, serviceops, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by
dzahn on cumin1001.eqia...
[17:38:33] (CR) jerkins-bot: [V: -1] make tests fail under Python3 to verify CI catches it [tools/scap] - https://gerrit.wikimedia.org/r/663617 (owner: Lars Wirzenius)
[17:38:45] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, SRE, serviceops, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1368.eqiad.wmnet'] ` Of...
[17:39:02] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, SRE, serviceops, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[17:39:43] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, SRE, serviceops, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[17:57:11] Release-Engineering-Team (Local Dev), dev-images, docker-pkg, serviceops, and 2 others: docker-pkg: "certificate verify failed: unable to get local issuer certificate" for docker-registry.discovery.wmnet when publishing dev-images from contint2001 - https://phabricator.wikimedia.org/T274306 (brenn...
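Aside on the -labs.php discussion from 17:16 to 17:19: the idea stated there is that the production config file is always loaded, and a sibling -labs.php file is loaded on top of it only when the realm is labs and the file exists. The sketch below models just that selection step. `config_files` and the file name `Settings.php` are made up for illustration, and the log itself notes that the speaker was not sure how the real loader currently works.

```python
from pathlib import Path
from typing import List


def config_files(realm: str, base: Path) -> List[Path]:
    """Return the config files to load, in order, for the given realm.

    base is the production file (for example Settings.php, a made-up name).
    The realm-specific variant is expected next to it as Settings-labs.php.
    """
    files = [base]  # production values are always the starting point
    if realm == "labs":
        override = base.with_name(f"{base.stem}-labs{base.suffix}")
        if override.exists():
            files.append(override)  # loaded second, so it can override
    return files
```

Because the base file comes first in both realms, the labs realm inherits every production value the -labs file does not override, which is consistent with the 17:19 remark about the logos being wrong on beta. In production the override branch is never taken, so the -labs file is effectively a no-op there.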
[18:27:55] (03PS2) 10Umherirrender: Fix spacing around exception type on @throws tag [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/661991 [18:28:01] (03PS5) 10Umherirrender: Check for superfluous @return statements and missing void types [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/658070 [18:44:52] 10Continuous-Integration-Config, 10Release-Engineering-Team (CI & Testing services), 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), 10Toolforge, and 2 others: Add CI checks for golang admission controllers - https://phabricator.wikimedia.org/T236203 (10Bstorm) [18:47:30] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1368.eqiad.wmnet'] ` Of... [18:48:16] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1374.eqiad.wmnet'] ` an... [18:53:44] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia... [18:55:01] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (CI & Testing services), 10Release-Engineering-Team-TODO, 10Wiki-Loves-Monuments-Database, 10User-JeanFred: Automate deployment of heritage on Gerrit post-merge - https://phabricator.wikimedia.org/T157893 (10Legoktm) >>! In T157893#62733... 
[19:01:06] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (CI & Testing services), 10Release-Engineering-Team-TODO, 10Wiki-Loves-Monuments-Database, 10User-JeanFred: Automate deployment of heritage on Gerrit post-merge - https://phabricator.wikimedia.org/T157893 (10bd808) @hashar Do you have an... [19:04:03] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia... [19:04:37] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1363.eqiad.wmnet'] ` an... [19:05:20] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia... [19:06:18] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (CI & Testing services), 10Release-Engineering-Team-TODO, 10Wiki-Loves-Monuments-Database, 10User-JeanFred: Automate deployment of heritage on Gerrit post-merge - https://phabricator.wikimedia.org/T157893 (10thcipriani) >>! In T157893#62... [19:21:49] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia... 
[19:32:00] (03PS1) 10Bartosz Dziewoński: Update URL of VisualEditor demo [integration/docroot] - 10https://gerrit.wikimedia.org/r/663665 (https://phabricator.wikimedia.org/T274222) [19:38:43] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1368.eqiad.wmnet'] ` an... [19:43:21] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia... [19:48:53] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1362.eqiad.wmnet'] ` an... [19:56:42] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia... [20:02:42] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (CI & Testing services), 10Release-Engineering-Team-TODO, 10Wiki-Loves-Monuments-Database, 10User-JeanFred: Automate deployment of heritage on Gerrit post-merge - https://phabricator.wikimedia.org/T157893 (10hashar) > jenkins-bot is mana... 
[20:07:37] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1361.eqiad.wmnet'] ` an... [20:08:16] 10Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), 10Release, 10Train Deployments: 1.36.0-wmf.30 deployment blockers - https://phabricator.wikimedia.org/T271344 (10hashar) >>! In T271344#6822691, @Majavah wrote: > {T274526} seems to be a new issue in this train, not sure if a blocker. It seems... [20:12:29] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia... [20:12:37] James_F in reference to our discussion yesterday about beta cluster not being updated for code changes, https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/GlobalWatchlist/+/02ee0e39c0c59ba09a2f0f28b98f57192b85d576 merged half an hour ago and https://integration.wikimedia.org/ci/view/Beta/ shows the last beta-code-update-eqiad job [20:12:37] to have been 49 minutes ago, and checking manually on the beta cluster suggests the change hasn't been deployed - is there a way to trigger it manually? [20:13:39] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1365.eqiad.wmnet'] ` an... [20:15:58] hashar: aren't jenkins-bot and jenkins-deploy different LDAP users? 
[20:19:56] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (CI & Testing services), 10Release-Engineering-Team-TODO, 10Wiki-Loves-Monuments-Database, 10User-JeanFred: Automate deployment of heritage on Gerrit post-merge - https://phabricator.wikimedia.org/T157893 (10Legoktm) >>>! In T157893#6824... [20:20:11] legoktm: argh maybe I have mixed them up [20:20:29] confusingly the email for jenkins-deploy is jenkins-bot@ :) [20:20:31] DannyS712: I don't know where you're looking... [20:20:32] https://integration.wikimedia.org/ci/view/Beta/job/beta-code-update-eqiad/332680/console [20:20:37] 20:14:25 Fetching submodule GlobalWatchlist [20:20:37] 20:14:25 From https://gerrit.wikimedia.org/r/mediawiki/extensions/GlobalWatchlist [20:20:37] 20:14:25 702a42a..02ee0e3 master -> origin/master [20:20:53] And then scap ran after [20:21:25] hmm, well it works now [20:22:40] JS is cached [20:23:28] I was checking load.php directly with debug mode enabled, doesn't that bypass caching? [20:25:20] 10Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), 10Release, 10Train Deployments: 1.36.0-wmf.30 deployment blockers - https://phabricator.wikimedia.org/T271344 (10mmodell) [20:25:25] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia... [20:25:29] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (CI & Testing services), 10Release-Engineering-Team-TODO, 10Wiki-Loves-Monuments-Database, 10User-JeanFred: Automate deployment of heritage on Gerrit post-merge - https://phabricator.wikimedia.org/T157893 (10hashar) AH YEAH sorry confuse... [20:25:30] legoktm: I have replied on the task to confirm. 
Thank you for the correction [20:26:14] hashar: I don't follow what you mean by "...we do not use that account outside of sshing to WMCS" [20:26:42] Toolforge is inside WMCS [20:27:46] what I meant is that jenkins-deploy is the account we use for Jenkins to ssh to WMCS instances [20:27:54] the jenkins agents [20:28:05] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team-TODO, 10Wikimedia-Site-requests, 10Readers-Web-Backlog (Kanbanana-FY-2020-21), and 2 others: [Regression] some beta cluster wikis using official logos - https://phabricator.wikimedia.org/T274210 (10Jdlrobson) Hey @greg it looks like the above patch... [20:28:08] yes, and we want jenkins to be able to ssh into toolforge [20:28:16] the reason for the name is that the first use case was to attach to dpeloyment-prep in order to have jenkins to update the code [20:28:26] got setup back in 2012 I believe with Ryan Lane [20:28:36] and we just kept that account and the ssh key ever since :] [20:28:42] to toolforge? [20:29:24] yes... see what you said in 2018 https://phabricator.wikimedia.org/T157893#4213783 and then what we discussed last year: https://phabricator.wikimedia.org/T157893#6269991 [20:29:35] the goal is to have post-merge auto deploy jobs to Toolforge [20:30:27] bah I thought the task was to update jenkins-bot email [20:30:28] ... [20:30:40] 10Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), 10Release, 10Train Deployments: 1.36.0-wmf.30 deployment blockers - https://phabricator.wikimedia.org/T271344 (10mmodell) [20:31:15] so yeah my reply back in 2017 (time flow) was to hack some postmerge job on the current Jenkins CI [20:31:24] or alternatively, to setup a dedicated Jenkins for toolforge usage [20:31:45] or whatever system that listen to stream events and react on that. 
Could be a custom made tool, Zuul or whatever else [20:31:55] then Jenkins as a plugin to listen to Gerrit already [20:32:15] I don't mind adding yet another use case to the CI Jenkins, though really it is not secure for deployment [20:32:29] I think it's secure enough for Toolforge tools [20:32:30] then we do that for deployment-prep alread so maybe yet another use case is not that much of big deal [20:32:36] it just add to the techdebt that system is [20:33:03] but maybe a Jenkins dedicated to toolforge might be a good choice so that other tool admin can administrate it as well [20:33:05] but this would be opt-in per tool, so maintainers can decide how secure they need it to be or not [20:33:06] and that split the concern [20:33:34] the only devil is that we dont have a full puppetization of jenkins so there are a lot of manual steps involved to configure it [20:33:52] I think it's pretty clear that we don't have resources to maintain *another* jenkins install [20:34:00] yeah [20:34:21] then CI is tight on resources itself and having to support deployment of a bunch of other tool also does add to our (my) workload [20:34:23] anyways, I'm explicitly volunteering to maintain these jobs and be on the hook for migrating them to whatever future CI system exists [20:34:25] with yet another use case to support [20:34:35] on an overall aging infra that we plan to phase out [20:35:00] then yeah it is probably lower overhead compared to setting up and maintaining an entirely new Jenkins instance [20:35:27] I think we've stalled enough on this task that it really doesn't make sense to keep blocking on things being not-ideal ("don't let perfect be the enemy of the good", etc.) 
[20:36:28] fwiw, I think a webhook triggering a local script might be simpler and less fragile/insecure than shell access from within CI [20:36:44] that's what some tools use today already, with CI e.g just making a curl request [20:37:15] throw in a basic rate limiter or IP filter if we want [20:38:08] jjb/labs.yaml: args: '--fail --silent --show-error --max-time 10 https://wikibugs.toolforge.org/pull.php' [20:38:08] could use a generic postmerge job that takes a URL as parameter [20:38:19] a webhook restarting the webservice that runs it is less fragile? [20:38:56] Hm.. a restart might be more tricky indeed. I was thinking for actual web services [20:39:01] why would it need a restart? [20:40:06] to deploy the new code? `git pull && webservice restart && dologmsg ....` basically [20:40:20] your example uses python [20:40:21] that explains [20:40:23] I see [20:40:45] all non-PHP webservices will need restarts [20:41:34] so, does it work to run webservice restart from within a pod? [20:43:24] I don't see how it could, given that `webservice restart` works by deleting that pod [20:43:45] also pretty sure the k8s credentials aren't available in a pod [20:43:52] sure they are :) [20:43:57] that's how k8s works [20:44:09] oh [20:44:10] If we want to generalise this with shell, maybe wmcs could allow some kind of very very limited sudo exemption where it can run one and only one specific shell command as any given user, e.g. 
$HOME/jenkins-notify.sh or something like that, in which tool authors can put their commands, that way we don't have to deal with PHP scripts or IP filters or anything lke that, but we also arent sending arbitrary shell commands [20:44:12] your creds are always mounted at a magic place [20:44:29] TIL [20:44:48] Krinkle: I'm not really sure what you're trying to protect against [20:45:05] CI running arbitrary shell commands on toolforge [20:46:15] we could have a /usr/bin/jenkins-deploy-python or smth I guess [20:46:28] yuck, but yeah [20:46:36] I don [20:46:57] or self-service friendly where the script is inside the tool $HOME [20:47:02] I'm not sure I honestly see the harm in "arbitrary" shell commands as the jenkins user [20:47:14] that user would be just like any other maintainer account [20:47:18] I don't see how much of this is necessary though, people are already allowing github, etc. to ssh in and run random stuff [20:47:32] but yeah, I'd rather not add jenkins-deploy to my tool user group if it means all of CI can now ssh in and run any command if anything goes wrong there or if the wrong thing gets merged. [20:47:37] so the harm it could do is the same harm any tool maintainer could do [20:47:59] Krinkle: sure. and it would be 100% opt-in [20:48:20] ^^ this isn't for every tool, but it certainly will help a decent amount IMO [20:48:39] I *know* there are many far less secure things already in a many toolforge tools :) [20:48:50] ack [20:49:11] well, I guess we could at least do it like doc-publish, where the job only runs on jenkins master, so the credentials are not exposed to pre-merge CI workers [20:49:54] and the command in zuul YAML, not read from git. 
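(Editor's sketch, not part of the log.) The "one and only one specific shell command" idea above can be illustrated as a helper that turns a tool name into a fixed argument list, so the CI side never passes free-form shell. Everything specific here is an assumption for illustration: the `notify_command` helper, the tool-name pattern, the `tools.<name>` account naming, the `/data/project/<name>` home directory, and the use of `sudo -n -u`. The script name `jenkins-notify.sh` is the one floated in the discussion.

```python
import re
from typing import List

# Assumed pattern for a tool name; the real naming rules may differ.
TOOL_NAME = re.compile(r"[a-z0-9][a-z0-9-]*")


def notify_command(tool: str) -> List[str]:
    """Build the single permitted command for a tool (hypothetical helper).

    The only caller-controlled input is the tool name, and it is validated
    before use, so no arbitrary shell text can reach the command line.
    """
    if not TOOL_NAME.fullmatch(tool):
        raise ValueError(f"refusing unexpected tool name: {tool!r}")
    # Assumed conventions: account "tools.<name>", home "/data/project/<name>".
    return [
        "sudo", "-n", "-u", f"tools.{tool}",
        f"/data/project/{tool}/jenkins-notify.sh",
    ]
```

Returning a list rather than a string means the caller can hand it to `subprocess.run` without a shell, which is the point of the restriction being discussed.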
[20:50:11] which I'm guessing is what was intended already, but just being verbose here [20:51:04] legoktm: I looked it up, the in-pod creds are mounted at /var/run/secrets/kubernetes.io/serviceaccount/ [20:51:13] https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/#accessing-the-api-from-a-pod [20:51:51] that's cool. but I don't think `webservice restart` would work from in the pod that it's deleting...or at least I don't think we should encourage that [20:52:04] it should work I think ... [20:52:21] Krinkle: yeah, the commands would be defined in JJB [20:52:21] 10Continuous-Integration-Infrastructure, 10Quibble: Upgrade CI jobs to use Quibble 0.0.46 - https://phabricator.wikimedia.org/T274590 (10hashar) [20:52:22] you're right that the pod is deleted if it is the web server pod [20:52:27] (or on the Toolforge side) [20:52:28] on an unrelated security note, I wouldn't mind opting in to a way of running php tools that can't write to NFS :D [20:52:37] but it is recreated by the Deployment [20:53:02] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team-TODO, 10Wikimedia-Site-requests, 10Readers-Web-Backlog (Kanbanana-FY-2020-21), and 2 others: [Regression] some beta cluster wikis using official logos - https://phabricator.wikimedia.org/T274210 (10RhinosF1) There was https://phabricator.wikimedia.o... 
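(Editor's sketch, not part of the log.) The in-pod credentials mentioned above live as plain files under the quoted directory. A minimal way to read them, assuming the standard `namespace`, `token` and `ca.crt` file names that Kubernetes mounts for a service account; the `read_service_account` helper and its return shape are hypothetical:

```python
from pathlib import Path
from typing import Dict, Union

# Mount point quoted in the discussion.
SERVICE_ACCOUNT_DIR = Path("/var/run/secrets/kubernetes.io/serviceaccount")


def read_service_account(
    directory: Union[str, Path] = SERVICE_ACCOUNT_DIR,
) -> Dict[str, str]:
    """Return the namespace, bearer token and CA path found in `directory`.

    Raises FileNotFoundError when the files are absent, for example when
    the code is not running inside a pod.
    """
    base = Path(directory)
    return {
        "namespace": (base / "namespace").read_text().strip(),
        "token": (base / "token").read_text().strip(),
        "ca_cert_path": str(base / "ca.crt"),
    }
```

Whether a tool should use these credentials to restart its own webservice is exactly what is being debated here; the sketch only shows that they are readable from inside the pod.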
[20:53:31] (03PS2) 10Hashar: jjb: update phan jobs to Quibble 0.0.46 [integration/config] - 10https://gerrit.wikimedia.org/r/663610 (https://phabricator.wikimedia.org/T274590) [20:53:33] (03PS2) 10Hashar: jjb: update coverage jobs to Quibble 0.0.46 [integration/config] - 10https://gerrit.wikimedia.org/r/663611 (https://phabricator.wikimedia.org/T274590) [20:53:35] (03PS2) 10Hashar: jjb: update fresnel jobs to Quibble 0.0.46 [integration/config] - 10https://gerrit.wikimedia.org/r/663612 (https://phabricator.wikimedia.org/T274590) [20:53:42] Krinkle: It would be possible to make that happen (r/o mounts), but I'm not sure how to feature flag it into webservice [20:54:30] the next-gen system of buildpack derived images will probably support not mounting nfs at all [20:55:09] so going back to the problem at hand, we need hashar or thcipriani to log into the jenkins-deploy user at https://toolsadmin.wikimedia.org/ and signup for Toolforge [20:55:21] bd808: ack, yeah, havig webservice start service with a copy/snapshot would work as well. doesn't have to be live r-o nfs [20:55:47] legoktm: oh, so they have the creds? I thought that was the blocker before? [20:56:04] well, they have access to the email so if they don't have the creds they can reset it [20:56:18] *nod* [20:56:42] (and that's probably a good idea in general if the supposed account owners don't have the credentials...) [20:57:33] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1364.eqiad.wmnet'] ` an... [20:59:16] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team-TODO, 10Wikimedia-Site-requests, 10Readers-Web-Backlog (Kanbanana-FY-2020-21), and 2 others: [Regression] some beta cluster wikis using official logos - https://phabricator.wikimedia.org/T274210 (10Krinkle) >>! 
In T274210#6824338, @gerritbot wrote:... [20:59:35] 10Continuous-Integration-Infrastructure, 10Quibble, 10Patch-For-Review: Upgrade CI jobs to use Quibble 0.0.46 - https://phabricator.wikimedia.org/T274590 (10hashar) [21:00:04] Project mwcore-phpunit-coverage-master build #1211: 04STILL FAILING in 6 hr 0 min: https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/1211/ [21:00:24] 10Continuous-Integration-Infrastructure, 10Quibble, 10Patch-For-Review: Upgrade CI jobs to use Quibble 0.0.46 - https://phabricator.wikimedia.org/T274590 (10hashar) [21:00:48] 10Continuous-Integration-Infrastructure, 10Quibble, 10Patch-For-Review: Upgrade CI jobs to use Quibble 0.0.46 - https://phabricator.wikimedia.org/T274590 (10hashar) Announced on wikitech-l https://lists.wikimedia.org/pipermail/wikitech-l/2021-February/094270.html . Will do the switch Monday 02/15 during Euro... [21:02:01] bd808: legoktm: Krinkle: cant stay anymore sorry I am just too tired (been there for 13 hours or so) [21:02:10] but we can catch up on the task ;] [21:03:01] no worries, sounds good :) [21:04:35] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia... [21:08:09] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team-TODO, 10Wikimedia-Site-requests, 10Readers-Web-Backlog (Kanbanana-FY-2020-21), and 2 others: [Regression] some beta cluster wikis using official logos - https://phabricator.wikimedia.org/T274210 (10Jdlrobson) p:05High→03Medium It seems like at t... [21:10:06] I can add the user to toolforge, then the plan is that folks that want CI to deploy can add that user to their projects and it will shell in and run "deploy.sh" or something? 
[21:10:10] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1360.eqiad.wmnet'] ` an... [21:11:36] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team-TODO, 10Wikimedia-Site-requests, 10Readers-Web-Backlog (Kanbanana-FY-2020-21), and 2 others: [Regression] some beta cluster wikis using official logos - https://phabricator.wikimedia.org/T274210 (10Jdlrobson) [21:12:55] thcipriani: exactly [21:14:05] I guess jenkins would have to shell in and become the tool user to run the commands? [21:14:27] so generate an ssh key, add it to the toolforge account as well? [21:14:38] I'm logged into toolforge as jenkins-deploy [21:15:06] oh, I guess there are ssh keys already associated with the ldap record? [21:15:28] yeah, I think we should be able to reuse the existing ssh key? [21:15:29] if the jenkins key was added to the tool acocunt, then it would be possible to ForceCommand it, but i don't think that's supported [21:15:47] Platonides: yeah, it would run `become` [21:16:53] I was trying to think on a way to restrict it, but... [21:17:51] I'm not sure of sudo precedence rules [21:18:09] if it would be possible to restrict an access granted to run anything [21:19:24] I am struggling to think of a reason to have a seperate key for this. Lessen the blast radius of a key compromise, I guess. [21:20:11] what is the current key used for? 
[21:20:30] thcipriani: bd808 can confirm, but I think the ssh keys configured in toolsadmin will apply to all WMCS projects since they're stored in LDAP [21:21:20] logging into jenkins agents (I'm assuming) [21:21:58] and beta cluster for auto deploys and puppet-diffs for PCC [21:22:01] it looks like there are two keys in toolsadmin screen [21:22:04] so they are quite separate tasks [21:23:05] hmm [21:23:09] indeed [21:23:19] the key itself could have a forcecommand to only run deploy [21:23:30] we need a parameter in the middle, though [21:25:07] it would probably be simpler to change become, i think [21:25:24] so, the key could have command prefix [21:25:48] running just a become-deploy.sh script [21:26:00] which itself calls become ./deploy.sh [21:26:14] or ./jenkins-unicorn-magic.sh [21:29:04] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by legoktm on cumin1001.eq... [21:32:18] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (CI & Testing services), 10Release-Engineering-Team-TODO, 10Wiki-Loves-Monuments-Database, 10User-JeanFred: Automate deployment of heritage on Gerrit post-merge - https://phabricator.wikimedia.org/T157893 (10bd808) OK. Let's regroup here... [21:33:28] legoktm: well, I created a new ssh key and added it. It's stored in jenkins creds as well as jenkins-deploy-toolforge -- may not be useful to use a sepearte key, but probably doesn't hurt anything. [21:33:51] bd808: <3, too bad I can't give two tokens on Phab or I would have added antoher [21:35:00] thcipriani: thanks. 
I think you now need to "apply" to be a Toolforge member: https://toolsadmin.wikimedia.org/tools/membership/apply [21:35:09] and then I should see your request popup [21:35:31] done [21:35:57] * thcipriani updates task [21:36:31] * legoktm approves [21:36:53] https://ldap.toolforge.org/user/jenkins-deploy lists tools now [21:36:54] awesome [21:36:58] thcipriani: thank you :)) [21:37:26] sure, seems like it'll be useful. [21:38:12] as long as we do it in a way that's not too much overhead for toolforge folks/folks with integration/config +2 [21:38:26] 10Beta-Cluster-Infrastructure, 10Wikimedia-Logstash, 10User-DannyS712: Logstash beta is not getting any events - https://phabricator.wikimedia.org/T274593 (10DannyS712) [21:38:36] 10Beta-Cluster-Infrastructure, 10Wikimedia-Logstash, 10User-DannyS712: Logstash beta is not getting any events - https://phabricator.wikimedia.org/T274593 (10DannyS712) p:05Triage→03High [21:39:19] 10Continuous-Integration-Config, 10Release-Engineering-Team, 10Patch-For-Review: Add a bypass for the ci-src-setup script - https://phabricator.wikimedia.org/T274347 (10Legoktm) Most new jobs have a different pattern: * Run ci-src-setup-simple docker image to clone repos * Run tox/composer/etc. image to run... [21:39:33] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1359.eqiad.wmnet'] ` an... [21:40:09] that's the goal :) [21:40:52] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1355.eqiad.wmnet'] ` an... 
[21:41:16] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (CI & Testing services), 10Release-Engineering-Team-TODO, 10Wiki-Loves-Monuments-Database, 10User-JeanFred: Automate deployment of heritage on Gerrit post-merge - https://phabricator.wikimedia.org/T157893 (10thcipriani) >>! In T157893#68... [21:42:18] 10Beta-Cluster-Infrastructure, 10Wikimedia-Logstash, 10User-DannyS712: Logstash beta is not getting any events - https://phabricator.wikimedia.org/T274593 (10DannyS712) [21:44:04] 10Beta-Cluster-Infrastructure, 10Wikimedia-Logstash, 10User-DannyS712: Logstash beta is not getting any events - https://phabricator.wikimedia.org/T274593 (10Krinkle) Might be a recurrance of {T233134} [21:44:44] !log Logstash in beta is not receiving any events T274593 [21:44:47] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:44:47] T274593: Logstash beta is not getting any events - https://phabricator.wikimedia.org/T274593 [21:47:03] 10Beta-Cluster-Infrastructure, 10Wikimedia-Logstash, 10observability, 10User-DannyS712: Logstash beta is not getting any events - https://phabricator.wikimedia.org/T274593 (10Krinkle) [21:49:25] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1354.eqiad.wmnet'] ` an... 
[21:49:41] 10Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), 10Release, 10Train Deployments: 1.36.0-wmf.31 deployment blockers - https://phabricator.wikimedia.org/T271345 (10Krinkle) [21:49:43] 10Beta-Cluster-Infrastructure, 10Wikimedia-Logstash, 10observability, 10User-DannyS712: Logstash beta is not getting any events - https://phabricator.wikimedia.org/T274593 (10Krinkle) [22:25:13] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (CI & Testing services), 10Release-Engineering-Team-TODO, 10Wiki-Loves-Monuments-Database, 10User-JeanFred: Automate deployment of heritage on Gerrit post-merge - https://phabricator.wikimedia.org/T157893 (10bd808) >>! In T157893#6824767... [22:56:46] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10SRE, 10serviceops, 10User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1329.eqiad.wmnet', 'mw13... [23:14:09] 10Gerrit: Can't `git pull` mediawiki/core from Gerrit: "fatal: the remote end hung up unexpectedly" - https://phabricator.wikimedia.org/T263293 (10matmarex) Happened again today, trying to pull operations/mediawiki-config. Worked on the fourth try. {P14326}