[00:32:40] Continuous-Integration-Config, MediaWiki-extensions-WikibaseClient, MediaWiki-extensions-WikibaseRepository, Wikidata, and 3 others: Re-start Wikibase test coverage reporting - https://phabricator.wikimedia.org/T288396 (tstarling) >>! In T288396#8987843, @hashar wrote: > MediaWiki core has `compo...
[01:41:26] GitLab (Misc), Release-Engineering-Team (Seen), User-aborrero: Gerritlab: Stacked pull requests - https://phabricator.wikimedia.org/T300819 (sbassett) There is such a concept of stacked MRs in Gitlab: https://docs.gitlab.com/ee/user/project/merge_requests/#update-merge-requests-when-target-branch-mer...
[04:15:27] PROBLEM - Check systemd state on doc2002 is CRITICAL: CRITICAL - degraded: The following units failed: rsync-doc-host-data-sync.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:13:05] RECOVERY - Check systemd state on doc2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:58:55] (PS1) DCausse: Add EventBus as a CirrusSearch dependency [integration/config] - https://gerrit.wikimedia.org/r/935670
[08:00:14] (CR) CI reject: [V: -1] Add EventBus as a CirrusSearch dependency [integration/config] - https://gerrit.wikimedia.org/r/935670 (owner: DCausse)
[08:02:00] (PS2) DCausse: Add EventBus as a CirrusSearch dependency [integration/config] - https://gerrit.wikimedia.org/r/935670
[08:08:20] Beta-Cluster-Infrastructure, serviceops, wikidiff2, Better-Diffs-2023, Community-Tech (CommTech-Kanban): Install wikidiff2 1.14.1 deb on deployment-prep & test - https://phabricator.wikimedia.org/T340542 (MoritzMuehlenhoff) Sure, I'll update the package later in the day.
[08:11:04] Beta-Cluster-Infrastructure, serviceops, wikidiff2, Better-Diffs-2023, Community-Tech (CommTech-Kanban): Install wikidiff2 1.14.1 deb on deployment-prep & test - https://phabricator.wikimedia.org/T340542 (Joe) We also need to rebuild the base php-fpm images for mediawiki on k8s
[08:12:32] Continuous-Integration-Infrastructure, Jenkins, serviceops-collab, Patch-For-Review: Do not expose LDAP users on release Jenkins UI - https://phabricator.wikimedia.org/T341074 (jnuche)
[08:14:44] Continuous-Integration-Infrastructure, Jenkins, serviceops-collab, Patch-For-Review: Do not expose LDAP users on release Jenkins UI - https://phabricator.wikimedia.org/T341074 (jnuche) @Peachey88 I've improved the explanation in the description
[08:50:13] Continuous-Integration-Infrastructure, Jenkins, serviceops-collab, Patch-For-Review: Do not expose LDAP users on release Jenkins UI - https://phabricator.wikimedia.org/T341074 (hashar) It is pretty much already public. The configuration as code configuration at https://gitlab.wikimedia.org/repos/...
[08:59:48] Release-Engineering-Team, MW-on-K8s, serviceops: MediaWiki deployment to kubernetes fails on group1 promotion - https://phabricator.wikimedia.org/T341114 (hashar)
[08:59:57] Release-Engineering-Team, MW-on-K8s, serviceops: MediaWiki deployment to kubernetes fails on group1 promotion - https://phabricator.wikimedia.org/T341114 (hashar)
[09:00:01] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.41.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T340244 (hashar)
[09:00:13] Release-Engineering-Team, MW-on-K8s, serviceops: MediaWiki deployment to kubernetes fails on group1 promotion - https://phabricator.wikimedia.org/T341114 (hashar) p:Triage→Unbreak!
[09:06:38] Release-Engineering-Team, MW-on-K8s, serviceops: MediaWiki deployment to kubernetes fails on group1 promotion - https://phabricator.wikimedia.org/T341114 (hashar) The helm output mentions a timeout: ` COMBINED OUTPUT: WARNING: Kubernetes configuration file is group-readable. This is insecure. Locat...
[09:12:55] Beta-Cluster-Infrastructure, serviceops, wikidiff2, Better-Diffs-2023, Community-Tech (CommTech-Kanban): Install wikidiff2 1.14.1 deb on deployment-prep & test - https://phabricator.wikimedia.org/T340542 (MoritzMuehlenhoff) >>! In T340542#8987623, @TheresNoTime wrote: > @MoritzMuehlenhoff (se...
[09:14:43] Release-Engineering-Team, MW-on-K8s, serviceops, Patch-For-Review: MediaWiki deployment to kubernetes fails on group1 promotion - https://phabricator.wikimedia.org/T341114 (hashar) ` lang=irc 09:07:19 looks like the namespace is getting out of quota, which is blocking the updated pods fro...
[09:16:54] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.41.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T340244 (Clement_Goubert)
[09:17:52] Release-Engineering-Team, MW-on-K8s, serviceops, Patch-For-Review: MediaWiki deployment to kubernetes fails on group1 promotion - https://phabricator.wikimedia.org/T341114 (Clement_Goubert) Open→In progress a:Clement_Goubert
[09:27:13] Release-Engineering-Team, MW-on-K8s, serviceops, Patch-For-Review: MediaWiki deployment to kubernetes fails on group1 promotion - https://phabricator.wikimedia.org/T341114 (hashar) In progress→Resolved
[09:27:19] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.41.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T340244 (hashar)
[09:42:48] (CR) Hashar: [C: +2] Add EventBus as a CirrusSearch dependency [integration/config] - https://gerrit.wikimedia.org/r/935670 (owner:
DCausse)
[09:43:10] hashar: thanks! :)
[09:43:17] :-]
[09:43:24] I promise one day I will make that process easier
[09:43:51] (Merged) jenkins-bot: Add EventBus as a CirrusSearch dependency [integration/config] - https://gerrit.wikimedia.org/r/935670 (owner: DCausse)
[09:43:52] tbh it's not that terrible once you know what to change :)
[09:44:15] still, I hate how the dependencies are managed centrally :D
[09:44:34] but short of dynamically maintaining a dependency graph of all extensions, I don't see any other solution
[09:44:45] and I am afraid that would require cloning all repos on the CI builds
[09:45:26] or maybe create a service which receives updates/patches and provides the graph to the CI build
[09:45:31] it is tricky
[09:45:32] yes, if you let extension devs control their CI deps you'd have to keep a close eye on all this...
[09:45:57] there is also the issue that CirrusSearch now depends on EventBus
[09:46:21] but if one sends a change to EventBus which breaks the CirrusSearch code that is not caught
[09:46:37] anyway, I have deployed your change and heading to kid school :]
[09:46:54] thanks!
[10:30:30] hi folks. can anyone with prod cluster access deploy docker images to the wmf registry as outlined here https://gitlab.wikimedia.org/repos/releng/dev-images ?
[10:32:18] I'm struggling to deploy this image https://gitlab.wikimedia.org/repos/releng/dev-images/-/commit/c2ab192b1d1dc05446cc6c2976d6a097966a7bcb
[10:33:09] GitLab (Project Migration), Release-Engineering-Team (Priority Backlog 📥), API Platform, Anti-Harassment, and 18 others: Migrate PipelineLib repos to GitLab - https://phabricator.wikimedia.org/T332953 (kostajh) In Gerrit / PipelineLib workflow, the PipelineBot makes a comment in Gerrit with the n...
[10:34:24] oh I wonder if it's due to me being in the 'analytics_privatedata_users' group
[10:35:07] ah that might be it
[10:35:28] could someone else with access deploy the image to the wmf docker registry when convenient pls?
[10:41:28] Continuous-Integration-Infrastructure, Jenkins, serviceops-collab, Patch-For-Review: Do not expose LDAP users on release Jenkins UI - https://phabricator.wikimedia.org/T341074 (jnuche) https://ldap.toolforge.org does not offer a full listing of LDAP users/groups in its landing page, so it can't b...
[10:44:30] jgleeson: i am pretty sure the deploy script is still broken
[10:44:36] is this commit merged?
[10:44:57] ah yeah
[10:45:22] and the image should have been built immediately after it got merged
[10:47:47] Release-Engineering-Team (Priority Backlog 📥), MediaWiki-Docker, dev-images, docker-pkg, User-brennen: Permissions / ownership interfere with publishing dev-images - https://phabricator.wikimedia.org/T277604 (hashar) Eventually that is addressed by https://gerrit.wikimedia.org/r/c/operations/...
[10:48:06] jgleeson: so that is https://phabricator.wikimedia.org/T277604 which got filed after giving +2 right to FR tech
[10:48:14] and the fix is https://gerrit.wikimedia.org/r/c/operations/puppet/+/927975
[10:54:52] oh rly
[10:54:57] ah thanks hashar
[10:55:17] and once that's merged going forward they will be automatically deployed?
[10:59:25] ah no
[11:00:04] :O
[11:00:17] the patch introduces a unix user `dockerpkg-builder` which clones/owns the repository on the host. The deploy script will then be adjusted to use `sudo -u dockerpkg-builder`
[11:00:28] ah I see
[11:00:41] so from git point of view, it is owned by a single user. That is how we do it for the CI /releng/ Docker images
[11:01:08] that saves a bit of madness cause git does not necessarily play well when the repo is shared between users.
There are edge cases :]
[11:01:37] I am going to ask sre to get that Puppet patch merged, it is overdue and has limited effect so that should be easy
[11:03:29] hashar: I'm wondering how we've had our docker image updates deployed between when that ticket was first created and now
[11:03:32] it looks like an old ticket
[11:04:12] there is https://gerrit.wikimedia.org/r/c/operations/puppet/+/927975 to grant merge access to the repo
[11:04:21] and I guess we will want to check how to give you rights to rebuild the images
[11:04:33] that is done currently on contint servers so one has to be in the contint-admins group
[11:04:50] maybe we should move those docker-pkg images to the deployment host
[11:11:38] Release-Engineering-Team (Seen), MW-on-K8s, SRE, Traffic, and 3 others: Direct 0.5% of all traffic to mw-on-k8s - https://phabricator.wikimedia.org/T341078 (Clement_Goubert) Everything looks good. mw-api-ext: {F37129502} {F37129504} {F37129506} mw-web: {F37129508} {F37129510} {F37129512}
[11:52:37] Release-Engineering-Team (Priority Backlog 📥), MediaWiki-Docker, dev-images, docker-pkg, User-brennen: Permissions / ownership interfere with publishing dev-images - https://phabricator.wikimedia.org/T277604 (hashar) a:brennen→hashar
[11:55:21] jgleeson: that got merged :)
[11:55:42] nice!
[11:55:45] !log Building dev-images
[11:55:46] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:56:14] it is going to take a while since I think there are some other unbuilt images
[11:59:25] !log Successfully published image docker-registry.discovery.wmnet/dev/fundraising-civiproxy-buster-php73-apache2:0.0.1-1-s4
[11:59:26] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:59:28] jgleeson: it worked!
[12:11:16] !log Building dev-images for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/36 | "Install Xdebug from Debian package, update source of php72 and php74" | T338208
[12:11:19] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:11:20] T338208: xdebug causes segmentation fault in PHP when hitting a breakpoint in a phpunit test - https://phabricator.wikimedia.org/T338208
[12:17:50] ty hashar !!!
[13:04:56] GitLab (Auth & Access), Release-Engineering-Team (They Live 🕶️🧟), CAS-SSO, Infrastructure-Foundations, and 4 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (Jelto) GitLab replicas (e.g. https://gitlab-replica.wikimedia.org) use oidc as the default lo...
[13:15:17] anyone around who knows how to do a private config change in beta? (i.e. add a new secret)
[13:15:48] the change was requested in the backport+config window (see #wikimedia-operations); I think I have an idea how to do it, but wouldn’t mind someone who actually knows to review it ;)
[13:21:23] hi folks.
just as an FYI, getting a 404 on https://integration.wikimedia.org/ci/job/debian-glue/2301/console
[13:21:37] and the job is pending for a while
[13:23:41] (job finished)
[13:32:41] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.41.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T340244 (jnuche)
[13:34:36] !log Building dev-images for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/40
[13:34:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:52:40] Release-Engineering-Team (Radar), FR-Docker, Fundraising-Backlog, Gerrit-Privilege-Requests, and 3 others: dev-images +2 rights and Docker registry credentials for FR-Tech - https://phabricator.wikimedia.org/T274303 (hashar)
[13:52:50] Release-Engineering-Team (Priority Backlog 📥), MediaWiki-Docker, dev-images, docker-pkg, User-brennen: Permissions / ownership interfere with publishing dev-images - https://phabricator.wikimedia.org/T277604 (hashar) Open→Resolved The `fab` build script for dev-images is fixed by http...
[13:57:48] I guess I should fix my local docker-pkg setup
[14:00:30] !log added $wgCampaignEventsProgramsAndEventsDashboardAPISecret to beta PrivateSettings (T320258)
[14:00:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:00:33] T320258: Dashboard integration: Configure the P&E Dashboard integration in beta - https://phabricator.wikimedia.org/T320258
[14:20:21] so I have messed up with the dev-images :/
[14:20:48] but at least I fixed docker-pkg locally so I am now building them all locally
[14:21:25] GitLab (Infrastructure), Release-Engineering-Team, serviceops-collab, Patch-For-Review: Upgrade GitLab to major version 16 - https://phabricator.wikimedia.org/T338460 (Jelto)
[14:59:30] Project-Admins: Change puppet-compiler tag to puppet CI - https://phabricator.wikimedia.org/T341136 (joanna_borun)
[15:11:50] GitLab (Auth & Access), Release-Engineering-Team (They Live 🕶️🧟), CAS-SSO, Infrastructure-Foundations, and 4 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (Arnoldokoth) I ran into a 422 error trying to login to gitlab-replica. {F37129701}
[15:30:51] Continuous-Integration-Infrastructure, Jenkins, serviceops-collab, Patch-For-Review: Do not expose LDAP users on release Jenkins UI - https://phabricator.wikimedia.org/T341074 (Dzahn) What's the difference to this though? https://wikitech.wikimedia.org/wiki/Special:ListUsers
[15:33:22] Continuous-Integration-Infrastructure, SRE, serviceops-collab, Patch-For-Review: contint2002 service implementation tracking - https://phabricator.wikimedia.org/T324659 (hashar) At first sorry for the large delay, the last 2/3 weeks have been pretty much jumping from an interrupt to another one....
[15:34:40] Project-Admins: Change puppet-compiler tag to puppet CI - https://phabricator.wikimedia.org/T341136 (hashar)
[15:34:58] phabricator maintenance bot: Make it possible to add reviewers automatically to patches uploaded by Maintenance bot - https://phabricator.wikimedia.org/T340796 (Dzahn) > Could you link to an example please? https://gerrit.wikimedia.org/r/c/operations/dns/+/874845
[15:39:25] hashar: I think some of the newer integration runners are missing labels which means no jobs are running on them
[15:39:25] Gerrit, serviceops-collab: Gerrit LFS objects lack an automatic sync to gerrit replicas - https://phabricator.wikimedia.org/T257741 (Dzahn) Isn't this a conflict with the statement that LFS data is not needed on replica hosts?
[15:40:55] !log Manually added Docker label to integration-agent-docker-1054 to test; instantly got jobs assigned.
[15:40:56] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:41:06] taavi: I'll try to tag them all before my meeting.
[15:45:04] Project-Admins: Change puppet-compiler tag to puppet CI - https://phabricator.wikimedia.org/T341136 (hashar) The #puppet-compiler got created to track just that software which is at https://gerrit.wikimedia.org/r/operations/software/puppet-compiler . With time for sure a lot more got added be it rspec/puppet...
[15:45:10] !log Manually added Docker label to new integration agents
[15:45:13] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:55:40] oops
[15:56:04] taavi: James_F: not sure how I missed that :-\
[15:56:39] they all look fine now thank you!
[15:56:52] hashar: We can probably drop the pipelinelib tags nowadays, I think?
[15:57:10] Everything targets docker now?
[15:57:29] I don't know which label/labels integration/pipelinelib.git is relying on
[15:57:39] * James_F nods.
[15:59:06] looks like `blubber` for build/run/copy.
`dockerPublish` for publishing an image and `chartPromote` for helm
[15:59:31] + src/org/wikimedia/integration/Pipeline.groovy: private static String baseNodeLabel = "pipelinelib"
[15:59:33] so yeah a lot :]
[16:00:01] the `Docker` one we can probably rename it to `Zuul` or just leave it alone :]
[16:07:41] (PS1) Hashar: jjb: keep codehealth builds for 7 days (was 15) [integration/config] - https://gerrit.wikimedia.org/r/935774
[16:14:35] Continuous-Integration-Infrastructure, Jenkins, serviceops-collab, Patch-For-Review: Do not expose LDAP users on release Jenkins UI - https://phabricator.wikimedia.org/T341074 (jnuche) Open→Declined @Dzahn that's a good point, I hadn't realized we were also exposing LDAP usernames through...
[16:29:54] Continuous-Integration-Infrastructure, serviceops-collab: allow mwmaint/cumin hosts to connect to http on contint - https://phabricator.wikimedia.org/T340788 (LSobanski) p:Triage→Medium
[16:37:36] Continuous-Integration-Infrastructure, serviceops-collab: allow mwmaint/cumin hosts to connect to http on contint - https://phabricator.wikimedia.org/T340788 (hashar) I'd prefer to stick to only allowing http traffic from ATS/Varnish caches to avoid unwanted dependencies. Maybe httpbb should pass through...
[16:41:05] Phabricator, serviceops-collab: consider moving aphlict admin port to https / envoy - https://phabricator.wikimedia.org/T340169 (LSobanski)
[16:41:16] Continuous-Integration-Infrastructure, serviceops-collab: allow mwmaint/cumin hosts to connect to http on contint - https://phabricator.wikimedia.org/T340788 (hashar) doc.wikimedia.org allows the deployments hosts: ` ferm::service { 'doc-http': proto => 'tcp', port => '80',...
[16:41:18] Phabricator, serviceops-collab: consider moving aphlict admin port to https / envoy - https://phabricator.wikimedia.org/T340169 (LSobanski) p:Triage→Medium
[17:02:28] Continuous-Integration-Infrastructure, Release-Engineering-Team (Seen), serviceops-collab: contint hardware refresh - https://phabricator.wikimedia.org/T294276 (Dzahn) a:Dzahn→Jelto
[17:14:38] Project-Admins: Please create a project for the Page Exchange extension - https://phabricator.wikimedia.org/T341149 (Yaron_Koren)
[18:06:42] Project-Admins: Change puppet-compiler tag to puppet CI - https://phabricator.wikimedia.org/T341136 (Aklapper) > @Aklapper certainly has some good recommendations :] Keep it as simple and stupid as possible, as long as it doesn't hurt us? :P
[18:16:05] Project-Admins: Change puppet-compiler tag to puppet CI - https://phabricator.wikimedia.org/T341136 (hashar) Renamed https://phabricator.wikimedia.org/project/manage/1807/ (which screws up the task description since it had `#puppet-compiler` which now renders as #puppet_ci. I think you might be able to upda...
[18:16:40] Project-Admins: Change puppet-compiler tag to puppet CI - https://phabricator.wikimedia.org/T341136 (hashar)
[19:54:33] Project-Admins: Please create a project for the Page Exchange extension - https://phabricator.wikimedia.org/T341149 (Urbanecm) Requested public project #mediawiki-extensions-pageexchange has been created: https://phabricator.wikimedia.org/project/view/6640/ (In case you need to edit the project or project w...
[19:54:40] Project-Admins, User-Urbanecm: Please create a project for the Page Exchange extension - https://phabricator.wikimedia.org/T341149 (Urbanecm) Open→Resolved a:Urbanecm
[20:35:47] "This job could not start because it could not retrieve the needed artifacts: kokkuri:setup-variables"
[20:35:55] I copied from an existing thing though
[20:36:22] do I really need both blubber AND kokkuri to work now for the same repo?
[20:36:47] because last thing before this was to fix a syntax issue in the blubber yaml..
[20:45:57] are there docs for kokkuri?
[20:46:26] works: https://gitlab.wikimedia.org/repos/sre/miscweb/annualreport/-/blob/master/.gitlab-ci.yml doesn't work: https://gitlab.wikimedia.org/repos/sre/miscweb/statictendril/-/blob/main/.gitlab-ci.yml
[20:51:47] ^ dduvall
[21:28:16] mutante: do you have a link to the ci pipeline that failed?
[21:29:15] ah. this one maybe https://gitlab.wikimedia.org/repos/sre/miscweb/statictendril/-/pipelines/21772
[21:33:06] dduvall: the weird part is .. meanwhile the error changed
[21:33:19] that one I mentioned above is simply gone.. by retrying
[21:33:33] instead I now have "Invalid syntax line in ./.pipeline/blubber.yaml: build frontend docker-registry.wikimedia.org/repos/releng/blubber:v0.17.0 is not allowed"
[21:33:45] what's interesting about this ...
[21:33:51] it says the job is successful
[21:33:59] despite also saying invalid syntax
[21:34:23] the "could not retrieve the needed artifacts: kokkuri:setup-variables" simply disappeared though
[21:34:54] i've seen that "invalid syntax" error. it's something wrong in kokkuri currently but, right, is not currently failing the job. you can ignore it
[21:35:30] ok, but that doesn't mean I see my image on docker-registry
[21:36:14] dduvall: am I supposed to have blubber.yaml and .gitlab-ci.yml .. both ?
[21:36:22] isn't one replacing the other
[21:36:56] job link: https://gitlab.wikimedia.org/repos/sre/miscweb/statictendril/-/jobs/115766
[21:37:01] the "could not retrieve the needed artifacts" error is strange. i'm not sure what's going on there but it has something to do with the job that kokkuri appends to the `.pre` stage in order to initialize some new variables
[21:37:51] mutante: yes, blubber is necessary. it's the only way we can build images for production
[21:38:49] ok, so what is kokkuri ?:)
[21:39:25] and can you tell from that job link if it tried to publish to registry?
[21:40:12] mutante: in order to publish to docker-registry.wikimedia.org you need to run your job on the trusted runners. your project needs access to the trusted runners via the ACL in https://gitlab.wikimedia.org/repos/releng/gitlab-trusted-runner. here's an example of how blubber publishes its image https://gitlab.wikimedia.org/repos/releng/blubber/-/blob/main/.gitlab-ci.yml#L53
[21:41:09] kokkuri is a utility that wraps `buildctl` (the buildkit client) and a set of GitLab CI includes that help standardize the client side of building images
[21:41:53] dduvall: I thought I added that to gitlab-trusted-runner
[21:42:06] you might have, but you still need a "trusted" tag on the job
[21:42:16] that's how the trusted runners pick up jobs
[21:42:34] ok, well, thank you. I am completely lost though.
[21:42:56] we're working on docs :)
[21:43:17] ack, thanks. that sounds good, I could not find the string
[21:45:02] what is especially confusing to me is that I copy/pasted everything from an existing project that does the same thing.. yet I am missing stuff
[21:45:13] there are a lot of moving parts. for registry publishing, kokkuri depends on a few different environment variables to know where it can push to. on wikimedia.cloud and DO runners, there are local registries used for caching image layers and for testing images. on wmnet (trusted) runners, the environment variables point kokkuri to docker-registry.wikimedia.org
[21:45:22] apparently the other projects don't need these tags
[21:45:27] but also built an image
[21:46:17] you can build and publish an image in any of the environments, but the trusted runners are the only ones that configure kokkuri to push to docker-registry.wikimedia.org
[21:46:48] what other place could you publish to if not our docker-registry?
[21:47:59] dduvall: I do have "tags: trusted" in my .gitlab-ci.yml
[21:48:38] we use the same mechanism to publish stuff to the toolforge registry
[21:48:39] it's just like https://gitlab.wikimedia.org/repos/sre/miscweb/annualreport/-/blob/master/.gitlab-ci.yml which did publish to docker-registry
[21:49:02] in the DO, we have reggie which is a local registry optimized for caching image layers. the DO runners configure kokkuri to push there. wikimedia.cloud runners i believe have their own local registry as well. i'm not certain
[21:50:30] ok, but I am not using toolforge, not using DO and everything I do is copied, including the tags, from an existing thing that publishes to docker-registry.wikimedia.org
[21:50:57] and the job says it was successful
[21:52:31] should I see a line in the job output that tells me it pushed to registry?
[21:53:36] mutante: sorry, i missed that you have the right tags. it looks like your `publish-image` job was not added to the pipeline. that's the problem
[21:54:36] what does "added to the pipeline" mean in this context?
[21:54:54] something that isn't in the yaml files?
[21:55:00] the `rules` section of that job says `- if: $CI_COMMIT_TAG && $CI_COMMIT_REF_PROTECTED` which means the job is only scheduled if the pipeline ran against a git tag that matches the project's protected tag settings
[21:55:32] mutante: https://docs.gitlab.com/ee/ci/jobs/job_control.html
[21:56:38] "checks the pipeline configuration" sounds like it would read my yaml file
[21:56:57] keeps comparing to find any differences in the repos
[21:58:30] mutante: TL;DR push a new tag to your repo and see if `publish-image` runs
[21:59:03] because that's how you have it configured in your `.gitlab-ci.yml`, to only run that job when a new protected tag is pushed
[21:59:11] dduvall: is pipeline configuration here NOT .gitlab-ci.yml ?
[21:59:39] there's nothing wrong with your config :)
[21:59:45] yes, `.gitlab-ci.yml`
[21:59:49] ok, in that case, thanks for trying :)
[22:00:29] what do you want the behavior to be exactly? i'm confused at what you're trying to accomplish
[22:01:19] I need a docker image in the registry
[22:01:31] do you want to build and publish an image every time you merge an MR, or do you want to build and publish an image only when you push a new git tag?
[22:01:37] before this meant: write blubber file, upload, done
[22:01:59] every time
[22:03:10] seems like I have to chat with Jelto then
[22:03:22] because we probably want consistency
[22:03:28] mutante: ok, so now i can help. since you want it to build and publish for every MR merge, change the entry under `rules` of `publish-image` to `if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH`
[22:03:30] between very similar things
[22:04:10] ok, trying that, thanks
[22:04:13] np
[22:08:13] re: "before this meant: write blubber file, upload, done" and write a `.pipeline/config.yaml` and get someone to help you with the integration/config changes. now folks have control over their own CI config, but yeah, that control means learning a whole new set of CI syntax and behaviors
[22:08:47] GitLab CI syntax/behaviors that is
[22:10:02] the "build-image" job is shown as successful. But I don't see it on the registry.
[22:11:41] not sure if I should see it attempting to push in the logs
[22:12:08] the new thing: I now see 3 stages, not just 2
[22:12:19] the last stage is publish and is passed
[22:15:21] "pushing manifest for docker-registry.discovery.wmnet/repos/sre/miscweb/statictendril:main"
[22:15:26] guess at this point it must be caching
[22:15:53] mutante: the `build-image` job doesn't push. the `publish-image` job pushed the image and it looks like it was successful!
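A minimal Python sketch of the two `rules` conditions discussed above, to make the scheduling difference concrete. It is an illustration only and not GitLab's own rule evaluator: the `Pipeline` class and the two helper functions are hypothetical names, the fields merely mirror the CI variables quoted in the chat, and `CI_COMMIT_REF_PROTECTED` is modelled as a plain boolean, following the description given in the chat.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pipeline:
    """Hypothetical stand-in for the CI variables named in the chat."""
    commit_tag: Optional[str]       # CI_COMMIT_TAG (unset on branch pipelines)
    commit_branch: Optional[str]    # CI_COMMIT_BRANCH (unset on tag pipelines)
    ref_protected: bool             # CI_COMMIT_REF_PROTECTED (assumed boolean)
    default_branch: str = "main"    # CI_DEFAULT_BRANCH


def runs_on_protected_tag(p: Pipeline) -> bool:
    # Models `if: $CI_COMMIT_TAG && $CI_COMMIT_REF_PROTECTED`
    return bool(p.commit_tag) and p.ref_protected


def runs_on_default_branch(p: Pipeline) -> bool:
    # Models `if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH`
    return p.commit_branch == p.default_branch


# A merge to the default branch versus a push of a protected tag.
merge_to_main = Pipeline(commit_tag=None, commit_branch="main", ref_protected=True)
protected_tag = Pipeline(commit_tag="v1.0.0", commit_branch=None, ref_protected=True)
```

Under these assumptions, `merge_to_main` satisfies only the default-branch rule and `protected_tag` satisfies only the tag rule, which matches the behaviour described: with the original rule the `publish-image` job was left out of pipelines triggered by an MR merge.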
[22:16:24] yes, I see 3 stages, build and publish are shown as successful
[22:16:33] image does not appear on the registry site
[22:16:40] `docker manifest inspect docker-registry.wikimedia.org/repos/sre/miscweb/statictendril:main` works for me
[22:16:57] the registry site doesn't update right away
[22:17:40] ACK, it's 404 https://docker-registry.wikimedia.org/repos/sre/miscweb/statictendril/tags
[22:17:54] yep, but your image is in the registry
[22:18:24] ok, thank you. I will wait for tomorrow
[22:18:42] the website is populated by https://gerrit.wikimedia.org/g/operations/puppet/+/refs/heads/production/modules/docker_registry_ha/files/registry-homepage-builder.py btw
[22:19:10] not sure the frequency but i'm guessing there's a timer in that same puppet module somewhere
[22:20:27] ACK! i see it, looks like hourly
[22:23:23] starting the timer myself
[22:25:29] thanks dduvall I should build a better mental model of all this stuff :)
[22:29:55] well, I ran the build-homepage timer and it took a while, then finished.. but I still don't see it. but ok, let me just check it again later
[22:31:08] maybe I need to do the same on all 4 hosts
[22:33:47] I wonder if there are better dynamic registry homepage things that people like?
[22:34:19] way back when it was running hourly on my laptop and pushing to my people page :D
[22:35:37] then lego.ktm came along and made it Real™
[22:50:07] “Most used docker images this week”. “Newest images”. “Ten python images. You won’t believe what landed in number seven!”?
[23:16:52] Project-Admins, User-Urbanecm: Please create a project for the Page Exchange extension - https://phabricator.wikimedia.org/T341149 (Yaron_Koren) @Urbanecm - thank you!
[23:36:15] thcipriani: :) at the time when I worked on that, the main issue was that the registry tag API was super slow so anything dynamic was doomed
[23:36:19] dunno if that's been fixed since or not
[23:37:51] oh that's right, I'd forgotten the fetching tags was super slow
[23:39:46] haven't actually had to use that curl call since that project existed, so no idea if that ever got fixed :)
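For reference, a hedged Python sketch of the kind of tag lookup mentioned above. It assumes the registry exposes the standard Registry HTTP API v2 `tags/list` endpoint to the caller without authentication, which the log does not confirm; `tags_list_url` and `parse_tags` are hypothetical helper names, and the sample response body is written by hand rather than captured from the registry. No network request is made.

```python
import json
from typing import List


def tags_list_url(registry: str, repository: str) -> str:
    """Build the Registry HTTP API v2 tag-listing URL for a repository."""
    return f"https://{registry}/v2/{repository.strip('/')}/tags/list"


def parse_tags(body: str) -> List[str]:
    """Extract tag names from a tags/list JSON response body."""
    payload = json.loads(body)
    # Guard against a missing or null "tags" field.
    return sorted(payload.get("tags") or [])


# Illustrative response body, written by hand (not real registry output).
sample = '{"name": "repos/sre/miscweb/statictendril", "tags": ["main"]}'

url = tags_list_url("docker-registry.wikimedia.org",
                    "repos/sre/miscweb/statictendril")
tags = parse_tags(sample)
```

Fetching `url` with any HTTP client and passing the body to `parse_tags` would be the dynamic equivalent of the static homepage builder; whether that call is fast enough today is exactly the open question raised in the chat.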