[03:26:37] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<11.11%)
[03:29:55] Release-Engineering-Team, Language-Team (Language-2019-July-September): Add Santhosh and Petar to wmf-deployment group - https://phabricator.wikimedia.org/T229777 (KartikMistry)
[06:56:36] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[08:07:10] Release-Engineering-Team, Language-Team (Language-2019-July-September): Add Santhosh and Petar to wmf-deployment group - https://phabricator.wikimedia.org/T229777 (KartikMistry) I added Santhosh and Petar to https://gerrit.wikimedia.org/r/#/admin/groups/21,members - so that they have +2. Logging it here.
[08:42:35] (PS5) 20after4: local-charts: CLI for managing minikube, helm, etc [releng/local-charts] - https://gerrit.wikimedia.org/r/525563 (https://phabricator.wikimedia.org/T224939)
[09:29:08] Continuous-Integration-Infrastructure, Jenkins: Include svgmin into the CI processes - https://phabricator.wikimedia.org/T229763 (Reedy)
[10:57:19] Release-Engineering-Team, Gerrit-Privilege-Requests, Language-Team (Language-2019-July-September): Add Santhosh and Petar to wmf-deployment group - https://phabricator.wikimedia.org/T229777 (MarcoAurelio)
[11:12:17] Project-Admins: WikiJournal initial tasks - https://phabricator.wikimedia.org/T229745 (jeblad) Stalled→Invalid I don't have time for this.
[12:07:06] Release-Engineering-Team (Code Health), Release-Engineering-Team-TODO, Code-Stewardship-Reviews: Code Stewardship Review: OAuth extension - https://phabricator.wikimedia.org/T224919 (Tgr) CPT had plans to work on OAuth 2 ({T229500}) so maybe they are interested in taking ownership. OTOH if you look...
[12:53:15] Release-Engineering-Team (Pipeline), Operations, Release Pipeline, serviceops, Goal: Self-service Deployment Pipeline - https://phabricator.wikimedia.org/T228676 (akosiaris)
[13:12:47] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, Parsoid-PHP, Patch-For-Review: Installing composer modules for deployment - https://phabricator.wikimedia.org/T213494 (Tgr) You could symlink Parsoid's `composer.json`, run composer on the top level and make sure the a...
[13:20:52] PROBLEM - Host deployment-mx02 is DOWN: CRITICAL - Host Unreachable (172.16.4.120)
[13:25:54] RECOVERY - Host deployment-mx02 is UP: PING OK - Packet loss = 0%, RTA = 0.54 ms
[13:26:31] (CR) Giuseppe Lavagetto: [C: +1] php7x: restart php-fpm after all sync operations [tools/scap] - https://gerrit.wikimedia.org/r/525119 (https://phabricator.wikimedia.org/T224857) (owner: Thcipriani)
[13:39:47] Project-Admins: WikiJournal initial tasks - https://phabricator.wikimedia.org/T229745 (Aklapper) Then it's unclear to me why this task was filed. For general Phab support question, see https://www.mediawiki.org/wiki/Talk:Phabricator/Help - thanks!
[13:49:47] PROBLEM - Free space - all mounts on deployment-mediawiki-07 is CRITICAL: CRITICAL: deployment-prep.deployment-mediawiki-07.diskspace.root.byte_percentfree (<10.00%)
[13:52:19] Gerrit, Release-Engineering-Team, Operations, Traffic: Rename gerrit-slave to gerrit-replica - https://phabricator.wikimedia.org/T229822 (Paladox)
[13:56:04] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, Parsoid-PHP, Patch-For-Review: Installing composer modules for deployment - https://phabricator.wikimedia.org/T213494 (ssastry) Ok, let us pick a simple solution for now that works for scandium's deployment so we can r...
[13:56:38] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, Parsoid-PHP, Patch-For-Review: Installing composer modules for deployment - https://phabricator.wikimedia.org/T213494 (ssastry)
[14:00:27] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, Parsoid-PHP, Patch-For-Review: Installing composer modules for deployment - https://phabricator.wikimedia.org/T213494 (ssastry)
[14:17:47] greg-g legoktm we can switch gerrit2001 to gerrit-replica now :) (vgutierrez has done the acme stuff & merged my dns change!)
[14:26:27] thcipriani i think we should raise the heap on gerrit2001 (it has 64gb)
[14:26:37] based on this https://gerrit-slave.wikimedia.org/r/monitoring?part=graph&graph=usedMemory
[15:09:15] paladox: I think that the memory graph looks ok (heap might be a bit oversized on this instance, even) GC seems to be reclaiming memory. Plus it's only serving like 10% of the traffic of the main instance, whose heap is the same (and can't be bigger right now, unfortunately :\)
[15:09:42] ok
[15:10:13] I am a little worried about https://phabricator.wikimedia.org/T229756, dunno what could have caused that :\
[15:10:30] Release-Engineering-Team (Code Health), Release-Engineering-Team-TODO: Add Code Stewardship review to Review Queue process - https://phabricator.wikimedia.org/T203698 (Aklapper) I've made some edits over the last days to https://www.mediawiki.org/wiki/Review_queue as I had to realize in https://phabricat...
[15:11:18] I restarted gerrit during that branch cut, but docs seem to suggest replication has some persistence in the data directory.
[15:11:51] thcipriani that's on the slave
[15:12:12] which was fixed
[15:12:21] by him running the replication for that repo
[15:13:15] thcipriani i suggest we do a full replication for all repos (since it's likely the slave fell out of sync)
[15:13:36] but shouldn't the tag being created have triggered a replication event?
[15:14:07] yeah, it seems like a good idea, but why would it have fallen out of sync if we have the replication plugin configured properly?
[15:15:07] good question
[15:15:08] two possibilities: (1) we don't have it configured properly (2) there's a bug with replication (3) there's a log somewhere screaming about it that we should have monitoring for :)
[15:15:18] er, three possibilities I guess :P
[15:16:00] (counting was never my strong suit)
[15:17:12] i think (2) is the realistic option. I've seen fixes for replication
[15:19:20] https://gerrit-review.googlesource.com/c/plugins/replication/+/231210 (not sure if that's the fix)
[15:20:28] thcipriani i think we want to set replicateOnStartup
[15:20:37] so that repos don't become out of sync when we restart
[15:20:45] see https://github.com/GerritCodeReview/plugins_replication/blob/master/src/main/resources/Documentation/config.md#file-replicationconfig
[15:24:33] we could try replicateOnStartup, I wonder what the overhead of that is?
[15:24:42] that patch is interesting
[15:25:00] it'll mean every time you restart gerrit it'll push all remotes
[15:25:46] yep
[15:26:45] I just am curious what that means in terms of disk/file-descriptor usage.
[15:27:05] I guess that's limited by replication threads though
[15:27:36] ah, yeh
[15:29:31] which I guess is 1 thread for each remote
[15:29:34] currently
[15:29:48] I don't think it'll use a lot of disk storage, also yeh i think it'll be limited by how many threads we give replication (which by default is 1)
[15:30:14] so i think it's safe to try this (if it does cause an issue, we can always revert it)
[15:31:59] ok, sounds fine. I'm curious how quickly the queue will clear with 2288 projects.
[15:32:29] Diffusion, Gerrit, Phabricator, Release-Engineering-Team-TODO (201908): Missing wmf/1.34.0-wmf.16 branch on rEOAT - https://phabricator.wikimedia.org/T229756 (thcipriani)
[15:33:45] I guess we should also try to bump up the replication plugin this week to the latest stable-2.15
[15:34:08] yeh
[15:51:30] Beta-Cluster-Infrastructure, Mathoid, Operations, Core Platform Team Legacy (Watching / External), and 2 others: remove mathoid from scb - https://phabricator.wikimedia.org/T200832 (Pchelolo) Is this done and can be resolved? There doesn't seem to be a mathoid installation on scb any longer
[15:55:56] Beta-Cluster-Infrastructure, Mathoid, Operations, Core Platform Team Legacy (Watching / External), and 2 others: remove mathoid from scb - https://phabricator.wikimedia.org/T200832 (akosiaris) Open→Resolved a:akosiaris I see `'mathoid' => 'http://deployment-docker-mathoid01.eqiad.wmfl...
[16:18:39] (PS5) Jforrester: Tarball creation [tools/release] - https://gerrit.wikimedia.org/r/521559 (https://phabricator.wikimedia.org/T217960) (owner: markahershberger)
[16:19:54] (PS1) Jforrester: Run CI for make-tarball-release again, once HHVM is no longer supported [tools/release] - https://gerrit.wikimedia.org/r/528191
[16:23:15] (CR) jerkins-bot: [V: -1] Run CI for make-tarball-release again, once HHVM is no longer supported [tools/release] - https://gerrit.wikimedia.org/r/528191 (owner: Jforrester)
[16:23:34] Continuous-Integration-Infrastructure (Slipway), Release-Engineering-Team-TODO, Release Pipeline, Wikimedia-Portals: Migrate wikimedia-portals-build to Docker container - https://phabricator.wikimedia.org/T213806 (thcipriani) To be clear: the current process should still work until the new proces...
[16:24:44] Diffusion, Gerrit, Phabricator, Release-Engineering-Team-TODO (201908), Patch-For-Review: Missing wmf/1.34.0-wmf.16 branch on rEOAT - https://phabricator.wikimedia.org/T229756 (thcipriani) p:Triage→Normal
[16:47:43] Phabricator (Upstream), Upstream: Phab admins should be able to delete files that belong to other users via the web interface - https://phabricator.wikimedia.org/T168182 (mmodell) @epriestley: I've implemented something similar in [[/source/phab-extensions/browse/wmf%252Fstable/src/workflow/RollbackTrans...
[17:31:51] (CR) Krinkle: "Perhaps for cases like libup and maybe others we can force a 'queue-name' to something that enforces no shared dependencies in the gate, s" [integration/config] - https://gerrit.wikimedia.org/r/526749 (owner: Jforrester)
[17:32:34] (CR) Jforrester: "> Patch Set 1:" [integration/config] - https://gerrit.wikimedia.org/r/526749 (owner: Jforrester)
[19:12:58] (PS1) Jforrester: layout: [mediawiki/services/chromium-render] Enable pipeline testing and publishing [integration/config] - https://gerrit.wikimedia.org/r/528226 (https://phabricator.wikimedia.org/T217114)
[19:13:00] (PS1) Jforrester: layout: [mediawiki/services/chromium-render] Drop node 6 testing [integration/config] - https://gerrit.wikimedia.org/r/528227
[19:13:02] (PS1) Jforrester: jjb: Drop chromium-render-npm-browser-node-6-docker, unused [integration/config] - https://gerrit.wikimedia.org/r/528228
[19:19:42] (CR) Jforrester: [C: +2] layout: [mediawiki/services/chromium-render] Enable pipeline testing and publishing [integration/config] - https://gerrit.wikimedia.org/r/528226 (https://phabricator.wikimedia.org/T217114) (owner: Jforrester)
[19:22:01] (Merged) jenkins-bot: layout: [mediawiki/services/chromium-render] Enable pipeline testing and publishing [integration/config] - https://gerrit.wikimedia.org/r/528226 (https://phabricator.wikimedia.org/T217114) (owner: Jforrester)
[19:22:27] !log Zuul: [mediawiki/services/chromium-render] Enable pipeline testing and publishing T217114
[19:22:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[19:22:30] T217114: Migrate Proton to nodejs 10 - https://phabricator.wikimedia.org/T217114
[19:22:41] o/ hey releng folks. I was working with someone (I think thcipriani) a while back on a work-around for gerrit not mirroring correctly.
[19:22:53] It looks like our workaround failed with this error "remote: error: commit 17fe069: email address noreply@github.com is not registered in your account, and you lack 'forge committer' permission."
[19:23:14] Which is weird because we have always had github merge commits in our history.
[19:23:33] Gerrit replication just changed, that might have broken it.
[19:23:45] Tyler is your best person for that.
[19:23:50] not in this instance, I'd reckon
[19:24:25] halfak: you're trying a manual push to gerrit to update the repo, correct? is this for editquality or a different repo?
[19:24:39] halfak you need Forge Author Identity and Forge Committer Identity *i think*
[19:24:39] articlequality at the moment, but I could try editquality now too.
[19:24:54] well actually
[19:25:00] Forge Committer Identity
[19:25:05] per "and you lack 'forge committer' permission."
[19:25:20] last time, IIRC, I temporarily granted the forge committer permission
[19:25:34] ahh. That'd be why the workaround doesn't work anymore then.
[19:25:42] Temporary ran out.
[19:25:45] :)
[19:26:02] let me add that back in a second, I actually need to restart gerrit to pick up a config change first.
[19:26:15] kk
[19:26:34] I'll need this for editquality, articlequality, draftquality, and drafttopic.
[19:35:24] thcipriani, should I try again?
[19:36:32] halfak: I updated your permissions, finishing up the restart for new gerrit config changes, I'll let you know when it's back
[19:36:42] kk thank you :)
[19:37:14] thcipriani: BTW, does the GitHub sync now pull from the gerrit replica too? Thinking about load…
[19:38:48] halfak: should be back now and you should have the perms you need
[19:39:17] * halfak uploads half the internet
[19:39:34] thcipriani, it worked. Thank you!
[19:39:37] James_F: github is actually a push procedure from the main instance, I don't know if we could offload that, but that's an interesting thought :)
[19:39:40] accraze, will ping when I have them all.
[19:39:53] halfak: glad to hear it. happy to help :)
[19:40:09] thcipriani: Ah, interesting.
[19:44:00] thcipriani, "Upload denied for project 'scoring/ores/draftquality'"
[19:44:12] Looks like I didn't need to push changes to that last time.
[19:44:21] * halfak checks on drafttopic
[19:44:21] * thcipriani checks that repo
[19:45:13] It's working for drafttopic.
[19:45:26] So I think draftquality is the only one that is hung.
[19:45:31] * halfak doublechecks his own config.
[19:45:40] ah, yeah, draftquality doesn't inherit from the "scoring" project -- any reason for it to be different?
[19:46:06] Nope. It should.
[19:46:30] * thcipriani updates
[19:47:38] halfak: draftquality permission inheritance updated -- you should be good.
[19:48:09] Off it goes!
[19:49:50] thcipriani, could you make sure accraze has the rights to do this stuff as well? Or maybe you could point me to the place I could submit a patchset for that config.
[19:51:05] halfak: if you could create a gerrit-permissions tagged task to add that person to the research-ores group in gerrit, that should be a sufficient enough audit trail
[19:51:25] Cool. Will do.
[19:51:28] thanks
[19:52:20] "Gerrit-Privilege-Request"?
[19:54:14] Aha!
I seem to have been able to do it myself from here: https://gerrit.wikimedia.org/r/#/admin/groups/1166,members
[19:54:54] (CR) Jforrester: [C: -1] "Not yet." [integration/config] - https://gerrit.wikimedia.org/r/528227 (owner: Jforrester)
[20:05:44] o/ The "wmfgerrit" bot seems to have closed all of our PRs on github
[20:06:03] is that expected?
[20:08:27] oh, no. cc thcipriani ^
[20:09:05] that is unexpected...
[20:09:07] I am also here to complain about ^
[20:09:30] could we make wmfgerrit not an owner of the organization and instead give it access to only the repos that use it?
[20:10:25] ugh, why would a replication update close pull requests?
[20:10:54] joewalshwmf: or dbrant link to a closed pull request please? I'll see if I can figure out why that would happen...
[20:11:18] https://github.com/wikimedia/wikipedia-ios/pull/3190
[20:11:44] oh
[20:11:55] i bet there's a wikipedia/ios in gerrit
[20:12:15] nope
[20:12:17] i'm wrong
[20:12:36] https://github.com/wikimedia/apps-android-wikipedia/pull/536
[20:12:56] well that's weird.
[20:14:36] ^
[20:15:03] deleting branches for a repo it doesn't know about is definitely surprising behavior...
[20:15:28] it should only do that if it's mirror = true
[20:16:52] github.com/wikimedia/apps-ios-wikipedia is a 301 to that repo
[20:17:13] thcipriani i wonder if https://wikitech.wikimedia.org/wiki/Gerrit#Forcing_Replication_re-runs is related
[20:17:28] thcipriani does the config have .mirror = true?
[20:18:51] * thcipriani looks
[20:20:48] so.. ehm..
when we merged "replicateOnStartup = true" i looked that up in docs and i noticed it said "replicates to all remotes on startup to ensure they are in-sync with this server"
[20:20:58] of course https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/hieradata/role/common/gerrit.yaml#27
[20:21:33] oh
[20:21:38] and then i thought: "_this_ server" but they will not mean to delete stuff that isn't on the replica
[20:22:57] well we want that for servers that are meant to be actual mirrors like gerrit2001, but not for git
[20:22:59] hub
[20:25:19] as we now know, it closes all pull requests since pull-requests are evidently implemented as branches.
[20:32:55] i think a new config should be added to allow you to choose which remotes should replicate on startup
[20:35:51] so what happens if branches are actually deleted and should be deleted
[20:35:57] should this https://gerrit.wikimedia.org/r/#/admin/projects/apps/ios be set to read only?
[20:37:10] mutante it'll remove any branches on github (so if you create a pull, it'll delete refs/pull because that does not exist locally on cobalt for that repo)
[20:38:44] thcipriani we should set replicatePermissions to false on the github mirror
[20:38:47] enabled by default
[20:39:05] that should prevent the mediawiki repo (which is a permission repo) from being replicated
[20:39:40] +1
[20:41:50] thcipriani mutante https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/528262/
[20:43:45] that should prevent the problem joewalshwmf and dbrant are having thcipriani
[20:44:02] paladox: thcipriani: what about project deletions?
[20:44:08] remote.NAME.replicateProjectDeletions : If true, project deletions will also be replicated to the remote site.
[20:44:14] By default, false, do *not* replicate project deletions.
[20:44:55] we have that set for the slave but haven't enabled it for github.
[20:45:09] i don't think that config will work with github?
[20:45:21] also: how does that work with mirror?
[20:46:27] I think it's only made for the slave (but i haven't tested to see if it'll work with github)
[20:48:08] paladox: I'm not sure that that will fix the issue. For instance apps-android-wikipedia is not a permissions-only project. We should have that option set to true, but I don't think it'll stop deleting pull requests on its own.
[20:50:02] https://github.com/GerritCodeReview/plugins_replication/commit/ad923e4e1e4ae3923d15a8b4eee36379423846be is the commit that added project deletion support
[20:50:40] thcipriani it checks the ref
[20:50:41] https://github.com/GerritCodeReview/plugins_replication/blob/fe578665e97e9d7b082bce0b62879b598305729b/src/main/java/com/googlesource/gerrit/plugins/replication/Destination.java#L337
[20:50:59] https://gerrit.wikimedia.org/r/#/admin/projects/apps/ios,branches shows refs/meta/config
[20:53:57] the problem is that pull-requests in github are implemented as branches on the project, and, because gerrit was treating github as a mirror, it deleted all those branches/closed the pull request.
[20:54:24] not pushing config may fix apps-ios, but it wouldn't fix other repos with this problem.
[20:54:29] yeh
[20:55:25] +1 to the change though
[20:56:23] do we want something else than "Not Found" on https://gerrit-slave.wikimedia.org/r/
[20:56:55] deploying paladox' change
[20:56:58] heh, probably
[20:57:05] thanks mutante !
[20:57:15] if you do ?polygerrit=1
[20:57:18] you get errors :P
[20:57:57] ok, so now mirror is off, so we shouldn't be closing pull requests on restart.
[20:57:58] i get a "Server Error: Not found"
[20:58:10] yup
[20:58:33] I would like a config that allows you to specify that replicateOnStartup only runs to gerrit-replica.
[20:58:57] I can't figure out how to see what all pull requests wmfgerrit has closed :\
[20:58:59] thcipriani +1
[20:59:23] the organization audit log evidently doesn't keep track of things like that afaict.
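[editor's note] The mirror behaviour described at 20:53:57 can be pictured with a toy model. This is only a sketch under stated assumptions, not the replication plugin's code: `plan_mirror_push` and the ref names are hypothetical, and the model assumes that a mirror-style push deletes every remote ref that has no local counterpart.

```python
def plan_mirror_push(local_refs, remote_refs, mirror):
    """Toy model of a push plan (hypothetical helper, not a Gerrit API).

    Every local ref is pushed. With mirror=True, remote refs that do not
    exist locally are additionally scheduled for deletion.
    """
    to_update = sorted(local_refs)
    to_delete = sorted(set(remote_refs) - set(local_refs)) if mirror else []
    return to_update, to_delete


# Illustrative ref names: the remote carries a branch that was created
# directly on the remote side and therefore does not exist locally.
local = {"refs/heads/master"}
remote = {"refs/heads/master", "refs/heads/some-pr-branch"}

print(plan_mirror_push(local, remote, mirror=True))
print(plan_mirror_push(local, remote, mirror=False))
```

In this model only the mirror=True plan contains a deletion, which matches the observation in the channel that turning mirror off stops branches (and the pull requests built on them) from being removed on restart.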
[21:00:05] filed https://bugs.chromium.org/p/gerrit/issues/detail?id=11280
[21:01:22] we can do that in puppet code
[21:01:28] inside the .erb itself
[21:01:55] oh.. or not.. i see now
[21:06:14] thcipriani: wmfgerrit also deleted all of our branches, about 100. can we opt out of giving this bot admin access to our repo? as an "owner" of the Wikimedia github organization, it has admin access to the entire organization and all repos
[21:10:26] joewalshwmf: ugh. I'm sorry that happened. The bot currently manages most of the repos of the organization where there aren't any updates on github so it makes sense to have it as an owner. We've turned off mirroring so it should no longer delete branches on github. If there's other configuration that makes sense in this situation I'm open to it, but removing it as owner seems like it might create a
[21:10:27] lot of work to manage all other repos though admittedly I'm not sure how much work that would be.
[21:16:38] * thcipriani emails wikitech
[21:17:28] it looks like it would have to be manually set as an admin on new repos, or be part of a "team" that's added as an admin on all new repos
[21:23:13] is there a way to have that set by default? i.e., this team is an admin for all new repos?
[21:25:56] not sure - looking into it now
[21:26:02] thanks
[21:26:29] And no, owners of GitHub orgs can't be unset from being admins for repos in the org, AFAIAA.
[21:28:44] (CR) Jforrester: [C: +2] layout: [mediawiki/services/chromium-render] Drop node 6 testing [integration/config] - https://gerrit.wikimedia.org/r/528227 (owner: Jforrester)
[21:29:27] (CR) Jforrester: [C: +2] jjb: Drop chromium-render-npm-browser-node-6-docker, unused [integration/config] - https://gerrit.wikimedia.org/r/528228 (owner: Jforrester)
[21:32:27] (Merged) jenkins-bot: layout: [mediawiki/services/chromium-render] Drop node 6 testing [integration/config] - https://gerrit.wikimedia.org/r/528227 (owner: Jforrester)
[21:32:52] it looks like it's not possible to have a team as admin on new repos by default
[21:34:06] (Merged) jenkins-bot: jjb: Drop chromium-render-npm-browser-node-6-docker, unused [integration/config] - https://gerrit.wikimedia.org/r/528228 (owner: Jforrester)
[21:35:09] we could blacklist repos thcipriani
[21:35:11] i think
[21:35:18] based on my reading here https://github.com/GerritCodeReview/plugins_replication/blob/master/src/main/resources/Documentation/config.md
[21:35:19] in replication/config?
[21:35:22] yeh
[21:35:27] "remote.NAME.projects : Specifies which repositories should be replicated to the remote. It can be provided more than once, and supports three formats: regular expressions, wildcard matching, and single project matching. All three formats match case-sensitive.
[21:35:27] "
[21:35:34] you can use regex
[21:36:05] Projects may be excluded from replication by using a regular
[21:36:06] expression with inverse match. `^(?:(?!PATTERN).)*$` will
[21:36:06] exclude any project that matches.
[21:36:47] nice find :)
[21:37:07] :)
[21:43:57] so i think the regex would be projects: '^(?:(?!apps\/ios).)*$'
[21:45:55] yeap, seems right: https://regex101.com/r/ZYFWD3/1
[21:46:15] :)
[21:51:50] email sent to wikitech-l in case we missed other cases of fallout.
[21:54:28] thanks thcipriani!
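[editor's note] The inverse-match pattern proposed at 21:43:57 can be sanity-checked offline. The sketch below uses Python's `re` module as a stand-in for the regex engine Gerrit uses, so it is only an approximation; `is_replicated` and the sample project names are illustrative assumptions, not part of the replication plugin.

```python
import re

# Pattern as quoted in the channel: it matches only names that do not
# contain "apps/ios" anywhere in the string.
EXCLUDE_APPS_IOS = re.compile(r"^(?:(?!apps\/ios).)*$")


def is_replicated(project: str) -> bool:
    """Return True if the project name passes the inverse-match filter."""
    return EXCLUDE_APPS_IOS.match(project) is not None


# Illustrative project names only.
for name in ["apps/ios", "apps/android/wikipedia", "mediawiki/core"]:
    print(name, is_replicated(name))
```

Because the negative lookahead is tested at every position, any project whose name merely contains "apps/ios" (for example a hypothetical "apps/ios-tools") would be excluded as well, which may or may not be the intent.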
[21:54:40] cc legoktm (gerrit-replica is live :))
[22:23:49] (CR) Jeena Huneidi: local-charts: CLI for managing minikube, helm, etc (3 comments) [releng/local-charts] - https://gerrit.wikimedia.org/r/525563 (https://phabricator.wikimedia.org/T224939) (owner: 20after4)
[22:38:01] (PS1) Umherirrender: Create new sniff for doc comments [tools/codesniffer] - https://gerrit.wikimedia.org/r/528280
[23:28:19] (CR) Krinkle: [C: +1] "Make sure that it does not (wrongly) fix this case which I manually fixed - https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/528199/2.." [tools/codesniffer] - https://gerrit.wikimedia.org/r/528280 (owner: Umherirrender)