[00:11:45] (PS4) Jforrester: Zuul: [avro-php] Mark as archived [integration/config] - https://gerrit.wikimedia.org/r/860061 (https://phabricator.wikimedia.org/T278569)
[00:14:07] (CR) Jforrester: [C: +2] Zuul: [avro-php] Mark as archived [integration/config] - https://gerrit.wikimedia.org/r/860061 (https://phabricator.wikimedia.org/T278569) (owner: Jforrester)
[00:16:32] (Merged) jenkins-bot: Zuul: [avro-php] Mark as archived [integration/config] - https://gerrit.wikimedia.org/r/860061 (https://phabricator.wikimedia.org/T278569) (owner: Jforrester)
[03:17:15] Release-Engineering-Team, MediaWiki-extensions-Gadgets, Security-Team, Security: Allow Javascript files from Wikimedia GitLab to be loaded as scripts in Wikimedia wikis - https://phabricator.wikimedia.org/T321458 (Tgr) (Adding #release-engineering-team per @mmartorana's comment.)
[05:49:14] Phabricator, DBA, SRE, decommission-hardware, and 2 others: decommission phab1001.eqiad.wmnet - https://phabricator.wikimedia.org/T323418 (Marostegui) That's ok Daniel, I will take care of it on this task.
[05:49:37] Phabricator, DBA, SRE, decommission-hardware, and 2 others: decommission phab1001.eqiad.wmnet - https://phabricator.wikimedia.org/T323418 (Marostegui) I will merge that change and then proceed and remove grants live
[05:57:28] Phabricator, DBA, SRE, decommission-hardware, and 3 others: decommission phab1001.eqiad.wmnet - https://phabricator.wikimedia.org/T323418 (Marostegui) ` root@db1159.eqiad.wmnet[(none)]> select user,host from mysql.user where host like '10.64.16.8'; +----------------+------------+ | User...
[06:00:22] Phabricator, DBA, SRE, decommission-hardware, and 3 others: decommission phab1001.eqiad.wmnet - https://phabricator.wikimedia.org/T323418 (Marostegui) All done from the DBA side.
[06:23:25] Phabricator, Data-Persistence (work done), SRE, decommission-hardware, and 3 others: decommission phab1001.eqiad.wmnet - https://phabricator.wikimedia.org/T323418 (Marostegui)
[07:03:08] Project-Admins, User-Slst2020: Create Toolhunt project tag for Outreachy internship project - https://phabricator.wikimedia.org/T324317 (Slst2020)
[07:50:23] GitLab (Project Migration), Machine-Learning-Team, ORES: Migrate ORES/Revscoring/etc. repos to Gitlab or Gerrit - https://phabricator.wikimedia.org/T264651 (elukey) Open→Declined Setting this to Declined for the moment, please re-open if needed :)
[08:13:40] Beta-Cluster-Infrastructure, Release-Engineering-Team (Blocking 🧱): Request access to deployment-prep for GMikesell - https://phabricator.wikimedia.org/T320974 (Aklapper)
[08:30:04] Release-Engineering-Team, MediaWiki-extensions-Gadgets, Security-Team, Security: Allow Javascript files from Wikimedia GitLab to be loaded as scripts in Wikimedia wikis - https://phabricator.wikimedia.org/T321458 (Bawolff) . > >> Could we set the raw Javascript resources to have `Access-Control-...
[08:57:38] Krinkle: any reason Fresh support for BROWSERSTACK* and SAUCE* variables only made it to node16? https://gerrit.wikimedia.org/r/c/fresh/+/837230
[08:58:26] I am wondering if all those fresh-node## files could potentially be refactored toward a single file
[09:33:23] Release-Engineering-Team, MediaWiki-extensions-Gadgets, Security-Team, Security: Allow Javascript files from Wikimedia GitLab to be loaded as scripts in Wikimedia wikis - https://phabricator.wikimedia.org/T321458 (Tgr) >>! In T321458#8449810, @Bawolff wrote: > This is possibly bordering into offt...
[09:34:47] PROBLEM - Host contint1001 is DOWN: PING CRITICAL - Packet loss = 100%
[09:53:07] hashar: jnuche: Is that expected?
^
[09:55:42] looks like that executor can't be reached by the controller, I'll defer to hashar on that
[09:56:21] console is hung
[09:57:07] ok to powercycle?
[09:59:31] claime: nothing showing up? :-\
[09:59:42] I guess the underlying hardware is faulty somehow, it is an old machine
[10:00:11] it smells like we will have to replace it
[10:00:44] Yeah, I'm powercycling it rn
[10:00:55] but it seems to have a potentially bad dimm
[10:01:06] dimm == a memory stick?
[10:01:12] yep
[10:01:24] and it's out of warranty
[10:01:31] looks like godog powercycled it on Monday
[10:02:27] even if out of warranty can we possibly swap the faulty dimm with a new one?
[10:02:33] RECOVERY - Host contint1001 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms
[10:02:49] (then given that the machine is old, it might not be worth sending someone on site to do the change)
[10:02:58] I don't know what the policies are on that front
[10:03:06] Me neither
[10:14:32] in a .lfsconfig file I added:
[10:14:33] +[lfs]
[10:14:34] +url=https://gitlab.wikimedia.org/repos/releng/jenkins-deploy.git/info/lfs
[10:23:27] yeah that's right, host keeps crashing :(
[10:29:27] Release-Engineering-Team (Seen), serviceops, serviceops-collab: switch contint prod server back from contint2001 to contint1001 - https://phabricator.wikimedia.org/T256422 (Clement_Goubert) contint1001 crashed again today, bad DIMM, had to powercycle it from iDRAC.
[10:39:10] seems to me like the host is a lemon at this point heh
[10:40:29] it is a bit of a mess
[10:41:00] looks like we had a replacement task for both hosts and already have hardware for them (respectively contint2002.wikimedia.org and contint1002.wikimedia.org)
[10:42:58] contint2002 apparently got created back in May :\ https://phabricator.wikimedia.org/T299575
[10:43:25] *nod* good stuff though hw is already here, way simpler
[10:46:30] Continuous-Integration-Infrastructure, SRE, serviceops-collab: contint2002 service implementation tracking - https://phabricator.wikimedia.org/T324659 (hashar)
[10:47:26] Continuous-Integration-Infrastructure, Release-Engineering-Team (Seen), serviceops-collab: contint hardware refresh - https://phabricator.wikimedia.org/T294276 (hashar)
[10:48:25] Continuous-Integration-Infrastructure, Release-Engineering-Team (Seen), serviceops-collab: contint hardware refresh - https://phabricator.wikimedia.org/T294276 (hashar)
[10:48:29] Continuous-Integration-Infrastructure, Release-Engineering-Team (Radar), SRE, ops-codfw, serviceops-radar: contint2001.mgmt disappeared from Icinga - https://phabricator.wikimedia.org/T298861 (hashar)
[10:49:15] Continuous-Integration-Infrastructure, SRE, serviceops-collab: contint2002 service implementation tracking - https://phabricator.wikimedia.org/T324659 (hashar)
[10:50:59] Continuous-Integration-Infrastructure, SRE, serviceops-collab: contint1002 service implementation tracking - https://phabricator.wikimedia.org/T313832 (hashar)
[10:51:19] Continuous-Integration-Infrastructure, SRE, serviceops-collab: contint1002 service implementation tracking - https://phabricator.wikimedia.org/T313832 (hashar)
[10:51:37] what a mess
[10:53:52] Continuous-Integration-Infrastructure, Release-Engineering-Team (Seen), serviceops-collab: contint hardware refresh - https://phabricator.wikimedia.org/T294276 (hashar)
[10:53:56]
Continuous-Integration-Infrastructure, SRE, serviceops-collab: contint2002 service implementation tracking - https://phabricator.wikimedia.org/T324659 (hashar)
[10:55:39] Continuous-Integration-Infrastructure, Release-Engineering-Team (Seen), serviceops-collab: contint hardware refresh - https://phabricator.wikimedia.org/T294276 (hashar) I have rebalanced the Phabricator task tree so we now have this task at the root, followed by task for implementing the services: *...
[10:58:25] Continuous-Integration-Infrastructure, Release-Engineering-Team, SRE, serviceops-collab: contint1002 service implementation tracking - https://phabricator.wikimedia.org/T313832 (hashar) contint1001 keeps crashing due to a faulty memory stick. It happened on October 31st ( T294276#8357385 ) and ag...
[11:15:54] (PS1) Slyngshede: Spelling, coobooks -> cookbooks [integration/docroot] - https://gerrit.wikimedia.org/r/865598
[11:49:06] (CR) Hashar: [C: +2] "Oups! Thank you for the spellchecking" [integration/docroot] - https://gerrit.wikimedia.org/r/865598 (owner: Slyngshede)
[11:49:42] (Merged) jenkins-bot: Spelling, coobooks -> cookbooks [integration/docroot] - https://gerrit.wikimedia.org/r/865598 (owner: Slyngshede)
[11:51:39] (CR) Hashar: [C: +2] "deployed. The page might not refresh immediately, there is a one hour cache time to live."
[integration/docroot] - https://gerrit.wikimedia.org/r/865598 (owner: Slyngshede)
[11:58:57] (PS1) Reedy: zuul/parameter_functions.py: Make NSFileRepo depend on EnhancedUpload [integration/config] - https://gerrit.wikimedia.org/r/865612
[11:59:19] (CR) Reedy: [C: +2] zuul/parameter_functions.py: Make NSFileRepo depend on EnhancedUpload [integration/config] - https://gerrit.wikimedia.org/r/865612 (owner: Reedy)
[12:01:03] (Merged) jenkins-bot: zuul/parameter_functions.py: Make NSFileRepo depend on EnhancedUpload [integration/config] - https://gerrit.wikimedia.org/r/865612 (owner: Reedy)
[12:01:43] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/865612
[12:01:44] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:49:22] hashar: less testing on my part. They'll be removed soon enough :)
[12:49:31] Keeps the old ones stable I guess
[13:17:49] Release-Engineering-Team, Scap: scap git LFS only works for submodules - https://phabricator.wikimedia.org/T324664 (hashar)
[13:17:55] Release-Engineering-Team, Scap: scap git LFS only works for submodules - https://phabricator.wikimedia.org/T324664 (hashar) For ORES. On the deployment server there is no `.lfsconfig`. Git LFS is 2.7.1 ` $ git -C /srv/deployment/ores/deploy lfs env|grep Endpoint Endpoint=https://gerrit.wikimedia.org/r/...
[13:23:54] Krinkle: good to know.
Last week I wanted to do a /usr/lib/fresh file that would be sourced from the various bin/fresh-node##
[13:24:10] with the few differences being added before the sourcing, such as the container image to use
[13:24:21] I will skip and wait for obsolescence instead :-]
[13:25:25] I might add support for people that have `docker` symlinked to `podman`, so we would no longer have to use `fresh-node -podman`
[13:25:58] (the rough idea is: if `docker version` output contains `Podman`, then enable the `-podman` switch)
[13:28:54] Phabricator: Custom task form for #Wikimedia Enterprise (pre-filled template) - https://phabricator.wikimedia.org/T322167 (Aklapper) Open→Resolved a:Aklapper Great. :) Closing this for the time being, please feel free to reopen/comment if you need any help or changes!
[14:36:03] hashar: afaik people don't need to use -podman, it is supposed to detect podman automatically
[14:36:08] -podman exists as an option to force it
[14:44:39] (Queue (Jenkins jobs + Zuul functions) alert) firing: Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[14:45:40] Beta-Cluster-Infrastructure, Cassandra, Beta-Cluster-reproducible, User-zeljkofilipin: Can not log in, log out, or save edits to the beta cluster (session failures) - https://phabricator.wikimedia.org/T324128 (Eevans) Open→Resolved Looks good; (Re)closing.
[14:53:11] Beta-Cluster-Infrastructure, Release-Engineering-Team (Blocking 🧱): Request access to deployment-prep for GMikesell - https://phabricator.wikimedia.org/T320974 (GMikesell-WMF) @thcipriani my account name for Wikitech is `GeorgeMikesell`. Thanks!
[14:54:57] Is something going on with CI? Zuul has a lot of queued jobs, but nothing's running.
[14:55:39] did it crash again?
[14:56:03] I thought the relevant server was power cycled just earlier today due to the same issue
[14:58:14] on the other hand, one of the ContentTranslation patches is still running
[14:58:24] and also GrowthExperiments in gate-and-submit
[14:59:48] Nope, contint1001 is up
[15:00:15] hashar: Argh, how do I manually trigger gerrit -> GitHub replication? `ssh -p 29418 gerrit.wikimedia.org replication start --wait` from https://wikitech.wikimedia.org/wiki/Gerrit/Administration#Github fails for me with 'startReplication for plugin replication not permitted'
[15:00:18] (and yes, I did power cycle it earlier)
[15:01:17] Continuous-Integration-Infrastructure, Jenkins, Security: Jenkins plugins security advisory - 2022-11-15 - https://phabricator.wikimedia.org/T323054 (hashar) The release Jenkins has the same issue. It reports: | JUnit Plugin 1119.1121.vc43d0fc45561 | Pipeline Utility Steps 2.8.0 | Script Security Pl...
[15:01:25] ok, thanks for checking claime
[15:01:57] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [400.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[15:02:02] I see some more patches running now, so I think it might just be that there are lots of patches in the queue zeljkof
[15:02:06] Project-Admins, Content-Transform-Team: "Parsoid Tracking tasks" in Phabricator vs Content Transform Team - https://phabricator.wikimedia.org/T324679 (Aklapper)
[15:02:08] Ah well
[15:02:12] There's the alert
[15:02:14] James_F: you might require Administrative rights for that
[15:02:18] or some other right
[15:02:25] Ah, fair.
[15:02:46] James_F: usually sending a change/patchset/doing a comment is enough to trigger a replication
[15:02:51] hashar: In that case can you run `ssh -p 29418 gerrit.wikimedia.org replication start avro-php --wait` for me?
I forgot that old releases still use the library and people import via composer.
[15:03:04] hashar: Archived repo that I closed too quickly.
[15:03:20] doing
[15:03:26] Thanks!
[15:03:41] Replicate avro-php ref ..all.. to gerrit2002.wikimedia.org, Succeeded! (OK) (that is serving gerrit-replica.wikimedia.org typically used by packagist)
[15:03:46] Then I need to find a human with packagist access to fix things.
[15:03:48] Replicate avro-php ref ..all.. to github.com, Succeeded! (OK)
[15:03:53] Thanks hashar!
[15:05:29] Gerrit, commit-message-validator, Patch-For-Review: Avoid enforcing arbitrary header order - https://phabricator.wikimedia.org/T324316 (dcaro) Let's see how it goes (might take a while, so I'm already using the modified git locally): https://lore.kernel.org/git/pull.1438.git.1670423522572.gitgitgadge...
[15:05:51] zuul jumped from 4 waiting jobs to 656 in 1 minute
[15:05:59] zeljkof: usually Zuul/CI is overloaded when there are too many changes sent
[15:06:06] It's going down now, so I think it just took a good whack of changes
[15:06:13] and there is a condition which requests way too many merge requests which tend to overload the system
[15:06:16] that is what triggers the alarm
[15:06:37] https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10&from=now-30m&to=now
[15:06:44] the bump in requests is shown at the bottom of https://integration.wikimedia.org/zuul/ in the "Gearman job queue" graph
[15:06:54] and claime's link is the same view ;)
[15:07:32] the reason this time is the chain ending at https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ContentTranslation/+/865664
[15:08:11] PROBLEM - jenkins_service_running on releases1002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args .*/bin/java .*-jar /usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins
[15:09:11] RECOVERY - jenkins_service_running on releases1002 is OK: PROCS OK: 1 process with regex args .*/bin/java .*-jar
/usr/share/java/jenkins.war https://wikitech.wikimedia.org/wiki/Jenkins
[15:09:15] Continuous-Integration-Infrastructure, Jenkins, Security: Jenkins plugins security advisory - 2022-11-15 - https://phabricator.wikimedia.org/T323054 (hashar) I have applied the updates and restarted the Release Jenkins.
[15:09:39] (Queue (Jenkins jobs + Zuul functions) alert) firing: (2) Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[15:29:39] (Queue (Jenkins jobs + Zuul functions) alert) resolved: Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[15:31:31] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [200.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[16:18:31] Beta-Cluster-Infrastructure, Release-Engineering-Team (Blocking 🧱): Request access to deployment-prep for GMikesell - https://phabricator.wikimedia.org/T320974 (thcipriani) Open→Resolved 👋 @GMikesell-WMF you should now be a member of deployment-prep. Let me know if you have any problems or questi...
[16:53:11] GitLab (Integrations), GitLab-Test, Phabricator, Release-Engineering-Team (GitLab III: GitLab in LA 🪃), and 2 others: Experiment with GitLab-Phabricator integration - https://phabricator.wikimedia.org/T265617 (dancy)
[16:53:31] GitLab (Integrations), Phabricator, Release-Engineering-Team (GitLab III: GitLab in LA 🪃), User-brennen: Comment on Phabricator tasks for new, merged, and abandoned changes on GitLab - https://phabricator.wikimedia.org/T324150 (dancy) Open→Resolved a:dancy We have something basic depl...
[16:53:43] GitLab (Integrations), Phabricator, Release-Engineering-Team (GitLab III: GitLab in LA 🪃), User-brennen: Comment on Phabricator tasks for new, merged, and abandoned changes on GitLab - https://phabricator.wikimedia.org/T324150 (dancy)
[16:54:12] GitLab (Integrations), Phabricator, Release-Engineering-Team (GitLab III: GitLab in LA 🪃), User-brennen: Sandbox task for gitlab-phabricator comment integration - https://phabricator.wikimedia.org/T324164 (dancy) Open→Resolved a:dancy Done testing.
[16:55:11] GitLab (Integrations), Phabricator, Release-Engineering-Team (GitLab III: GitLab in LA 🪃), User-brennen: Comment on Phabricator tasks for new, merged, and abandoned changes on GitLab - https://phabricator.wikimedia.org/T324150 (dancy) Resolved→Open Still need to do the Patch-for-review st...
[16:55:19] GitLab (Integrations), GitLab-Test, Phabricator, Release-Engineering-Team (GitLab III: GitLab in LA 🪃), and 2 others: Experiment with GitLab-Phabricator integration - https://phabricator.wikimedia.org/T265617 (dancy)
[16:55:48] GitLab (Integrations), Phabricator, Release-Engineering-Team (GitLab III: GitLab in LA 🪃), User-brennen: Comment on Phabricator tasks for new, merged, and abandoned changes on GitLab - https://phabricator.wikimedia.org/T324150 (dancy)
[16:57:20] GitLab (CI & Job Runners), Release-Engineering-Team (GitLab III: GitLab in LA 🪃): Provision Horizontal Pod Autoscaler (HPA) for GitLab cloud runners - https://phabricator.wikimedia.org/T323164 (dancy) Open→Resolved
[17:39:25] Beta-Cluster-Infrastructure: beta-prometheus.wmflabs.org 502 Bad Gateway - https://phabricator.wikimedia.org/T324695 (TheresNoTime)
[17:40:17] !log Restarted zuul-merger on contint2001.wikimedia.org
[17:40:18] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:45:48] Beta-Cluster-Infrastructure: beta-prometheus.wmflabs.org 502 Bad Gateway -
https://phabricator.wikimedia.org/T324695 (TheresNoTime) a:TheresNoTime `lang=sh [samtar@deployment-deploy03 ~]$ ping deployment-prometheus02.deployment-prep.eqiad1.wikimedia.cloud PING deployment-prometheus02.deployment-prep.eqia...
[17:53:20] Beta-Cluster-Infrastructure: beta-prometheus.wmflabs.org 502 Bad Gateway - https://phabricator.wikimedia.org/T324695 (TheresNoTime) Okay, expected per {T306068}: >>! In T306068#8356941, @Stashbot wrote: > {nav icon=file, name=Mentioned in SAL (#wikimedia-cloud), href=https://sal.toolforge.org/log/q53CLoQB8F...
[18:03:35] (CR) Hashar: Support Needed-By as a backlink to Depends-On (1 comment) [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/862233 (owner: Daniel Kinzler)
[18:26:31] Release-Engineering-Team, MediaWiki-extensions-Gadgets, Security-Team, Security: Allow Javascript files from Wikimedia GitLab to be loaded as scripts in Wikimedia wikis - https://phabricator.wikimedia.org/T321458 (Lectrician1) > Less performant (no ResourceLoader minification or client-side store...
[18:41:40] Phabricator, Data-Persistence (work done), SRE, decommission-hardware, and 3 others: decommission phab1001.eqiad.wmnet - https://phabricator.wikimedia.org/T323418 (Dzahn) Thank you @Marostegui , perfect :)
[18:44:34] Release-Engineering-Team (Seen), serviceops, serviceops-collab: switch contint prod server back from contint2001 to contint1001 - https://phabricator.wikimedia.org/T256422 (Dzahn) thanks @fgiunchedi and @Clement_Goubert . I will follow-up on the hardware issue and with dcops.
[18:49:01] Release-Engineering-Team (Seen), serviceops, serviceops-collab: contint1001 hardware failures - https://phabricator.wikimedia.org/T324698 (Dzahn)
[18:49:54] Release-Engineering-Team (Seen), serviceops, serviceops-collab: switch contint prod server back from contint2001 to contint1001 - https://phabricator.wikimedia.org/T256422 (Dzahn) There is T313832 already which is about setting up the replacement for this, contint1002. Also there is now T324698 to e...
[18:50:57] Release-Engineering-Team (Seen), serviceops, serviceops-collab: contint1001 hardware failures - https://phabricator.wikimedia.org/T324698 (Dzahn)
[18:52:45] Release-Engineering-Team (Seen), serviceops, serviceops-collab: contint1001 hardware failures - https://phabricator.wikimedia.org/T324698 (Dzahn) purchase date: 2016 not under warranty and replacement is already here so there is no point in trying to get the RAM replaced afaict we can turn this...
[19:03:27] Release-Engineering-Team (Seen), serviceops, serviceops-collab, Patch-For-Review: contint1001 hardware failures (remove contint1001 from production) - https://phabricator.wikimedia.org/T324698 (Dzahn)
[19:05:54] Release-Engineering-Team (Seen), serviceops, serviceops-collab: switch contint prod server back from contint2001 to contint1001 - https://phabricator.wikimedia.org/T256422 (Dzahn) T313832 , T324698 and T294276 in general have higher prio now and cover this. This ticket can stay about a possible swi...
[19:06:39] Continuous-Integration-Infrastructure, Release-Engineering-Team (Seen), serviceops-collab: contint hardware refresh - https://phabricator.wikimedia.org/T294276 (Dzahn)
[19:07:28] Continuous-Integration-Infrastructure, Release-Engineering-Team (Seen), serviceops-collab: contint hardware refresh - https://phabricator.wikimedia.org/T294276 (Dzahn) I merged your change https://gerrit.wikimedia.org/r/c/operations/puppet/+/865672/4 so now releng members have shell access on contin...
[19:08:05] Continuous-Integration-Infrastructure, Release-Engineering-Team, SRE, serviceops-collab, Patch-For-Review: contint1002 service implementation tracking - https://phabricator.wikimedia.org/T313832 (Dzahn) I merged your change https://gerrit.wikimedia.org/r/c/operations/puppet/+/865672/4 so now...
[19:26:54] Beta-Cluster-Infrastructure: beta-prometheus.wmflabs.org 502 Bad Gateway - https://phabricator.wikimedia.org/T324695 (Andrew) >>! In T324695#8451639, @TheresNoTime wrote: > Is `deployment-prometheus02` getting replaced? I don't know; most of the other Stretch VMs got rebuilt but there wasn't what I would c...
[20:02:21] Release-Engineering-Team (Seen), serviceops, serviceops-collab: contint1001 hardware failures (remove contint1001 from production) - https://phabricator.wikimedia.org/T324698 (hashar) @Dzahn can we stick to {T313832} for the implementation. I am fine having a task for decommissioning contint1001 but...
[20:09:23] Release-Engineering-Team (Seen), serviceops, serviceops-collab: switch contint prod server back from contint2001 to contint1001 - https://phabricator.wikimedia.org/T256422 (hashar) Open→Declined In 2020 we have switched the service from eqiad to codfw and this task was to switch back to eqiad...
[21:17:05] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.40.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T320518 (TheresNoTime) Hi, around the time this rolled to group 1, {T324711} started - ~4k exceptions an hour
[21:17:43] !log Add contint1002 as an agent to the CI Jenkins, albeit in offline mode cause it is being provisioned | https://integration.wikimedia.org/ci/computer/contint1002/ | T313832
[21:17:46] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:17:46] T313832: contint1002 service implementation tracking - https://phabricator.wikimedia.org/T313832
[21:20:39] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.40.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T320518 (Ladsgroup) I can be one of {T320534} or {T320529} or {T322672} instead of wmf.13
[21:25:45] !log gerrit: on next restart it will be started with Java property `-Dh2.maxCompactTime=15000` and on the next shutdown that would cause it to compact the oversized H2 database files | https://gerrit.wikimedia.org/r/c/operations/puppet/+/865023/ | T323754
[21:25:47] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:25:48] T323754: Investigate Gerrit h2 cache being way too large - https://phabricator.wikimedia.org/T323754
[22:01:02] Anyone around with dev-images merge / publish rights?
https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/23 would let us move most regular devs to PHP 8.1, which will find bugs with PHP 8.1 a lot faster so we can move to prod…
[22:32:49] Phabricator: Request to add MusikAnimal to acl*phabricator - https://phabricator.wikimedia.org/T324716 (MusikAnimal)
[22:34:47] Phabricator: Request to add MusikAnimal to acl*phabricator - https://phabricator.wikimedia.org/T324716 (MusikAnimal)
[22:42:23] James_F: yeah two shakes
[22:42:33] brennen: Awesome.
[22:42:38] sorry i missed that
[22:42:43] No worries.
[22:51:14] With blubber is there a way to copy a file into somewhere that is in $PATH ? E.g. I'm trying to install kubectl, when I use "copies" (https://pastebin.mozilla.org/4yLtYprK) I get the error "destination: path must be relative when "from" is "local": Key: 'Config.variants[deploy].copies[0].destination' Error:Field validation for 'destination' failed on the 'relativelocal' tag"
[22:52:32] i kinda suspect the answer is kubectl should be installed from a package or something? but then is there a package with the right version in the right place: mmmmm
[22:54:02] There is an apt packaged version, https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#install-using-native-package-management, but it requires a public key to be trusted, which I don't see a way to do in the blubber documentation.
[22:59:44] !log Updating development images on contint primary to release php 8.1 images for T319432
[22:59:47] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:59:47] T319432: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432
[23:09:07] dduvall / jeena - any thoughts for kindrobot on this one?
[23:10:18] Beta-Cluster-Infrastructure: File on betacommons shows usage on production and links nonexistent beta project - https://phabricator.wikimedia.org/T301997 (matmarex) Open→Resolved I got access to the beta cluster (also known as https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep) – thank...
[23:15:52] kindrobot: i have a couple of ideas but i am afk at the moment. I’ll respond as soon as I’m back
[23:19:04] The tldr is probably: look at one of our blubber.yaml files to see how we do it https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/blob/29cd8d104098ff4178cca99a2a0d945f9cce1e20/.pipeline/blubber.yaml
[23:20:05] Or how to copy into a PATH dir https://gitlab.wikimedia.org/repos/releng/gitlab-terraform-images/-/blob/wmf/stable/.pipeline/blubber.yaml#L18
[23:46:06] kindrobot: https://github.com/wikimedia/toolhub/blob/main/.pipeline/blubber.yaml#L131-L155 might be doing a similar thing. My destination there is not in a default $PATH, but I think it could target something like /usr/local/bin just as easily as /srv/dockerize.
[23:54:09] Thank you both, I'm done with work for the day, but I'll try those tomorrow and report back. :)
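
Note on the podman detection idea from 13:25:58: the check described there (look for `Podman` in the output of `docker version`, then enable the `-podman` switch) could be sketched roughly as below. This is only an illustration in Python; the function names `looks_like_podman` and `docker_is_podman` are hypothetical, the assumption that the output mentions "Podman" comes from the chat message rather than from any verified output, and Fresh may already do this differently (per the 14:36:03 reply, detection is supposed to be automatic).

```python
import subprocess


def looks_like_podman(version_output: str) -> bool:
    """Return True when the given `docker version` output mentions Podman.

    Assumption taken from the chat: a `docker` command that is really
    podman prints the word "Podman" somewhere in its version output.
    """
    return "podman" in version_output.lower()


def docker_is_podman(docker_cmd: str = "docker") -> bool:
    """Run `<docker_cmd> version` and inspect what it prints.

    Returns False when the command is missing or cannot be executed.
    """
    try:
        result = subprocess.run(
            [docker_cmd, "version"],
            capture_output=True,
            text=True,
            check=False,  # a non-zero exit still yields output to inspect
        )
    except OSError:
        # Command not found or not executable: treat as "not podman".
        return False
    return looks_like_podman(result.stdout + result.stderr)
```

A wrapper script could call `docker_is_podman()` once at startup and add the `-podman` switch only when it returns True, keeping the explicit switch available as a manual override.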