[00:08:45] Release-Engineering-Team (Doing), Security-Team, GitLab (CI & Job Runners), User-brennen: Limit GitLab shared runners to trusted contributors - https://phabricator.wikimedia.org/T292094 (Tgr) `mediawiki/extensions/*` and `mediawiki/skins/*` have plenty of projects which have nothing to do with Wik...
[00:17:49] Release-Engineering-Team (Doing), Security-Team, GitLab (CI & Job Runners), User-brennen: Limit GitLab shared runners to trusted contributors - https://phabricator.wikimedia.org/T292094 (brennen) Some probably incomplete responses to T292094#7442614 below - I'm going to be AFK for a couple of day...
[01:50:18] legoktm: Should we/I switch the CI images using PHP 7.4 over to use Wikimedia's package instead of sury's
[02:24:25] James_F: once the packages exist, yes :) but I didn't get that far yet
[02:28:20] MediaWiki-Releasing, MW-1.37-notes, MW-1.37-release: Release 1.37.0-rc.0 - https://phabricator.wikimedia.org/T289591 (Jdforrester-WMF)
[02:31:01] legoktm: Ack. Should I file a task or should we just do it as part of the main task?
[02:37:54] I think a dedicated task would be good
[02:38:47] Continuous-Integration-Infrastructure, Release-Engineering-Team: Re-build all CI images of PHP 7.4 from sury's package to Wikimedia's one, once it exists - https://phabricator.wikimedia.org/T293851 (Jdforrester-WMF)
[02:39:00] Continuous-Integration-Infrastructure, Release-Engineering-Team: Re-build all CI images of PHP 7.4 from sury's package to Wikimedia's one, once it exists - https://phabricator.wikimedia.org/T293851 (Jdforrester-WMF)
[02:39:01] {{done}}
[02:39:22] Continuous-Integration-Infrastructure, Release-Engineering-Team: Re-build all CI images of PHP 7.4 from sury's package to Wikimedia's one, to assure us that it will work - https://phabricator.wikimedia.org/T293851 (Jdforrester-WMF)
[02:45:01] (PS1) Jforrester: [DNM] Docker: [php74] Switch PHP 7.4 from Sury to Wikimedia package [integration/config] - https://gerrit.wikimedia.org/r/732112 (https://phabricator.wikimedia.org/T293851)
[03:01:43] Project mwcore-phpunit-coverage-master build #1714: STILL FAILING in 1 min 42 sec: https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/1714/
[03:07:48] Continuous-Integration-Infrastructure, DC-Ops, netops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul)
[03:34:38] Continuous-Integration-Infrastructure, DC-Ops, netops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul)
[04:42:53] Release-Engineering-Team (Doing), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T281169 (Seddon) {F34699940} >>! In T281169#7437764, @Jdlrobson wrote: > @thcipriani I've missed a few train log triages, but we did say that newl...
[04:56:59] Release-Engineering-Team (Doing), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T281169 (Seddon) {F34699940} >>! In T281169#7437764, @Jdlrobson wrote: > @thcipriani I've missed a few train log triages, but we did say that newl...
[06:57:22] Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing), ci-test-error (WMF-deployed Build Failure): TAR_ENTRY_ERROR ENOSPC: no space left on device - https://phabricator.wikimedia.org/T292729 (hashar) In progress→Resolved The timers are up and running: ` name=systemctl list-...
[07:26:18] Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing), ci-test-error (WMF-deployed Build Failure): TAR_ENTRY_ERROR ENOSPC: no space left on device - https://phabricator.wikimedia.org/T292729 (hashar) Resolved→Open >>! In T292729#7425651, @thcipriani wrote: >>>! In T292729#74...
[08:50:41] Release-Engineering-Team (Doing), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T281169 (hashar) Thank you @Seddon for the verification. I have triaged logs this morning and it looks really quiet. I will push to group1 slightly...
[10:51:16] Release-Engineering-Team (Radar), MW-on-K8s, SRE, serviceops, and 2 others: The restricted/mediawiki-webserver image should include skins and resources - https://phabricator.wikimedia.org/T285232 (Joe) Open→Resolved
[11:29:03] Release-Engineering-Team (Radar), MW-on-K8s, SRE, serviceops, and 2 others: The restricted/mediawiki-webserver image should include skins and resources - https://phabricator.wikimedia.org/T285232 (Joe) Resolved→Open Sadly I found a problem with our current approach: any file under static/...
[12:28:42] hrmmm, it seems like https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php72-docker/ is pretty unhappy...? builds seem to fail a lot with something along the lines of "14:03:02 mw-error.log:2021-10-20 11:59:23 e0f413250f75 wikidb: [f0869ff019aeabef7b9b6e01] /index.php/Special:UserLogin PHP Notice: Undefined index: rc_logid" at the beginning of the faillog...
[12:44:16] ihurbain: Looks like it's on various special pages
[12:44:28] >13:19:08 mw-error.log:2021-10-20 12:16:37 6286a5ec50e3 wikidb: [554b008f1810c0fa8441a7ac] /index.php/Selenium_talk_test PHP Notice: Undefined index: rc_logid
[12:44:30] And non special pages
[12:46:27] As of the last few hours it seems
[12:46:42] @Reedy should I file something?
[12:48:05] Might as well...
[12:48:13] I can't see any obvious patches to MW core that would explain it
[12:49:27] @Reedy it's the same job as https://phabricator.wikimedia.org/T292729 which was apparently a thing this morning, might be related?
[12:50:05] Hmm
[12:50:15] It would seem odd an out of disk event resulting in an undefined index
[12:50:15] (well, "a thing". touched, at least, it seems)
[12:50:18] heh
[12:50:20] yeah....
[12:50:35] anyway. filing independently, we'll see what happens.
[12:57:07] Continuous-Integration-Infrastructure, Jenkins, ci-test-error: wmf-quibble-selenium-php72-docker jobs failing repeatedly - https://phabricator.wikimedia.org/T293885 (ihurbain)
[12:57:18] boom.
[12:57:33] Continuous-Integration-Infrastructure, Jenkins, Browser-Tests, ci-test-error (WMF-deployed Build Failure): wmf-quibble-selenium-php72-docker jobs failing repeatedly - https://phabricator.wikimedia.org/T293885 (Reedy) p:Triage→High
[12:57:46] I think it might almost be UBN worthy
[12:57:58] Question is whether it fails every time on the same patch(es)
[12:58:34] i must say i haven't tried to restart mine - i looked at the logs, saw it was happening on other builds around it, went "hrm, this is probably not on me" :P
[12:59:07] (and probably not that sporadic either, although some do seem to pass)
[12:59:18] Yeah, it's definitely not every job failing
[13:00:07] (thanks for retagging) (love the phab avatar :P )
[13:12:53] Release-Engineering-Team (Done by Thu 04 Nov), GitLab (CI & Job Runners), User-brennen: runner-1002 is out of space - https://phabricator.wikimedia.org/T291221 (thcipriani)
[13:13:28] Release-Engineering-Team (Next), GitLab (Administration, Settings & Policy), User-brennen: GitLab project templates include issues - figure out if these can be customized or removed - https://phabricator.wikimedia.org/T290612 (thcipriani) a:jeena→None
[13:13:53] Release-Engineering-Team (Next), GitLab (Project Migration), User-brennen: Migrate mediawiki/tools/release/ to GitLab - https://phabricator.wikimedia.org/T290260 (thcipriani) a:dancy→None
[13:14:19] Release-Engineering-Team (Next), Code-Health, Developer Productivity, GitLab (Integrations), User-brennen: Investigate whether we can/should integrate Git/Reviewers with GitLab - https://phabricator.wikimedia.org/T289712 (thcipriani) a:dduvall→None
[13:19:52] Continuous-Integration-Infrastructure, Jenkins, Browser-Tests, ci-test-error (WMF-deployed Build Failure): wmf-quibble-selenium-php72-docker jobs failing repeatedly - https://phabricator.wikimedia.org/T293885 (zeljkofilipin)
[13:29:33] Continuous-Integration-Infrastructure, Jenkins, Browser-Tests, ci-test-error (WMF-deployed Build Failure): wmf-quibble-selenium-php72-docker jobs failing repeatedly - https://phabricator.wikimedia.org/T293885 (matmarex) p:High→Unbreak! This seems to be happening every time, and it's preve...
[13:31:44] Continuous-Integration-Infrastructure, Jenkins, MediaWiki-extensions-WikibaseClient, Wikidata, and 3 others: wmf-quibble-selenium-php72-docker jobs failing repeatedly - https://phabricator.wikimedia.org/T293885 (matmarex)
[13:32:05] Continuous-Integration-Infrastructure, Jenkins, MediaWiki-extensions-WikibaseClient, Wikidata, and 3 others: wmf-quibble-selenium-php72-docker jobs failing repeatedly - https://phabricator.wikimedia.org/T293885 (matmarex) Maybe caused by https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikib...
[13:39:53] Continuous-Integration-Infrastructure, Jenkins, MediaWiki-extensions-WikibaseClient, Wikidata, and 3 others: wmf-quibble-selenium-php72-docker jobs failing repeatedly - https://phabricator.wikimedia.org/T293885 (matmarex) a:matmarex Error message looks very similar to T182938. Maybe it also n...
[13:43:52] Lucas_WMDE: hi about Wikibase using a now deprecated ParserOutput::setProperty ( T293860 ), I haven't marked it as a blocker of this week's train. I just filtered it out given it is probably super trivial to adjust the usage and it is just a deprecation ;)
[13:43:53] T293860: PHP Deprecated: Use of ParserOutput::setProperty was deprecated in MediaWiki 1.38. [Called from Wikibase\Client\Hooks\ShortDescHandler::doHandle] - https://phabricator.wikimedia.org/T293860
[13:43:55] so no rush!
[13:45:30] it's logspam though
[13:45:45] In numerous different extensions
[13:46:06] Release-Engineering-Team (Doing), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T281169 (hashar) `ParserOutput::getProperty` and `ParserOutput::setProperty` have been marked deprecated and a few code paths still hit them. I...
[13:49:57] Continuous-Integration-Infrastructure, Jenkins, MediaWiki-extensions-WikibaseClient, Wikidata, and 4 others: wmf-quibble-selenium-php72-docker jobs failing repeatedly - https://phabricator.wikimedia.org/T293885 (matmarex) (Currently waiting for Jenkins's response on https://gerrit.wikimedia.org/r...
[14:06:19] hashar: ok
[14:06:32] it should be fine, since the methods just call the non-deprecated ones afaict
[14:06:41] but maybe I’ll backport a fix later today
[14:06:43] Lucas_WMDE: yes
[14:07:10] I have poked cscott from the Parser team who has done the deprecation. I am pretty sure he will come up with a follow-up patch to fix them all
[14:07:25] feel free to add him as a reviewer to your Wikibase patch and backport ;)
[14:09:54] it turned out in one place we were mocking Parser::getOutput() to return an OutputPage instead of a ParserOutput
[14:10:03] so if the rename was intended to disambiguate those two confusingly similar classes
[14:10:11] then I agree it’s a good thing in general :D
[14:19:30] Continuous-Integration-Infrastructure, Jenkins, MediaWiki-extensions-WikibaseClient, Wikidata, and 4 others: wmf-quibble-selenium-php72-docker jobs failing repeatedly - https://phabricator.wikimedia.org/T293885 (matmarex) Seems to fix the issue, at least for https://gerrit.wikimedia.org/r/c/media...
[14:40:05] Continuous-Integration-Infrastructure, Jenkins, MediaWiki-extensions-WikibaseClient, Wikidata, and 4 others: wmf-quibble-selenium-php72-docker jobs failing repeatedly: Undefined index: rc_logid - https://phabricator.wikimedia.org/T293885 (Nikerabbit)
[14:54:03] (PS1) Zfilipin: Add Outreachy round 23 applicant to trusted users [integration/config] - https://gerrit.wikimedia.org/r/732353 (https://phabricator.wikimedia.org/T256626)
[14:56:40] (Queue (Jenkins jobs + Zuul functions) alert) firing: Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org
[15:01:40] (Queue (Jenkins jobs + Zuul functions) alert) firing: (2) Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org
[15:02:00] Project mwcore-phpunit-coverage-master build #1715: STILL FAILING in 1 min 59 sec: https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/1715/
[15:03:27] (CR) Hashar: [C: +2] Add Outreachy round 23 applicant to trusted users [integration/config] - https://gerrit.wikimedia.org/r/732353 (https://phabricator.wikimedia.org/T256626) (owner: Zfilipin)
[15:06:11] (Merged) jenkins-bot: Add Outreachy round 23 applicant to trusted users [integration/config] - https://gerrit.wikimedia.org/r/732353 (https://phabricator.wikimedia.org/T256626) (owner: Zfilipin)
[15:06:16] Release-Engineering-Team (Doing), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T281169 (hashar) @Seddon and @Jdlrobson I tried to load the mw-client-NEW-errors dashboard at https://logstash.wikimedia.org/app/dashboards#/view/AX...
[15:10:47] ouch, lots of changes in gate-and-submit at the moment :/
[15:11:23] indeed
[15:21:40] (Queue (Jenkins jobs + Zuul functions) alert) resolved: Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org
[15:29:52] Release-Engineering-Team (Seen), MobileFrontend: Add tests to verify initial collapsing of sections - https://phabricator.wikimedia.org/T263843 (Jdlrobson)
[15:32:08] Apologies if this has been discussed, but is there a workaround for this issue with Gitlab CE? https://gitlab.com/gitlab-org/gitlab-foss/-/issues/32943
[15:33:24] Continuous-Integration-Infrastructure, Jenkins, MediaWiki-extensions-WikibaseClient, Wikidata, and 4 others: wmf-quibble-selenium-php72-docker jobs failing repeatedly: Undefined index: rc_logid - https://phabricator.wikimedia.org/T293885 (hashar) Out of curiosity, how has it been possible to get...
[15:34:07] Release-Engineering-Team (Done by Thu 04 Nov), MW-on-K8s, Release Pipeline, User-brennen: Scap backport change_url command - https://phabricator.wikimedia.org/T287042 (jeena)
[15:37:16] Continuous-Integration-Infrastructure, Jenkins, MediaWiki-extensions-WikibaseClient, Wikidata, and 4 others: wmf-quibble-selenium-php72-docker jobs failing repeatedly: Undefined index: rc_logid - https://phabricator.wikimedia.org/T293885 (Lucas_Werkmeister_WMDE) @Michael and I speculated that thi...
[15:38:09] Release-Engineering-Team (Doing), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T281169 (Jdlrobson) >>! In T281169#7444583, @hashar wrote: > @Seddon and @Jdlrobson I tried to load the mw-client-NEW-errors dashboard at https://lo...
[15:41:40] Release-Engineering-Team (Radar), Cloud-VPS (Quota-requests), GitLab (CI & Job Runners): Request increased quota for gitlab-runners Cloud VPS project - https://phabricator.wikimedia.org/T293832 (Andrew) a:aborrero
[15:42:02] Release-Engineering-Team (Radar), Cloud-VPS (Quota-requests), GitLab (CI & Job Runners): Request increased quota for gitlab-runners Cloud VPS project - https://phabricator.wikimedia.org/T293832 (Andrew) +1 approved
[15:42:27] Release-Engineering-Team (Radar), Cloud-VPS (Quota-requests), GitLab (CI & Job Runners): Request increased quota for gitlab-runners Cloud VPS project - https://phabricator.wikimedia.org/T293832 (nskaggs) +1
[15:55:44] (PS3) Pwirth: parameter_functions: fix dependencies for BlueSpiceReadConfirmation [integration/config] - https://gerrit.wikimedia.org/r/731902
[16:03:35] Release-Engineering-Team (Yak Shaving 🐃🪒), Observability-Logging, User-brennen: Experiment with automating error log triage - https://phabricator.wikimedia.org/T290267 (colewhite)
[16:03:37] Release-Engineering-Team, SRE Observability: Alert RelEng when mw-client-error editing dashboard shows errors at a rate of over 1000 errors in a 12 hr period - https://phabricator.wikimedia.org/T293694 (colewhite)
[16:09:41] Release-Engineering-Team, SRE Observability: Alert RelEng when mw-client-error editing dashboard shows errors at a rate of over 1000 errors in a 12 hr period - https://phabricator.wikimedia.org/T293694 (colewhite)
[16:30:34] Continuous-Integration-Infrastructure, DC-Ops, netops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul)
[16:40:20] Release-Engineering-Team (Doing), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T281169 (hashar) @Jdlrobson thank you for The ParserOutput:getProperty deprecations ( T293860 T293895 T293894 ) all had patches, they are in the...
[16:42:09] (PS1) Ahmon Dancy: Add stuff to build the webserver image [tools/release] - https://gerrit.wikimedia.org/r/732379
[16:43:03] (CR) Ahmon Dancy: [C: +2] Add stuff to build the webserver image [tools/release] - https://gerrit.wikimedia.org/r/732379 (owner: Ahmon Dancy)
[16:44:33] (Merged) jenkins-bot: Add stuff to build the webserver image [tools/release] - https://gerrit.wikimedia.org/r/732379 (owner: Ahmon Dancy)
[16:49:27] Release-Engineering-Team (Radar), Cloud-VPS (Quota-requests), GitLab (CI & Job Runners): Request increased quota for gitlab-runners Cloud VPS project - https://phabricator.wikimedia.org/T293832 (dduvall) I just realized I undercounted the memory requirements for each executor. It should be 6G each. C...
[16:50:02] Release-Engineering-Team (Radar), Cloud-VPS (Quota-requests), GitLab (CI & Job Runners): Request increased quota for gitlab-runners Cloud VPS project - https://phabricator.wikimedia.org/T293832 (dduvall)
[16:51:55] Release-Engineering-Team (Radar), Cloud-VPS (Quota-requests), GitLab (CI & Job Runners): Request increased quota for gitlab-runners Cloud VPS project - https://phabricator.wikimedia.org/T293832 (dduvall) >>! In T293832#7445174, @dduvall wrote: > I just realized I undercounted the memory requirements...
[17:00:17] Continuous-Integration-Infrastructure, DC-Ops, netops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul)
[17:02:00] Continuous-Integration-Infrastructure, Jenkins, MediaWiki-extensions-WikibaseClient, Wikidata, and 5 others: wmf-quibble-selenium-php72-docker jobs failing repeatedly: Undefined index: rc_logid - https://phabricator.wikimedia.org/T293885 (matmarex) Open→Resolved Changes seem to be merging...
[17:15:36] Release-Engineering-Team, SRE Observability: Alert RelEng when mw-client-error editing dashboard shows errors at a rate of over 1000 errors in a 12 hr period - https://phabricator.wikimedia.org/T293694 (colewhite) >>! In T293694#7438791, @Jdlrobson wrote: > Perhaps https://grafana.wikimedia.org/d/0000005...
[17:18:48] Release-Engineering-Team, SRE Observability: Alert RelEng when mw-client-error editing dashboard shows errors at a rate of over 1000 errors in a 12 hr period - https://phabricator.wikimedia.org/T293694 (colewhite) p:Triage→Medium
[17:23:47] Continuous-Integration-Infrastructure, Release-Engineering-Team, Patch-For-Review: Re-build all CI images of PHP 7.4 from sury's package to Wikimedia's one, to assure us that it will work - https://phabricator.wikimedia.org/T293851 (Legoktm) This would also be a good time to enable PHP 7.4 jobs on wm...
[17:25:10] Continuous-Integration-Infrastructure, Release-Engineering-Team (Seen), Patch-For-Review: Re-build all CI images of PHP 7.4 from sury's package to Wikimedia's one, to assure us that it will work - https://phabricator.wikimedia.org/T293851 (thcipriani) Let #together know when that patch is no longer D...
[17:26:58] Release-Engineering-Team (Radar), SRE Observability: Alert RelEng when mw-client-error editing dashboard shows errors at a rate of over 1000 errors in a 12 hr period - https://phabricator.wikimedia.org/T293694 (thcipriani)
[17:29:32] Beta-Cluster-Infrastructure, Release-Engineering-Team (Done by Thu 04 Nov): Request access to beta cluster for Lucas Werkmeister (Cloud VPS project deployment-prep), non-staff account - https://phabricator.wikimedia.org/T293559 (thcipriani) p:Triage→Medium a:thcipriani
[17:30:17] Deployments, Release-Engineering-Team (Next): During holiday: add the automatic train branch as a 'window' in the deployments calendar so people can be aware of it happening - https://phabricator.wikimedia.org/T293101 (thcipriani) p:Triage→Low
[17:36:29] Continuous-Integration-Infrastructure, Release-Engineering-Team, Zuul, Release Pipeline: Separate zuul queue for pipelinelib publish jobs - https://phabricator.wikimedia.org/T292130 (thcipriani)
[17:37:21] Continuous-Integration-Infrastructure, Release-Engineering-Team (Done by Thu 04 Nov), Zuul, Release Pipeline: Separate zuul queue for pipelinelib publish jobs - https://phabricator.wikimedia.org/T292130 (thcipriani)
[17:38:50] Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing), Zuul, observability: Migrate Zuul alerting to Grafana / AlertManager - https://phabricator.wikimedia.org/T292284 (hashar) Stalled→Open Thank you @CDanis for noticing! Indeed it did trigger and again today: ` lang=irc...
[17:38:52] Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing), Zuul, observability: Migrate Zuul alerting to Grafana / AlertManager - https://phabricator.wikimedia.org/T292284 (hashar) a:hashar
[17:39:40] Phabricator, Release-Engineering-Team (Done by Thu 04 Nov): "Project report" Age Distribution query links to individual weeks lack project tag and task status parameters - https://phabricator.wikimedia.org/T291710 (mmodell) a:mmodell
[17:40:52] Release-Engineering-Team (Done by Thu 04 Nov), MW-on-K8s, Release Pipeline, User-brennen: Scap backport change_url command - https://phabricator.wikimedia.org/T287042 (thcipriani) We should break this one into subtasks @jeena can you take a look?
[17:41:18] Release-Engineering-Team (Done by Thu 04 Nov), dev-images, mwcli, User-brennen: Add php-luasandbox to dev-images used by mwcli - https://phabricator.wikimedia.org/T286678 (thcipriani)
[17:41:22] Phabricator, Release-Engineering-Team (Done by Thu 04 Nov): "Project report" Age Distribution query links to individual weeks lack project tag and task status parameters - https://phabricator.wikimedia.org/T291710 (mmodell) This is a straightforward and simple bugfix.
[17:45:21] Release-Engineering-Team (Done by Thu 04 Nov), GitLab (CI & Job Runners): Provide separate/larger volume for /var/lib/docker on GitLab runners - https://phabricator.wikimedia.org/T293835 (thcipriani)
[17:46:19] Release-Engineering-Team (Done by Thu 04 Nov), GitLab (Auth & Access), User-brennen: Reproduce GitLab 2fa failures - https://phabricator.wikimedia.org/T293528 (thcipriani)
[17:57:34] Release-Engineering-Team (Done by Thu 04 Nov), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T281169 (thcipriani)
[18:05:13] Release-Engineering-Team (Done by Thu 04 Nov), GitLab (CI & Job Runners): Provide separate/larger volume for /var/lib/docker on GitLab runners - https://phabricator.wikimedia.org/T293835 (dduvall) a:dduvall
[18:10:07] (PS1) Ahmon Dancy: Install k9s in deploy container [tools/train-dev] - https://gerrit.wikimedia.org/r/732394
[18:29:17] Phabricator: Create personal workboard for User-STei (WMF) - https://phabricator.wikimedia.org/T293920 (STei-WMF)
[18:32:55] Project-Admins: Create personal workboard for User-STei (WMF) - https://phabricator.wikimedia.org/T293920 (JJMC89)
[18:33:07] Project-Admins: Create personal workboard for User-STei (WMF) - https://phabricator.wikimedia.org/T293920 (JJMC89)
[18:33:11] Phabricator (Upstream), Upstream: Per-user projects for personal work in progress tracking - https://phabricator.wikimedia.org/T555 (JJMC89)
[18:49:04] Continuous-Integration-Config: Run PHP 7.4 as well as PHP 7.2 on wmf branch patches - https://phabricator.wikimedia.org/T293924 (Reedy)
[18:49:15] Continuous-Integration-Config: Run PHP 7.4 as well as PHP 7.2 on wmf branch patches - https://phabricator.wikimedia.org/T293924 (Reedy)
[18:49:17] Continuous-Integration-Infrastructure, Release-Engineering-Team (Seen), Patch-For-Review: Re-build all CI images of PHP 7.4 from sury's package to Wikimedia's one, to assure us that it will work - https://phabricator.wikimedia.org/T293851 (Reedy)
[18:53:40] Continuous-Integration-Config: Also run PHP 7.4 jobs on wmf branch patches - https://phabricator.wikimedia.org/T293924 (Reedy)
[19:00:31] Gerrit, Release-Engineering-Team (Radar), Discovery, Discovery-Search: Update gerrit submit type for discovery repositories in gerrit - https://phabricator.wikimedia.org/T255509 (EBernhardson) p:Triage→Medium a:EBernhardson
[19:01:03] Gerrit, Release-Engineering-Team (Radar), Discovery, Discovery-Search (Current work): Update gerrit submit type for discovery repositories in gerrit - https://phabricator.wikimedia.org/T255509 (EBernhardson)
[19:06:54] hi folks, looks like CI gate & submit is backed up due to a stuck job on this patch. https://gerrit.wikimedia.org/r/c/mediawiki/core/+/722967 you can see the jenkins job stopped about 25 mins ago here https://integration.wikimedia.org/ci/job/mediawiki-quibble-vendor-mysql-php74-docker/7738/console
[19:07:37] I aborted it
[19:07:58] thanks!!
[19:08:01] thank you Reedy !!!
[19:15:43] Gerrit, Release-Engineering-Team (Radar), Discovery, Discovery-Search (Current work): Update gerrit submit type for discovery repositories in gerrit - https://phabricator.wikimedia.org/T255509 (EBernhardson) Changed `search/` and `wikimedia/discovery` parents to `Rebase if necessary` with `Allow...
[19:19:43] hmm did gate & submit just get aborted / restarted again?
[19:20:13] we've got two patches that keep getting retested
[19:24:57] Which two?
[19:31:09] lol, k9s might as well have been called "revenge-of-the-off-by-one"
[19:32:16] dancy: what's your view on T266055 in terms of who's waiting for who to do what?
[19:32:17] T266055: Update Scap to perform rolling restart for all MW deploy - https://phabricator.wikimedia.org/T266055
[19:32:28] * Krinkle noticed another opcache bug being rolled in today
[19:40:33] Krinkle: At this point I'm waiting for someone to freshly advocate for flipping the switch. Sounds like that's what you're doing. If you don't mind, please add a new comment. I'll get it added to my sprint.
[19:40:57] (ideally saying something about how the costs are worth it)
[19:41:52] and of course we can always flip the switch back if we're not satisfied.
[19:41:58] Reedy: we got it working thanks!
[19:43:44] dancy: by costs you mean the added minute of sync time?
[19:43:53] yeah, and the increase in HTTP 500's
[19:45:32] right, I meant to follow up on that. From the chatlog it sounded like Effie said it was normal in the current situation to have errors during sync. Maybe they are normal, but I don't understand why they would happen. I do understand why some errors would happen in the new proposed model given restarts and cancelling requests, although that should be far lower under normal conditions. There's no reason to kill all on-going requests when
[19:45:32] the p75 is well under 200ms
[19:45:51] It was afaik always understood that it would be a graceful restart, which is more or less the point of the depool.
[19:46:01] seems like something there might not be working as intended.
[19:46:21] dancy: were the results of the trial posted on phab?
[19:46:26] I don't see them on that task
[19:46:46] ah I see it now, not in so much detail, but that's enough
[19:46:53] hrm.. looks like we didn't capture the graphs
[19:47:05] We are in train window so we could do another sync-file README now
[19:47:20]
[20:00:11] (PS2) Jforrester: docs: Add example of selective docker-pgk rebuild to README [integration/config] - https://gerrit.wikimedia.org/r/730948 (owner: Krinkle)
[20:01:13] (PS3) Jforrester: docs: Add example of selective docker-pgk rebuild to README [integration/config] - https://gerrit.wikimedia.org/r/730948 (owner: Krinkle)
[20:01:15] (CR) Jforrester: docs: Add example of selective docker-pgk rebuild to README (3 comments) [integration/config] - https://gerrit.wikimedia.org/r/730948 (owner: Krinkle)
[20:01:22] (CR) Ahmon Dancy: docs: Add example of selective docker-pgk rebuild to README (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/730948 (owner: Krinkle)
[20:01:25] (CR) Jforrester: [C: +1] docs: Add example of selective docker-pgk rebuild to README [integration/config] - https://gerrit.wikimedia.org/r/730948 (owner: Krinkle)
[20:10:34] twentyafterfour: ok, back
[20:10:40] Release-Engineering-Team (Radar), Scap, Patch-For-Review, User-jijiki: Update Scap to perform rolling restart for all MW deploy - https://phabricator.wikimedia.org/T266055 (Krinkle) >>! In T266055#7275900, @dancy wrote: > We ran the test today. @jijiki supplied SRE backup. > > It ran in two pha...
[20:11:14] dancy: ok, we could gather some numbers yeah. We can do 2x status quo and 2x with restarts at some point today.
[20:12:50] ok. cramming some food into my face at the moment. ~30 mins from now?
[20:12:51] IIRC it was something something jobrunners was how the 500s were explained, but I could be misremembering
[20:14:31] dancy: good idea (late lunch and 30min)
[20:14:55] 👍🏾
[20:29:14] Release-Engineering-Team (Done by Thu 04 Nov), GitLab (CI & Job Runners), Patch-For-Review: Provide separate/larger volume for /var/lib/docker on GitLab runners - https://phabricator.wikimedia.org/T293835 (dduvall) Puppet patches are tested and merged. `runner-1002.gitlab-runners.eqiad1.wikimedia.clou...
[20:29:25] Release-Engineering-Team (Done by Thu 04 Nov), GitLab (CI & Job Runners), Patch-For-Review: Provide separate/larger volume for /var/lib/docker on GitLab runners - https://phabricator.wikimedia.org/T293835 (dduvall) Open→Stalled
[20:30:07] Release-Engineering-Team (Done by Thu 04 Nov), GitLab (CI & Job Runners), Patch-For-Review: Provide separate/larger volume for /var/lib/docker on GitLab runners - https://phabricator.wikimedia.org/T293835 (dduvall)
[20:30:13] Release-Engineering-Team (Radar), Cloud-VPS (Quota-requests), GitLab (CI & Job Runners): Request increased quota for gitlab-runners Cloud VPS project - https://phabricator.wikimedia.org/T293832 (dduvall)
[20:30:45] Is https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page in an unexpected state? We rely on it for testing code on the train but it seems to be in some kind of weird alpha which is not running latest master
[20:33:01] https://en.wikipedia.beta.wmflabs.org/wiki/Special:Version says it is running on b031325, which is the current master
[20:33:08] Jdlrobson: what does weird alpha mean
[20:33:59] So 1.38.0-alpha is the latest version? Okay, I'll keep investigating
[20:34:05] Jdlrobson: yes
[20:34:12] https://en.m.wikipedia.beta.wmflabs.org/wiki/Spain is not showing a logo
[20:34:23] which was an old regression so I'm not sure why it's resurfaced
[20:34:28] I purged the page so commit was right
[20:35:02] there's still no team that tries to maintain the beta cluster, so it may break often even if you rely on it
[20:35:09] Jdlrobson: I can't see a logo on any page
[20:35:49] I see green beta cluster text but no wordmark
[20:35:54] yeh that's a bug. I just wasn't sure whether 1.38.0-alpha was some kind of special release
[20:36:16] No
[20:36:40] Pretty sure it's always been that format
[20:37:02] thanks for confirming.
[20:39:04] It's definitely been since MW_VERSION & 1.35
[20:39:19] Release-Engineering-Team (Radar), Cloud-VPS (Quota-requests), GitLab (CI & Job Runners): Request increased quota for gitlab-runners Cloud VPS project - https://phabricator.wikimedia.org/T293832 (bd808)
[20:39:20] Release-Engineering-Team (Done by Thu 04 Nov), GitLab (CI & Job Runners), Patch-For-Review: Provide separate/larger volume for /var/lib/docker on GitLab runners - https://phabricator.wikimedia.org/T293835 (bd808)
[20:39:26] Release-Engineering-Team (Done by Thu 04 Nov), GitLab (CI & Job Runners), Patch-For-Review: Provide separate/larger volume for /var/lib/docker on GitLab runners - https://phabricator.wikimedia.org/T293835 (bd808)
[20:39:30] Release-Engineering-Team (Radar), Cloud-VPS (Quota-requests), GitLab (CI & Job Runners): Request increased quota for gitlab-runners Cloud VPS project - https://phabricator.wikimedia.org/T293832 (bd808)
[20:40:32] bd808: oops. thanks for fixing that relation :)
thanks for fixing that relation :)
[20:41:00] yw
[20:43:16] Release-Engineering-Team (Done by Thu 04 Nov), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T281169 (Jdlrobson)
[20:43:45] Release-Engineering-Team (Done by Thu 04 Nov), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T281169 (Jdlrobson) The logo has disappeared on some mobile sites so please do not deploy further until T290525 is taken care of.
[20:51:41] Jdlrobson: is that not rollback worthy?
[20:51:45] We've used "alpha" as the in-development semver version of MW since at least 2004 (MW 1.5alpha). You'll find the same on your localhost wiki and in DefaultSettings.php (or now Defines.php)
[20:52:09] Also, would testing not be better before group1?
[20:52:33] Krinkle: I guessed a very long time
[21:02:43] @James_F has something changed with how we run storybook? I've got sh: 1: build-storybook: not found in various storybook repos blocking merges now.
[21:03:16] FetchError: Invalid response body while trying to fetch https://registry.npmjs.org/@storybook%2fhtml:
[21:03:23] or maybe npm?
[21:03:33] I remember you and @Krinkle might have been touching something related to that this week?
[21:03:57] I'm regenerating package-lock in case that's the issue
[21:04:16] (patch is https://gerrit.wikimedia.org/r/c/mediawiki/skins/MinervaNeue/+/732436)
[21:04:56] Spookreeeno: regarding testing before group 1, yes ideally, but our QA engineer was sick this week and our train doesn't wait for sick people :)
[21:09:44] Continuous-Integration-Infrastructure, DC-Ops, netops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn) db2078.mgmt mw2253.mgmt
[21:10:03] It says the "build-storybook" command, as invoked from package.json, doesn't exist.
This wouldn't be anything affected by how we run stuff in CI afaics. It would generally be an issue with the code in the repo and the dependencies it uses. Trying to reproduce it locally would be a good first step (naturally with the same container as CI, e.g. quibble or fresh-node).
[21:12:29] I generally read CI output top down from the first error message, not the last.
[21:13:01] the first warning is from npm about the lock file being from an old npm version (npm < 7 instead of npm 7+), so that's indeed a good thing to update now that we're on node 12 / npm 7.
[21:13:19] Then after that, the FetchError is likely an intermittent issue with npmjs.org that a retry should rectify.
[21:14:30] Release-Engineering-Team, MW-1.38-notes (1.38.0-wmf.5; 2021-10-19), Readers-Web-Backlog (Kanbanana-FY-2021-22): CI mwext-node12-rundoc-docker job failing on repos using Storybook - https://phabricator.wikimedia.org/T293937 (Jdlrobson)
[21:14:39] Release-Engineering-Team, MW-1.38-notes (1.38.0-wmf.5; 2021-10-19), Readers-Web-Backlog (Kanbanana-FY-2021-22): CI mwext-node12-rundoc-docker job failing on repos using Storybook - https://phabricator.wikimedia.org/T293937 (Jdlrobson) p:Triage→Unbreak!
[21:15:03] if not, then I guess that's a bug in npm 7, with npm-cli making an invalid request to its own server as part of the lock file migration code, which could be reported upstream.
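The "old lockfile" warning discussed above is what npm 7+ prints when package-lock.json still has "lockfileVersion": 1, the format written by npm 5/6. In that case npm 7+ fetches supplemental metadata from the registry for the existing entries, which is one place a registry FetchError like the one above can surface. A minimal sketch, assuming Python 3 and a parsed package-lock.json; describe_lockfile and check_lockfile.py are hypothetical names for illustration, not existing tools:

```python
import json
import sys

# npm lockfile generations (per npm's package-lock.json docs):
#   1: written by npm 5 and 6
#   2: written by npm 7 and 8, still readable by npm 6
#   3: npm 7+ only (the default from npm 9)
LOCKFILE_WRITERS = {
    1: "npm 5/6",
    2: "npm 7+, backwards compatible with npm 6",
    3: "npm 7+ only",
}


def describe_lockfile(lock):
    """Summarise whether npm 7+ would treat a parsed package-lock.json as old."""
    # Assumption: a missing field is treated like version 1 (pre-npm-7 format).
    version = lock.get("lockfileVersion", 1)
    writer = LOCKFILE_WRITERS.get(version, "unknown npm version")
    if version < 2:
        # npm 7+ prints "npm WARN old lockfile" here and fetches supplemental
        # metadata from the registry before it can install.
        return f"lockfileVersion {version} ({writer}): old, regenerate with npm 7+"
    return f"lockfileVersion {version} ({writer}): ok for npm 7+"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_lockfile.py path/to/package-lock.json
    with open(sys.argv[1], encoding="utf-8") as fh:
        print(describe_lockfile(json.load(fh)))
```

Regenerating the lockfile with npm 7+ (for example via `npm install --package-lock-only`) should rewrite it as version 2, which makes the warning and the extra metadata fetches go away.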
[21:15:06] (CR) Jeena Huneidi: [C: +2] Install k9s in deploy container [tools/train-dev] - https://gerrit.wikimedia.org/r/732394 (owner: Ahmon Dancy)
[21:15:32] (Merged) jenkins-bot: Install k9s in deploy container [tools/train-dev] - https://gerrit.wikimedia.org/r/732394 (owner: Ahmon Dancy)
[21:28:50] (PS1) Dduvall: Remove previously generated certs as part of clean subcommand [tools/train-dev] - https://gerrit.wikimedia.org/r/732442
[21:39:04] Phabricator, Project-Admins: Archive the User-Zabe project - https://phabricator.wikimedia.org/T293304 (Zabe)
[21:39:35] Project-Admins: Archive the User-Zabe project - https://phabricator.wikimedia.org/T293304 (Zabe)
[21:40:35] Jdlrobson: if you're stuck then I'm happy to lend a hand, because you're really good at writing test plans and executing them
[21:41:52] Project-Admins: Archive the User-Zabe project - https://phabricator.wikimedia.org/T293304 (Peachey88) Open→Resolved a:Peachey88 {{done}}
[21:43:14] (PS1) Dduvall: Use resolved container IDs instead of hardcoded names [tools/train-dev] - https://gerrit.wikimedia.org/r/732444
[21:44:17] (CR) Ahmon Dancy: [C: +2] Remove previously generated certs as part of clean subcommand [tools/train-dev] - https://gerrit.wikimedia.org/r/732442 (owner: Dduvall)
[21:44:42] (Merged) jenkins-bot: Remove previously generated certs as part of clean subcommand [tools/train-dev] - https://gerrit.wikimedia.org/r/732442 (owner: Dduvall)
[21:46:12] (CR) Ahmon Dancy: [C: +1] "LGTM. Will wait for Jeena to confirm that it works for her." [tools/train-dev] - https://gerrit.wikimedia.org/r/732444 (owner: Dduvall)
[21:52:44] (CR) Jeena Huneidi: [C: +2] "working for me!"
[tools/train-dev] - https://gerrit.wikimedia.org/r/732444 (owner: Dduvall)
[21:53:09] (Merged) jenkins-bot: Use resolved container IDs instead of hardcoded names [tools/train-dev] - https://gerrit.wikimedia.org/r/732444 (owner: Dduvall)
[21:59:58] Release-Engineering-Team, MW-1.38-notes (1.38.0-wmf.5; 2021-10-19), Readers-Web-Backlog (Kanbanana-FY-2021-22): CI mwext-node12-rundoc-docker job failing on repos using Storybook - https://phabricator.wikimedia.org/T293937 (Jdlrobson) p:Unbreak!→High Doesn't seem to be impacting wmf5 as the d...
[22:02:22] Continuous-Integration-Infrastructure, DC-Ops, netops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn)
[22:10:22] Release-Engineering-Team (Radar), Scap, Patch-For-Review, User-jijiki: Update Scap to perform rolling restart for all MW deploy - https://phabricator.wikimedia.org/T266055 (dancy) >>! In T266055#7445971, @Krinkle wrote: > Also, given the use of poolcounter for the rolling restart and how the int...
[22:14:16] Project-Admins: Create "Radar" milestones for #Research and #Machine-Learning-Team project tags - https://phabricator.wikimedia.org/T283538 (leila) @Aklapper sorry for my delay. I swear I stared at this task many times and I can't find a good path forward. ;) Your explanations above are helpful. I think...
[22:14:33] Project-Admins: Create "Radar" milestones for #Research and #Machine-Learning-Team project tags - https://phabricator.wikimedia.org/T283538 (leila) Open→Declined
[22:17:09] Release-Engineering-Team (Radar), Scap, Patch-For-Review, User-jijiki: Update Scap to perform rolling restart for all MW deploy - https://phabricator.wikimedia.org/T266055 (dancy) The test command for the record: `scap sync-file -D php_fpm_always_restart:true README`
[23:26:49] Beta-Cluster-Infrastructure, Release-Engineering-Team (Done by Thu 04 Nov): Request access to beta cluster for Lucas Werkmeister (Cloud VPS project deployment-prep), non-staff account - https://phabricator.wikimedia.org/T293559 (thcipriani) Open→Resolved Hey @LucasWerkmeister added you as a proje...
[23:33:12] Beta-Cluster-Infrastructure, Release-Engineering-Team (Done by Thu 04 Nov): Request access to beta cluster for Lucas Werkmeister (Cloud VPS project deployment-prep), non-staff account - https://phabricator.wikimedia.org/T293559 (LucasWerkmeister) SSH access seems to work fine, thanks! Project member shou...
[23:59:41] Release-Engineering-Team (Done by Thu 04 Nov), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T281169 (Jdlrobson)
[23:59:51] Release-Engineering-Team (Done by Thu 04 Nov), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T281169 (Jdlrobson) No longer blocked. Thanks @thcipriani