[00:01:01] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[00:04:46] Project-Admins: Create project for MontserratFont extension - https://phabricator.wikimedia.org/T194579 (Urbanecm)
[01:01:05] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[01:16:02] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[02:11:03] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[02:16:47] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<20.00%)
[04:11:02] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[05:34:25] Continuous-Integration-Infrastructure (Slipway), Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO, Wikidata, Wikidata Query UI: Migrate wikidata-query-gui-build to Docker containers - https://phabricator.wikimedia.org/T210286 (hashar)
[05:34:31] Continuous-Integration-Infrastructure (Slipway), Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO, Release Pipeline, Wikimedia-Portals: Migrate wikimedia-portals-build to Docker container - https://phabricator.wikimedia.org/T213806 (hashar)
[05:34:44] Continuous-Integration-Infrastructure (Slipway), Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO, Analytics: Migrate analytics/refinery/source release jobs to Docker - https://phabricator.wikimedia.org/T210271 (hashar)
[05:47:06] Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201908), Zuul: zuul is being very slow (2019-08-24) - https://phabricator.wikimedia.org/T231136 (greg) Open→Invalid There was a spike of jobs: https://grafana.wikimedia....
[05:49:38] Release-Engineering-Team, Product-Analytics, Repository-Admins: Create a repository and user for Product Analytics Oozie jobs? - https://phabricator.wikimedia.org/T230743 (greg) Release Engineering recommends and support Gerrit code review, is that what you wanted? https://www.mediawiki.org/wiki/Ge...
[05:52:59] Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO, Fundraising-Backlog: Create composer-test-php70 docker image for fundraising tech's crm tests - https://phabricator.wikimedia.org/T230446 (greg)
[06:11:03] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[06:46:48] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:39:39] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[08:11:04] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[08:26:28] Continuous-Integration-Config, Tracking-Neverending: Add CI to all Gerrit repositories - https://phabricator.wikimedia.org/T180317 (hashar)
[08:26:33] Continuous-Integration-Config, Gerrit, Tools, Patch-For-Review: Add CI to all labs/tools/* repositories and archive obsolete ones - https://phabricator.wikimedia.org/T180318 (hashar) Open→Resolved a:hashar Low hanging fruits had been resolved at time. Then it is a never ending task to...
[08:26:35] Continuous-Integration-Config, Tracking-Neverending: Add CI to all Gerrit repositories - https://phabricator.wikimedia.org/T180317 (hashar)
[08:26:37] Continuous-Integration-Config, Tracking-Neverending: Add CI to all Gerrit repositories - https://phabricator.wikimedia.org/T180317 (hashar) Open→Resolved a:hashar Low hanging fruits had been resolved at time. Then it is a never ending task to add CI and we would need a better process to have...
[08:26:39] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[08:26:40] Continuous-Integration-Config, Operations, Patch-For-Review: Add CI to all operations/* repositories and archive obsolete ones - https://phabricator.wikimedia.org/T180330 (hashar) Open→Resolved a:hashar Low hanging fruits had been resolved at time. Then it is a never ending task to add CI...
[08:28:40] Continuous-Integration-Config, MediaWiki-extensions-NSFileRepo: NSFileRepo depends upon Lockdown extension - https://phabricator.wikimedia.org/T185610 (hashar) Open→Resolved a:Pwirth I guess that is solved so :)
[08:30:17] Beta-Cluster-Infrastructure, Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO: Use cron instead of Jenkins for beta deployments - https://phabricator.wikimedia.org/T188367 (hashar) p:Triage→Low
[08:30:55] Continuous-Integration-Config, MediaWiki Language Extension Bundle: Run php5 tests for MLEB extensions - https://phabricator.wikimedia.org/T197561 (hashar) Open→Declined PHP 5 is gone.
[08:34:20] Continuous-Integration-Config: Make Jenkins jobs fail if part of setup-zuul-submodules fails - https://phabricator.wikimedia.org/T199077 (hashar) We had the issue since the first time we proceeded submodules. In short because we did something like: ` find -name .gitmodules -execdir 'git submodule update --in...
[08:34:35] Release-Engineering-Team (Kanban), Quibble, Patch-For-Review: Quibble should errors out when a git submodule fails - https://phabricator.wikimedia.org/T198980 (hashar)
[08:34:37] Continuous-Integration-Config: Make Jenkins jobs fail if part of setup-zuul-submodules fails - https://phabricator.wikimedia.org/T199077 (hashar)
[08:37:20] Continuous-Integration-Config: Add yourself to the Jenkins whitelist in Gerrit to trigger testing unit test failures or code style issues yourself - https://phabricator.wikimedia.org/T200778 (hashar) Open→Resolved The task was for #google-code-in-2018 and is no more relevant.
[08:56:47] Continuous-Integration-Config, MediaWiki Language Extension Bundle: Run php5 tests for MLEB extensions - https://phabricator.wikimedia.org/T197561 (Nikerabbit) Latest MLEB also requires PHP7 now.
[09:48:08] Continuous-Integration-Config: Fetch dependencies using composer instead of cloning mediawiki/vendor for non-wmf branches - https://phabricator.wikimedia.org/T90303 (hashar)
[09:50:14] Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO: integration-slave-jessie-1004 puppet error - https://phabricator.wikimedia.org/T218361 (hashar) Open→Resolved a:hashar apt issue is gone, probably as part of some other...
[09:50:42] Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO: integration-slave-jessie-1004 puppet error - https://phabricator.wikimedia.org/T218361 (hashar)
[09:50:44] Continuous-Integration-Infrastructure: integration slaves: puppet errors - PHP package install problems - https://phabricator.wikimedia.org/T219625 (hashar)
[10:11:02] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[10:23:05] Continuous-Integration-Infrastructure (phase-out-jessie), Operations, serviceops: Upload docker-ce 18.06.3 upstream package for Stretch - https://phabricator.wikimedia.org/T226236 (hashar) I am Back from vacations! CI currently runs 18.06. 18.09 introduces a bunch of changes I am not comfortable to...
[10:48:22] PROBLEM - Free space - all mounts on deployment-mediawiki-07 is CRITICAL: CRITICAL: deployment-prep.deployment-mediawiki-07.diskspace.root.byte_percentfree (<100.00%)
[11:19:39] hashar: Welcome back btw. I hope you had a nice vacation
[12:05:03] Continuous-Integration-Config, Release-Engineering-Team (Unit & Int & System Tooling), Release-Engineering-Team-TODO, MediaWiki-Core-Testing, and 5 others: Reduce runtime of MW shared gate Jenkins jobs to 5 min - https://phabricator.wikimedia.org/T225730 (Krinkle)
[12:06:55] Continuous-Integration-Config, Release-Engineering-Team (Unit & Int & System Tooling), Release-Engineering-Team-TODO, MediaWiki-Core-Testing, and 5 others: Reduce runtime of MW shared gate Jenkins jobs to 5 min - https://phabricator.wikimedia.org/T225730 (Krinkle)
[12:11:02] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[12:41:54] PROBLEM - SSH on integration-slave-docker-1048 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:58:45] PROBLEM - SSH on integration-slave-docker-1050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:03:36] RECOVERY - SSH on integration-slave-docker-1050 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u8 (protocol 2.0)
[13:23:43] (PS1) Ladsgroup: Drop Scribunto as dependency of Wikibase for now [integration/config] - https://gerrit.wikimedia.org/r/532379 (https://phabricator.wikimedia.org/T231198)
[13:24:56] Can someone merge and deploy this? ^
[13:25:05] for a UBN
[13:25:27] (CR) Alaa Sarhan: [C: +1] Drop Scribunto as dependency of Wikibase for now [integration/config] - https://gerrit.wikimedia.org/r/532379 (https://phabricator.wikimedia.org/T231198) (owner: Ladsgroup)
[13:29:38] (PS2) Ladsgroup: Drop Scribunto as dependency of Wikibase for now [integration/config] - https://gerrit.wikimedia.org/r/532379 (https://phabricator.wikimedia.org/T231198)
[13:30:00] (CR) Reedy: [C: +2] Drop Scribunto as dependency of Wikibase for now [integration/config] - https://gerrit.wikimedia.org/r/532379 (https://phabricator.wikimedia.org/T231198) (owner: Ladsgroup)
[13:30:16] (CR) Lucas Werkmeister (WMDE): "commenting it out (and mentioning the task ID in a comment) would be better than just removing the lines IMHO" [integration/config] - https://gerrit.wikimedia.org/r/532379 (https://phabricator.wikimedia.org/T231198) (owner: Ladsgroup)
[13:30:28] meh, I was too slow
[13:30:41] feel free to halt the merge
[13:30:45] nah
[13:30:54] (CR) Reedy: [C: +2] Drop Scribunto as dependency of Wikibase for now [integration/config] - https://gerrit.wikimedia.org/r/532379 (https://phabricator.wikimedia.org/T231198) (owner: Ladsgroup)
[13:30:56] I can send another change to add them back, commented-out ^^
[13:31:53] Thanks Reedy and Lucas_WMDE
[13:32:57] Let me know when it's deployed so I recheck something
[13:34:41] CI is backed up
[13:34:45] So it'll take ages to merge :P
[13:40:24] <_joe_> hi
[13:40:43] <_joe_> jenkins is broken since hours. Every patch takes forever to be checked
[13:40:52] <_joe_> can I ask if someone's even looking into it?
[13:41:42] <_joe_> to be clear, something that takes 8 to 20 minutes to execute a CI run for puppet, where the actual run takes less than one minute, is broken from my POV
[13:42:40] 8-20 minutes?
[13:42:47] it's 2 hours for some mediawiki patches
[13:44:15] Release-Engineering-Team, Zuul: CI performance issues - https://phabricator.wikimedia.org/T231200 (Vgutierrez)
[13:44:43] <_joe_> Reedy: it's still 95% of the time spent idling
[13:46:31] puppet runs are normally fast, 1-2 minutes. longer than that means something is broken, 20 minutes means something extremely broken
[13:46:34] Release-Engineering-Team, Zuul: CI performance issues - https://phabricator.wikimedia.org/T231200 (Joe) p:Triage→Unbreak! For context, the actual time to run the tests for operations/puppet is under one minute for most patches. Either Zuul or jenkins are broken, and this has been a constant pain...
[13:46:44] mw is a whole different deal
[13:48:05] <_joe_> yeah let's please find out where the problem is. I'm happy to help if you need a root for something.
[13:48:14] (Merged) jenkins-bot: Drop Scribunto as dependency of Wikibase for now [integration/config] - https://gerrit.wikimedia.org/r/532379 (https://phabricator.wikimedia.org/T231198) (owner: Ladsgroup)
[13:51:15] Release-Engineering-Team, Zuul: CI performance issues - https://phabricator.wikimedia.org/T231200 (Ladsgroup) Related {T231198}
[13:51:27] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/532379
[13:51:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:51:48] RECOVERY - SSH on integration-slave-docker-1048 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u8 (protocol 2.0)
[13:53:53] the gate-and-submit queue looks dreadful
[13:54:44] https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/532343/ over two hours?!
[13:55:24] and SRE are complaining about tens of minutes ;P
[13:58:34] one note, it seems php7.2 suddenly got really slow
[13:58:34] puppet is the glue that holds the rest together though. if that glue gets too far behind....
[13:58:49] did we do anything on php7.2?
[13:59:46] <_joe_> Amir1: maybe you changed the code and found some pathological path
[14:00:04] <_joe_> that's indeed worrisome and should be investigated
[14:00:15] I doubt that, every test got way slower in php7.2
[14:00:29] it used to be one of the fastest tests
[14:02:46] did we deploy new version of quibble? I know deploying that would make things faster (I hope)
[14:11:02] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[14:12:38] i don't see anything in -operations or here or SAL for the last few days about a new version
[14:14:10] <_joe_> Amir1: that's a question for releng
[14:14:34] Amir1: If you're looking in console logs to determine runtime, there's also a debug line which includes the quibble image version, e.g. "quibble-stretch-php72:0.0.34-1"
[14:14:51] yeah but in general things started to become way slower, like phpunit without database tests jumped from 5 minutes to 13 minutes over the weekend
[14:15:14] awight: good point
[14:17:11] Amir1: FWIW, I'm reading one sample log https://integration.wikimedia.org/ci/job/wmf-quibble-core-vendor-mysql-php72-docker/4196/consoleFull and phpunit-databaseless is still at 5min
[14:17:29] Which logs show 13 minutes?
[14:17:47] https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/67485/consoleFull
[14:17:51] This is 16 minutes
[14:19:15] +1 how strange. and entirely different slow tests on the leaderboard
[14:19:21] 16 minutes for tests that don't even need database seems a little bit excessive
[14:20:44] That's using HHVM, so the problem isn't just php7.2 I guess
[14:21:45] There are no special tests in quibble per PHP flavour that aren't in the repos you're testing.
[14:22:08] So if it runs slow on one PHP executor but not others, you have the power to fix that, not RelEng.
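Note on the 14:14:34 tip: the image version can be pulled out of a saved console log mechanically. The following is a minimal sketch only. It assumes the image reference appears in the text as "name:version", as in the quoted example; the surrounding log line and the registry prefix in the sample are hypothetical, and the helper name is made up for illustration.

```python
import re

# Assumption: image references look like "quibble-<flavour>:<version>",
# matching the example quoted in the log ("quibble-stretch-php72:0.0.34-1").
IMAGE_RE = re.compile(r"\b(quibble-[a-z0-9-]+):(\d+(?:\.\d+)*(?:-\d+)?)")


def find_quibble_images(console_text):
    """Return unique (image name, version) pairs in order of first appearance."""
    seen = []
    for match in IMAGE_RE.finditer(console_text):
        pair = (match.group(1), match.group(2))
        if pair not in seen:
            seen.append(pair)
    return seen


# Hypothetical console excerpt; only the image string itself comes from the log.
sample = "DEBUG: image registry.example/releng/quibble-stretch-php72:0.0.34-1\n"
print(find_quibble_images(sample))
```

Comparing the pairs from a fast build and a slow build would show whether the two ran on the same image version, which is the question raised at 14:02:46.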
[14:24:17] I think you're blaming a symptom (Scribunto) rather than a cause (too much complexity in Wikibase that leads to poor test performance), bluntly.
[14:24:27] James_F: Well, we don't have +2 rights on integration config + ability to deploy it
[14:24:55] Amir1: Indeed. But that's not the repo I'm talking about.
[14:25:33] It would be nice to run a sampling profiler during CI...
[14:25:36] James_F: I highly doubt that wikibase is the problem here, It jumped from 20-40m to more than an hour over the weekend, no code in wikibase can do that
[14:25:48] Amir1: Of course it can.
[14:25:55] all tests equally got slower
[14:26:10] Amir1: The split of Wikibase into WikibaseCirrusSearch added about 10% extra time to everyone's tests everywhere, for instance.
[14:26:42] Code in the gate has to be especially careful about what tests it adds.
[14:26:47] such things haven't happened in the past couple of days AFAIK
[14:27:03] It's an endless series of straws on the camel's back.
[14:27:12] A year ago it was ~15 minutes.
[14:30:34] Release-Engineering-Team, Wikidata, ci-test-error: Wikibase and WikibaseLexeme tests now take more than 1 hour in php7.2 - https://phabricator.wikimedia.org/T231198 (Jdforrester-WMF) Not a shared build failure; isolated to Lexeme and things that depend on it.
[14:31:47] James_F: I agree that the whole thing needs to be rethought but for the issue at hand, On Friday, for gate-submit we had "quibble-vendor-mysql-php70-docker https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php70-docker/9952/console : SUCCESS in 30m 09s" https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/456524 and now we have "quibble-vendor-mysql-php70-docker
[14:31:47] https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php70-docker/27927/console : SUCCESS in 55m 31s" https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/532088
[14:32:11] OK, there are four sources of slowness in CI:
[14:32:12] the 90% jump in run time didn't come from wikibase code
[14:32:16] * Your code
[14:32:20] * Your dependencies' code
[14:32:30] * Cloud Services' infrastructure
[14:32:41] * Something in the RelEng framework running the tests wrongly.
[14:32:59] I'm pointing out that you're jumping to the conclusion that it's option 4, despite nothing having changed there.
[14:33:15] Whereas I can see two dozen patches to MW core alone that landed over the last three days.
[14:34:14] Eyeballing this trend, it actually seems like the trouble started earlier today: https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php70-docker/buildTimeTrend
[14:35:10] The tests for Lexeme seem to have been the source of the slowness (possibly caused by Scribunto). No significant patches have landed in Scribunto in a week, just a redirect bypass for GenderCache.
[14:35:58] WikibaseLexeme has a few patches, some of which look a little scary but none are obviously at fault.
[14:36:51] Similarly with WikibaseLexemeCirrusSearch and Wikibase itself, and WikibaseCirrusSearch hasn't had any patches in three weeks.
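Note on the two timings quoted at 14:31:47 (30m 09s and 55m 31s): the size of the jump can be worked out directly. This is a minimal sketch, assuming durations are written as in the quoted Jenkins results ("30m 09s", optionally with an hours part); the function names are illustrative, not part of any CI tooling.

```python
import re


def parse_duration(text):
    """Convert a duration such as '30m 09s' or '1h 02m 03s' to seconds."""
    match = re.fullmatch(
        r"(?:(\d+)h\s*)?(?:(\d+)m\s*)?(?:(\d+)s)?", text.strip()
    )
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def percent_increase(before, after):
    """Relative growth from before to after, in percent."""
    return (after - before) / before * 100.0


before = parse_duration("30m 09s")  # 1809 seconds
after = parse_duration("55m 31s")   # 3331 seconds
print(round(percent_increase(before, after), 1))  # 84.1
```

For this one pair of builds the increase is about 84%, in the same range as the "90% jump" mentioned at 14:32:12. Two builds of different changes are a small sample, so the figure is indicative only.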
[14:38:31] Continuous-Integration-Config, Wikidata, ci-test-error: Jenkins has been running Scribunto tests in Wikibase patches since 2015 - https://phabricator.wikimedia.org/T228739 (Jdforrester-WMF)
[14:38:57] Continuous-Integration-Config, Wikidata, ci-test-error: Jenkins has been running Scribunto tests in Wikibase patches since 2015 - https://phabricator.wikimedia.org/T228739 (Jdforrester-WMF) Note: as of today, this was 'temporarily' disabled for slowness: https://gerrit.wikimedia.org/r/c/integration/c...
[14:42:34] Continuous-Integration-Config, Wikidata, ci-test-error: Jenkins has been running Scribunto tests in Wikibase patches since 2015 - https://phabricator.wikimedia.org/T228739 (Ladsgroup) >Jenkins has been running Scribunto tests in Wikibase patches since 2015. This is not correct, I have been monitorin...
[14:46:44] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (201908), Release, Train Deployments, User-zeljkofilipin: 1.34.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T220745 (Jdforrester-WMF)
[14:46:45] Actually the issue started from Friday
[14:46:52] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/531490
[14:47:02] quibble-vendor-mysql-php70-docker https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php70-docker/27634/console : SUCCESS in 28m 36s
[14:47:06] Continuous-Integration-Config, Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, Operations: Fix operations/puppet.git "rebase hell" - https://phabricator.wikimedia.org/T224033 (hashar)
[14:50:28] Amir1: IMHO that's not clear, try reading down the durations in the /buildTimeTrend execution time pane...
[14:51:25] Maybe that's a bad way to analyze, since it's jumbling together jobs from many repos.
[14:51:44] Yeah, you only care about the runs for Wikibase.git and WikibaseLexeme.git, FWICT.
[14:51:55] Release-Engineering-Team, Wikidata, ci-test-error: Wikibase and WikibaseLexeme tests now take more than 1 hour in php7.2 - https://phabricator.wikimedia.org/T231198 (Ladsgroup) The issue actually started in Friday: - https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php70-docker/27634/con...
[14:52:42] Release-Engineering-Team, Wikidata, ci-test-error: Wikibase and WikibaseLexeme tests now take more than 1 hour in php7.2 - https://phabricator.wikimedia.org/T231198 (Ladsgroup) Patches: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/531490 and https://gerrit.wikimedia.org/r/c/mediaw...
[14:53:49] For now, I'm just comparing https://phabricator.wikimedia.org/T231198#5437902
[14:56:48] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO: Create mirror of Gerrit repositories for consumption by various tools - https://phabricator.wikimedia.org/T226240 (hashar) Apparently extdist is still reaching out to gerrit.wikimedia.org over https from at least:...
[15:01:15] To me it might be an issue with WMCS infra
[15:08:27] Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), ValueView, Wikidata, and 2 others: Fix the data-values/value-view repo to work on node10 - https://phabricator.wikimedia.org/T229276 (Jdforrester-WMF) a:Jdforrester-WMF→None
[15:08:29] * paladox has finally fixed the issue that made us do https://github.com/wikimedia/puppet/commit/cb61b2f6bb4b69ebda4fcdda7f59be0c482efaca
[15:08:34] https://gerrit-review.googlesource.com/c/gerrit/+/234732
[15:10:07] Release-Engineering-Team-TODO (201908), Lexicographical data, MediaWiki-Core-Testing, Wikidata, ci-test-error: WikibaseLexeme test broken by refactor of MediaWiki's Language class - https://phabricator.wikimedia.org/T231103 (Jdforrester-WMF) Open→Resolved
[15:15:40] Amir1: it is most probably due to mediawiki/core change https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/507579/ "Make LocalisationCache a service" :D
[15:15:50] Amir1: -I mean the slowness in jobs-
[15:16:07] but it is not a straightforward revert, some other changes got piled on top of that and I have a couple of meetings I have to attend
[15:16:17] but potentially one can try reverting the chain of patches and see whether the tests run faster
[15:16:48] if they get faster again, that means the "Make LocalisationCache a service" change has issues
[15:16:53] else, it is something else entirely
[15:17:20] Thanks. I'm having lunch rn. Can someone do it for me?
[15:22:34] PROBLEM - Free space - all mounts on deployment-mwmaint01 is CRITICAL: CRITICAL: deployment-prep.deployment-mwmaint01.diskspace.root.byte_percentfree (<11.11%)
[15:32:35] RECOVERY - Free space - all mounts on deployment-mwmaint01 is OK: OK: All targets OK
[15:34:58] !log deployment-mwmaint01: /var/cache/hhvm/cli.hhbc.sq3 (1.3GB) - T161598
[15:35:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:35:07] T161598: Monitor HHVM bytecode cache depletion on mediawiki app servers - https://phabricator.wikimedia.org/T161598
[15:53:59] the LC cache makes lots of sense to me, It has been the biggest bottleneck in speed of CI tests and I have been trying to improve it but nothing got deployed yet
[16:02:02] the chain is so messy...
[16:11:01] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[16:26:55] (PS1) Lucas Werkmeister (WMDE): Add dropped Wikibase dependencies as comments [integration/config] - https://gerrit.wikimedia.org/r/532398 (https://phabricator.wikimedia.org/T231198)
[16:27:07] (CR) Lucas Werkmeister (WMDE): "> Patch Set 2:" [integration/config] - https://gerrit.wikimedia.org/r/532379 (https://phabricator.wikimedia.org/T231198) (owner: Ladsgroup)
[16:30:21] Release-Engineering-Team, Operations: Requesting access to Puppet for Viztor[S] - https://phabricator.wikimedia.org/T229894 (greg) >>! In T229894#5398333, @Dzahn wrote: > We talked on IRC about this and agreed this ticket should be re-purposed away from "production access to puppetmaster" and to "add to...
[16:30:41] Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO: Requesting access to Puppet for Viztor[S] - https://phabricator.wikimedia.org/T229894 (greg)
[16:30:45] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 46.67% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[16:30:57] Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO: whitelist user Viztor[S] in CI - https://phabricator.wikimedia.org/T229894 (greg)
[16:34:24] We have 24 patches just in gate-and-submit. This hopefully fixes it https://gerrit.wikimedia.org/r/c/mediawiki/core/+/532399
[16:36:46] Release-Engineering-Team, Wikidata, Patch-For-Review, ci-test-error: Wikibase and WikibaseLexeme tests now take more than 1 hour in php7.2 - https://phabricator.wikimedia.org/T231198 (Lucas_Werkmeister_WMDE) @Ladsgroup you should probably coordinate that revert with {T231183} (also UBN).
[16:37:21] (CR) Jforrester: [C: +2] Add dropped Wikibase dependencies as comments [integration/config] - https://gerrit.wikimedia.org/r/532398 (https://phabricator.wikimedia.org/T231198) (owner: Lucas Werkmeister (WMDE))
[16:37:39] Release-Engineering-Team, Wikidata, Patch-For-Review, ci-test-error: Wikibase and WikibaseLexeme tests now take more than 1 hour in php7.2 - https://phabricator.wikimedia.org/T231198 (Ladsgroup) >>! In T231198#5438250, @Lucas_Werkmeister_WMDE wrote: > @Ladsgroup you should probably coordinate tha...
[16:38:28] 4 hours to merge a patch, wow
[16:38:37] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/AbuseFilter/+/532368
[16:40:58] (Merged) jenkins-bot: Add dropped Wikibase dependencies as comments [integration/config] - https://gerrit.wikimedia.org/r/532398 (https://phabricator.wikimedia.org/T231198) (owner: Lucas Werkmeister (WMDE))
[16:51:34] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (201908), Release, Train Deployments, User-zeljkofilipin: 1.34.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T220745 (Jdforrester-WMF)
[17:04:14] Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201908), User-zeljkofilipin: Jenkins jobs not running after pushing to gerrit for Jpita user - https://phabricator.wikimedia.org/T231003 (zeljkofilipin) a:zeljkofilipin
[17:07:32] Continuous-Integration-Config, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201908), User-zeljkofilipin: Jenkins jobs not running after pushing to gerrit for Jpita user - https://phabricator.wikimedia.org/T231003 (hashar) When a new patchset is proposed in Gerri...
[17:13:48] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO: Create mirror of Gerrit repositories for consumption by various tools - https://phabricator.wikimedia.org/T226240 (hashar) (note to self: gotta verify whether those https hits are actually git requests, they might...
[17:23:24] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO: Create mirror of Gerrit repositories for consumption by various tools - https://phabricator.wikimedia.org/T226240 (thcipriani) >>! In T226240#5421656, @Paladox wrote: > @hashar phabricator has been migrated to use...
[17:30:52] Release-Engineering-Team, Wikidata, Patch-For-Review, ci-test-error: Wikibase and WikibaseLexeme tests now take more than 1 hour - https://phabricator.wikimedia.org/T231198 (Krinkle)
[17:31:29] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[17:31:52] Continuous-Integration-Infrastructure, Release-Engineering-Team, Zuul, Patch-For-Review: CI performance issues - https://phabricator.wikimedia.org/T231200 (Krinkle)
[17:49:40] Continuous-Integration-Config, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201908), User-zeljkofilipin: Jenkins jobs not running after pushing to gerrit for Jpita user - https://phabricator.wikimedia.org/T231003 (zeljkofilipin) Thanks, now I remember that I've...
[17:57:57] Continuous-Integration-Config, Release-Engineering-Team, MobileFrontend, Documentation, Readers-Web-Backlog (Tracking): Migrate documentation generation to Node 10.15.2 from node 6.11.0 - https://phabricator.wikimedia.org/T230841 (Jdforrester-WMF) a:Jdforrester-WMF The replacement for the...
[18:00:05] (PS1) Jforrester: layout: [MinervaNeue, MobileFrontend, Popups] Add extension-javascript-documentation template [integration/config] - https://gerrit.wikimedia.org/r/532416
[18:00:07] (PS1) Jforrester: layout: [MobileFrontend] Drop mwext-npm-doc-publish, using extension-javascript-documentation now [integration/config] - https://gerrit.wikimedia.org/r/532417 (https://phabricator.wikimedia.org/T230841)
[18:00:09] (PS1) Jforrester: layout: [MinervaNeue] Drop mwext-npm-doc-publish, using extension-javascript-documentation now [integration/config] - https://gerrit.wikimedia.org/r/532418 (https://phabricator.wikimedia.org/T230841)
[18:00:11] (PS1) Jforrester: layout: [Popups] Drop mwext-npm-doc-publish, using extension-javascript-documentation now [integration/config] - https://gerrit.wikimedia.org/r/532419
[18:00:13] (PS1) Jforrester: jjb: Drop mwext-npm-doc-publish [integration/config] - https://gerrit.wikimedia.org/r/532420
[18:01:43] Release-Engineering-Team, Wikidata, Patch-For-Review, ci-test-error: Wikibase and WikibaseLexeme tests now take more than 1 hour - https://phabricator.wikimedia.org/T231198 (thcipriani) >>! In T231198#5437902, @Ladsgroup wrote: > The issue actually started in Friday: > - https://integration.wikim...
[18:11:03] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[18:12:25] PROBLEM - Puppet errors on deployment-mediawiki-jhuneidi is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [3.0]
[18:15:45] PROBLEM - Host deployment-mediawiki-jhuneidi is DOWN: CRITICAL - Host Unreachable (172.16.6.25)
[18:16:13] longma: ^^ :-)
[18:16:41] that might be the one I deleted
[18:17:51] (CR) Jforrester: [C: +2] layout: [MinervaNeue, MobileFrontend, Popups] Add extension-javascript-documentation template [integration/config] - https://gerrit.wikimedia.org/r/532416 (owner: Jforrester)
[18:20:31] RECOVERY - Host deployment-mediawiki-jhuneidi is UP: PING OK - Packet loss = 0%, RTA = 0.47 ms
[18:21:34] (Merged) jenkins-bot: layout: [MinervaNeue, MobileFrontend, Popups] Add extension-javascript-documentation template [integration/config] - https://gerrit.wikimedia.org/r/532416 (owner: Jforrester)
[18:23:15] !log Zuul: [MinervaNeue, MobileFrontend, Popups] Add extension-javascript-documentation template
[18:23:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:31:44] Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201908), JavaScript: Upgrade all CI jobs from node6/npm3 to node10/npm6 across all projects - https://phabricator.wikimedia.org/T211784 (Jdforrester-WMF)
[18:31:47] Continuous-Integration-Config, Release-Engineering-Team, MobileFrontend, Documentation, and 2 others: Migrate documentation generation to Node 10.15.2 from node 6.11.0 - https://phabricator.wikimedia.org/T230841 (Jdforrester-WMF)
[18:31:56] Continuous-Integration-Config, Release-Engineering-Team, MobileFrontend, Documentation, and 2 others: Migrate documentation generation to Node 10.15.2 from node 6.11.0 - https://phabricator.wikimedia.org/T230841 (Jdforrester-WMF)
[18:35:20] (CR) Jforrester: "It's harmless on its own, but it might be confusing for whoever comes along to do the rest of the set-up later." [integration/config] - https://gerrit.wikimedia.org/r/530686 (https://phabricator.wikimedia.org/T230646) (owner: MarcoAurelio)
[18:42:36] (PS1) Jforrester: layout: Make those experimental for now [integration/config] - https://gerrit.wikimedia.org/r/532428
[18:42:46] (CR) Jforrester: [C: +2] layout: Make those experimental for now [integration/config] - https://gerrit.wikimedia.org/r/532428 (owner: Jforrester)
[18:45:08] (Merged) jenkins-bot: layout: Make those experimental for now [integration/config] - https://gerrit.wikimedia.org/r/532428 (owner: Jforrester)
[19:32:24] (PS1) Thcipriani: Move puppet jobs to dedicated small node [integration/config] - https://gerrit.wikimedia.org/r/532437 (https://phabricator.wikimedia.org/T231200)
[19:36:29] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 64.29% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[19:53:07] Continuous-Integration-Config, Release-Engineering-Team (Unit & Int & System Tooling), Release-Engineering-Team-TODO, MediaWiki-Core-Testing, and 5 others: Reduce runtime of MW shared gate Jenkins jobs to 5 min - https://phabricator.wikimedia.org/T225730 (Krinkle)
[19:53:27] Continuous-Integration-Config, Release-Engineering-Team (Unit & Int & System Tooling), Release-Engineering-Team-TODO, MediaWiki-Core-Testing, and 5 others: Reduce runtime of MW shared gate Jenkins jobs to 5 min - https://phabricator.wikimedia.org/T225730 (Krinkle)
[20:03:02] (CR) Jforrester: "Looks right. Is the size of the node going to be big enough for the puppet runs? I know they get relatively large?" [integration/config] - https://gerrit.wikimedia.org/r/532437 (https://phabricator.wikimedia.org/T231200) (owner: Thcipriani)
[20:09:21] (CR) Thcipriani: "> Looks right. Is the size of the node going to be big enough for the" [integration/config] - https://gerrit.wikimedia.org/r/532437 (https://phabricator.wikimedia.org/T231200) (owner: Thcipriani)
[20:11:00] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[20:26:06] Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201908): Document how to deploy a new Quibble version to CI - https://phabricator.wikimedia.org/T231251 (hashar)
[20:26:27] Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201908): Document how to deploy a new Quibble version to CI - https://phabricator.wikimedia.org/T231251 (hashar) So where should I write the doc for it ? In ./dockerfiles/README.md...
[20:27:14] Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201908), Documentation: Document how to deploy a new Quibble version to CI - https://phabricator.wikimedia.org/T231251 (hashar)
[20:36:32] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[20:41:41] (CR) 20after4: "New patch set coming right up..."
(033 comments) [releng/local-charts] - 10https://gerrit.wikimedia.org/r/525563 (https://phabricator.wikimedia.org/T224939) (owner: 1020after4) [20:41:58] (03PS16) 1020after4: local-charts: CLI for managing minikube, helm, etc [releng/local-charts] - 10https://gerrit.wikimedia.org/r/525563 (https://phabricator.wikimedia.org/T224939) [21:13:01] 10Continuous-Integration-Config, 10Release-Engineering-Team, 10MobileFrontend, 10Documentation, and 2 others: Migrate documentation generation to Node 10.15.2 from node 6.11.0 - https://phabricator.wikimedia.org/T230841 (10Jdforrester-WMF) @Jdlrobson: Unfortunately it's not passing in CI: > 00:02:07.087... [21:16:17] (03CR) 10Jeena Huneidi: [V: 03+1 C: 03+1] local-charts: CLI for managing minikube, helm, etc (031 comment) [releng/local-charts] - 10https://gerrit.wikimedia.org/r/525563 (https://phabricator.wikimedia.org/T224939) (owner: 1020after4) [21:31:17] 10Beta-Cluster-Infrastructure, 10Machine vision, 10Product-Infrastructure-Team-Backlog, 10Structured-Data-Backlog, 10SDC-Statements (Machine-vision-depicts): Message strings not resolving correctly on the Beta Cluster - https://phabricator.wikimedia.org/T231093 (10Jdforrester-WMF) Not sure what changed b... [21:51:15] looks like jenkins is busy [21:51:19] some stuff running for 4-5 hours? [21:51:24] er, 5-6 [21:52:45] but progressing, slowly, it seems [21:55:48] indeed, in this instance gate-and-submit is limited by the number of patches that run per queue, it seems, rather than executors [21:55:59] https://gerrit.wikimedia.org/r/plugins/gitiles/integration/config/+/master/zuul/layout.yaml#511 [22:01:21] Krenair: Yes. 
[22:04:27] (03PS2) 10Jforrester: layout: [MinervaNeue] Drop mwext-npm-doc-publish, using extension-javascript-documentation now [integration/config] - 10https://gerrit.wikimedia.org/r/532418 (https://phabricator.wikimedia.org/T230841) [22:04:37] (03CR) 10Jforrester: [C: 03+2] layout: [MinervaNeue] Drop mwext-npm-doc-publish, using extension-javascript-documentation now [integration/config] - 10https://gerrit.wikimedia.org/r/532418 (https://phabricator.wikimedia.org/T230841) (owner: 10Jforrester) [22:05:45] (03PS2) 10Jforrester: layout: [Popups] Drop mwext-npm-doc-publish, using extension-javascript-documentation now [integration/config] - 10https://gerrit.wikimedia.org/r/532419 [22:11:04] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0] [22:12:07] (03Merged) 10jenkins-bot: layout: [MinervaNeue] Drop mwext-npm-doc-publish, using extension-javascript-documentation now [integration/config] - 10https://gerrit.wikimedia.org/r/532418 (https://phabricator.wikimedia.org/T230841) (owner: 10Jforrester) [22:12:34] !log Zuul: Move MinervaNeue to extension-javascript-documentation [22:12:36] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [22:24:30] (03CR) 10Jforrester: [C: 03+2] layout: [Popups] Drop mwext-npm-doc-publish, using extension-javascript-documentation now [integration/config] - 10https://gerrit.wikimedia.org/r/532419 (owner: 10Jforrester) [22:25:37] (03PS2) 10Jforrester: layout: [MobileFrontend] Drop mwext-npm-doc-publish, using extension-javascript-documentation now [integration/config] - 10https://gerrit.wikimedia.org/r/532417 (https://phabricator.wikimedia.org/T230841) [22:25:58] (03Merged) 10jenkins-bot: layout: [Popups] Drop mwext-npm-doc-publish, using extension-javascript-documentation now [integration/config] - 10https://gerrit.wikimedia.org/r/532419 (owner: 10Jforrester) [22:30:25] (03PS2) 10Jforrester: jjb: Drop mwext-npm-doc-publish 
[integration/config] - 10https://gerrit.wikimedia.org/r/532420 [22:31:03] !log Zuul: Move Popups to extension-javascript-documentation [22:31:04] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [22:32:33] (03CR) 10jerkins-bot: [V: 04-1] jjb: Drop mwext-npm-doc-publish [integration/config] - 10https://gerrit.wikimedia.org/r/532420 (owner: 10Jforrester) [22:37:45] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<30.00%) [22:51:05] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0] [22:51:44] Why do I not see a submit button at https://gerrit.wikimedia.org/r/#/c/translatewiki/+/532233/ ? [22:51:53] it has CR+2 and V+2 [22:53:26] you're not in the Translatewiki.net group [22:55:42] oh right, I can't even leave CR+2 there myself [23:01:01] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0] [23:38:58] (03CR) 10Brennen Bearnes: [C: 03+1] local-charts: CLI for managing minikube, helm, etc [releng/local-charts] - 10https://gerrit.wikimedia.org/r/525563 (https://phabricator.wikimedia.org/T224939) (owner: 1020after4)