[00:07:25] PROBLEM - Host deployment-cumin is DOWN: CRITICAL - Host Unreachable (172.16.5.1)
[00:10:38] PROBLEM - Host deployment-cache-upload04 is DOWN: CRITICAL - Host Unreachable (172.16.5.14)
[00:23:49] Gerrit, Release-Engineering-Team, Patch-For-Review: Gerrit thread use GC thrashing - https://phabricator.wikimedia.org/T221026 (mmodell) From looking at http requests per minute in javamelody, over 1 year, I see that traffic has increased a lot: https://gerrit.wikimedia.org/r/monitoring?part=graph...
[01:12:29] Phabricator, Project-Admins, Release-Engineering-Team (Next), Documentation: Document how to convert projects into subprojects/milestones etc - https://phabricator.wikimedia.org/T221112 (MGChecker)
[01:13:09] Phabricator, Project-Admins, Release-Engineering-Team (Next), Documentation: Document how to convert projects into subprojects/milestones etc - https://phabricator.wikimedia.org/T221112 (MGChecker) T123078 contains some information about how this works.
[01:13:38] Gerrit, Release-Engineering-Team, Patch-For-Review: Gerrit thread use GC thrashing - https://phabricator.wikimedia.org/T221026 (thcipriani) >>! In T221026#5117178, @mmodell wrote: > From looking at http requests per minute in javamelody, over 1 year, I see that traffic has increased a lot: > > htt...
[01:19:05] Gerrit, Release-Engineering-Team, Patch-For-Review: Gerrit thread use GC thrashing - https://phabricator.wikimedia.org/T221026 (thcipriani) >>! In T221026#5116925, @thcipriani wrote: > Here's the GCEasy report from around the time gerrit started thrashing today: > https://gceasy.io/my-gc-report.jsp?p...
[01:25:14] Phabricator, Operations, serviceops: Puppet using the phabricator class fails with: "secret(): invalid secret waf/modsecurity_admin.conf" - https://phabricator.wikimedia.org/T221182 (Paladox)
[01:32:35] Phabricator, Operations, serviceops: Puppet using the phabricator class fails with: "secret(): invalid secret waf/modsecurity_admin.conf" - https://phabricator.wikimedia.org/T221182 (Paladox) Open→Declined Oh, nvm, labs/private was out of date too.
[05:23:16] (CR) Kosta Harlan: "> Define a new codehealth pipeline, and add it to extensions which output" [integration/config] - https://gerrit.wikimedia.org/r/502606 (https://phabricator.wikimedia.org/T218598) (owner: Kosta Harlan)
[09:58:45] Continuous-Integration-Config, Jade, Patch-For-Review, Scoring-platform-team (Current): Rename JADE->Jade in continuous integration - https://phabricator.wikimedia.org/T212181 (Ladsgroup) >>! In T212181#5113562, @Harej wrote: >>>! In T212181#4998821, @Ladsgroup wrote: >> There's nothing to review...
[11:04:40] (PS2) Hashar: Diff declared deps in CI and in extension.json [integration/config] - https://gerrit.wikimedia.org/r/504437
[11:06:08] (CR) jerkins-bot: [V: -1] Diff declared deps in CI and in extension.json [integration/config] - https://gerrit.wikimedia.org/r/504437 (owner: Hashar)
[11:33:37] Gerrit, Release-Engineering-Team (Kanban), LDAP-Access-Requests, Patch-For-Review: Bless Brennen with Gerrit administrator rights - https://phabricator.wikimedia.org/T218858 (hashar) Yes the account has to be added to the LDAP group `gerritadmin`.
[11:41:08] Continuous-Integration-Config, Release-Engineering-Team (Kanban), MediaWiki-extensions-WebDAV, Patch-For-Review: WebDAV master is broken on PHP7.* CI: Use of undefined constant WEBDAV_AUTH_TOKEN - https://phabricator.wikimedia.org/T220888 (hashar) a:hashar→None
[11:44:17] Release-Engineering-Team (Kanban), Quibble, Patch-For-Review: Quibble space separated options shallow arguments - https://phabricator.wikimedia.org/T218357 (hashar) Open→Resolved
[11:45:30] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Patch-For-Review, WorkType-Maintenance: Re-evaluate use of "Dependent Pipeline" in Zuul for gate-and-submit - https://phabricator.wikimedia.org/T94322 (hashar) a:hashar→None
[11:48:10] (Abandoned) Hashar: Added VEForAll dependency on VisualEditor [integration/zuul] - https://gerrit.wikimedia.org/r/433431 (owner: Yaron Koren)
[11:48:58] (Abandoned) Hashar: Update paramiko to 2.2 [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/383914 (https://phabricator.wikimedia.org/T171165) (owner: Paladox)
[11:51:01] (PS1) Hashar: 2.5.1-wmf6: fix flaw in wmf-ignore-submit-error-onmerged-change [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/504534
[11:51:46] (PS2) Hashar: 2.5.1-wmf7: fix flaw in wmf-ignore-submit-error-onmerged-change [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/504534
[11:52:02] (PS3) Hashar: 2.5.1-wmf7: fix flaw in wmf-ignore-submit-error-onmerged-change [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/504534
[11:52:10] (CR) Hashar: [C: +2] 2.5.1-wmf7: fix flaw in wmf-ignore-submit-error-onmerged-change [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/504534 (owner: Hashar)
[11:53:10] (CR) Hashar: [C: -2] 2.5.1-wmf7: fix flaw in wmf-ignore-submit-error-onmerged-change [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/504534 (owner: Hashar)
[11:57:34] (CR) Hashar: [C: +2] "I thought I forgot something :)" [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/504534 (owner: Hashar)
[12:00:28] (Merged) jenkins-bot: 2.5.1-wmf7: fix flaw in wmf-ignore-submit-error-onmerged-change [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/504534 (owner: Hashar)
[12:17:40] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Zuul: Deploy Zuul 2.5.1-wmf7 - https://phabricator.wikimedia.org/T221227 (hashar)
[12:21:18] Continuous-Integration-Infrastructure, Operations: Upload Zuul 2.5.1-wmf7 package to apt.wikimedia.org - https://phabricator.wikimedia.org/T220380 (hashar)
[12:51:12] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Zuul: Deploy Zuul 2.5.1-wmf7 - https://phabricator.wikimedia.org/T221227 (hashar) Open→Resolved a:hashar
[12:58:46] Continuous-Integration-Infrastructure, Gerrit, Release-Engineering-Team (Kanban), Zuul, Patch-For-Review: Zuul cancels all changes when a change is manually merged - https://phabricator.wikimedia.org/T203846 (hashar) Open→Resolved Should be good now!
[13:01:51] Deployments, HHVM: mw conf cache is not properly invalidated - https://phabricator.wikimedia.org/T134448 (hashar) Open→Resolved a:hashar That should have been kept resolved...
[13:14:33] Continuous-Integration-Config, Technical-Debt: Write unit tests for set_parameters() function in zuul config - https://phabricator.wikimedia.org/T126182 (hashar) Open→Resolved a:hashar
[13:15:58] Beta-Cluster-Infrastructure, Release-Engineering-Team, Epic: Use Beta cluster as a true canary for code deployments (epic) - https://phabricator.wikimedia.org/T53494 (hashar)
[13:16:00] Beta-Cluster-Infrastructure, Continuous-Integration-Config, Browser-Tests, WorkType-NewFunctionality: integrate browsertests with beta deployment - https://phabricator.wikimedia.org/T130047 (hashar) Open→Declined We don't have much bandwidth for that. Eventually a similar feature will be a...
[13:16:00] hashar: do you know if there are plans to upgrade gerrit to 2.16? The main problem right now is that coordinating code review is a major problem for teams as I understand it, and the only solution we have (dashboards) doesn't work in the new UI in 2.15.
[13:16:47] I'm a hold out on the old UI for that reason, but I'm probably in the minority.
[13:19:21] Krinkle: yeah there is a task filed about dashboards not working in 2.15
[13:19:24] really it is a mess
[13:19:40] the new UI in 2.15 is missing a wide range of features
[13:19:50] is there an ETA for 2.16? Given we have regular upgrades, I guess there's something blocking going to 2.16.
[13:19:55] we had a few blockers for the upgrade to 2.16, among them Zuul requiring an update
[13:20:14] I see. That sucks. I figured it'd be something like that, but didn't know it was Zuul.
[13:20:15] and we had/still have operational issues with Gerrit :/
[13:20:23] Yeah
[13:20:36] luckily Paladox has been quite great at identifying a wide range of blockers
[13:20:49] which should be fixed now :) (i think)
[13:20:55] so I guess best hope is to watch https://phabricator.wikimedia.org/T200739 ;D
[13:21:32] gggg
[13:21:37] anyway
[13:21:54] can poke thcipriani about when to upgrade to 2.16 but most probably we will speak about it next week
[13:22:03] (we meet face to face yeah next week yeah!)
[13:23:39] Gerrit 3.0 will be branched by the end of this week
[13:23:44] and released next month :)
[13:24:38] Continuous-Integration-Config, MediaWiki-extensions-DonationInterface: jjb: run composer install in DonationInterface - https://phabricator.wikimedia.org/T131264 (hashar) Open→Resolved a:hashar DonationInterface has once been hacked to run composer install and eventually got migrated to use t...
[13:25:53] Beta-Cluster-Infrastructure, Continuous-Integration-Config, Scap, Scoring-platform-team, User-mobrovac: Deploy beta cluster services automatically via scap3 - https://phabricator.wikimedia.org/T131857 (hashar) Open→Declined The intent was to automatize running scap for the services on...
[13:28:26] Continuous-Integration-Config: jenkins debian-glue job should use Wikimedia debian mirror - https://phabricator.wikimedia.org/T145508 (hashar) Open→Declined We are no more running piuparts.
[13:30:14] Release-Engineering-Team (Backlog): Composer can't regenerate class map at operations/mediawiki-config - https://phabricator.wikimedia.org/T155151 (hashar)
[13:32:12] Continuous-Integration-Config, Release-Engineering-Team (Backlog), Composer: Composer failed in Selenium job but job didn't stop - https://phabricator.wikimedia.org/T177047 (hashar) Open→Resolved a:hashar That is from two years ago. We have upgraded composer meanwhile and refactored the j...
[13:33:25] Continuous-Integration-Config: Jenkins shouldn't run jobs for commits pushed with submit - https://phabricator.wikimedia.org/T180544 (hashar) Open→Declined
[13:38:14] Continuous-Integration-Config, ORES, Scoring-platform-team: Daily build integration test to prove that ORES makefiles are sane - https://phabricator.wikimedia.org/T192606 (hashar) Open→Declined I don't think this task has a champion anymore.
[14:26:14] Continuous-Integration-Infrastructure: runtime/cgo: pthread_create failed: Resource temporarily unavailable - https://phabricator.wikimedia.org/T206215 (hashar) Open→Resolved
[14:27:42] Continuous-Integration-Infrastructure: common gating job for mediawiki core and extensions - https://phabricator.wikimedia.org/T60772 (hashar) Open→Resolved a:hashar I don't think we need this task anymore. That is nowadays the wmf-quibble jobs.
[14:28:31] Continuous-Integration-Infrastructure, Zuul: Zuul: Highlight relevant change on Zuul status page when following submit pipeline url - https://phabricator.wikimedia.org/T65399 (hashar)
[14:29:20] Continuous-Integration-Config, Phabricator, Release-Engineering-Team (Backlog), Patch-For-Review: Create a continuous integration plan for Wikimedia Phabricator patches - https://phabricator.wikimedia.org/T85123 (hashar)
[14:40:24] MediaWiki-Codesniffer: RfC: self vs for class self-references - https://phabricator.wikimedia.org/T221236 (D3r1ck01)
[14:41:18] does anyone know if adding a new extension to https://www.mediawiki.org/wiki/Git/Reviewers#Listen_to_specific_repositories 'magically works'? because I need to add ActiveAbstracts
[14:41:28] finding out about a breaking change by luck is not so good....
[14:42:18] MediaWiki-Codesniffer: RfC: self vs for class self-references - https://phabricator.wikimedia.org/T221236 (D3r1ck01)
[14:45:22] hm it looks like people are just adding the repo section header and their stuff so I'm gonna try it, bot code looks like it too
[14:51:42] apergos: It work
[14:51:50] So you add the repo and then add yourself as reviewer (default)
[14:52:01] Reviewer bot will add you on all changes by default
[14:52:05] *works
[14:53:00] yes that's what it looks like
[14:53:09] https://www.mediawiki.org/wiki/Git/Reviewers#mediawiki/extensions/AbuseFilter
[14:53:19] apergos: Yeah, I've used it and it has not failed me so far :)
[15:08:11] (PS1) Thcipriani: Edit Project Config [blubber] (refs/meta/config) - https://gerrit.wikimedia.org/r/504579
[15:08:46] (Abandoned) Thcipriani: Edit Project Config [blubber] (refs/meta/config) - https://gerrit.wikimedia.org/r/504579 (owner: Thcipriani)
[15:11:17] thcipriani: I'm in another meeting atm but wanted to ask if you would mind syncing up with me on https://phabricator.wikimedia.org/T221087 later today for a few minutes, I'll hit you up when I have like 10m. I'm not sure where to start
[15:11:53] chasemp: definitely, thanks for looking at that I appreciate it :)
[15:20:08] (PS1) Lucas Werkmeister (WMDE): Add Pintoch to CI whitelist [integration/config] - https://gerrit.wikimedia.org/r/504581
[15:24:17] (PS1) Lucas Werkmeister (WMDE): Remove HPI students from CI whitelist [integration/config] - https://gerrit.wikimedia.org/r/504583
[16:02:17] Continuous-Integration-Infrastructure, Jenkins: Jenkins plugins security update 2019-04-17 - https://phabricator.wikimedia.org/T221256 (MoritzMuehlenhoff)
[16:05:19] is it possible to run the gate-and-submit jobs on a change without +2ing it?
[16:05:33] I’m investigating a CI failure that looks like it might only happen on gate-and-submit, not regular test
[16:05:49] (otherwise I guess I can +2 the change to start the builds and then remove the +2 again…)
[16:06:20] Continuous-Integration-Infrastructure, Jenkins: Jenkins plugins security update 2019-04-17 - https://phabricator.wikimedia.org/T221256 (hashar) Open→Resolved Refreshed and checked both: https://releases-jenkins.wikimedia.org/pluginManager/ https://integration.wikimedia.org/ci/pluginManager/ We...
[16:09:17] Release-Engineering-Team (Kanban), Patch-For-Review, Release, Train Deployments: 1.34.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T220726 (Krenair)
[16:10:27] twentyafterfour, hey, sorry about the timing, just added a deployment blocker ^
[16:11:34] I can't log in on testwiki or mediawikiwiki, I can log in on enwiki
[16:11:54] assumptions... that this is indeed a problem introduced in 1.34.0-wmf.1
[16:11:58] and that this affects all 2FA users
[16:15:41] hi! anyone else getting this error with vagrant up on a new box? https://tools.wmflabs.org/paste/view/c95a8f2d
[16:33:15] Is there a Gerrit specialist here? https://imgur.com/BdFe9TH
[16:33:23] Please help!
[16:34:59] Tulsi: wild guess – perhaps related to https://lists.wikimedia.org/pipermail/cloud-announce/2019-April/000161.html ?
[16:35:22] oh nevermind I didn’t see the reply in -tech
[16:35:29] Gerrit, LDAP: Cannot log into Gerrit: "Cannot assign user name to account; name already in use." - https://phabricator.wikimedia.org/T220867 (Krenair) @Tulsi_Bhagat also has this: https://imgur.com/BdFe9TH
[16:37:05] Gerrit, LDAP: Cannot log into Gerrit: "Cannot assign user name to account; name already in use." - https://phabricator.wikimedia.org/T220867 (Aldnonymous) Same problem to me Aldnonymous / Aldnonymous4@gmail.com can't log in with return msg "name already in use."
[16:38:07] No problem, Lucas_WMDE! Thank you for looking into it. :)
[16:43:11] Gerrit, Release-Engineering-Team, Patch-For-Review: Gerrit thread use GC thrashing - https://phabricator.wikimedia.org/T221026 (hashar) Some past behavior we have observed is a reentrant lock being held on the account cache. Potentially exacerbated when a lot of concurrent HTTP requests are made. @mm...
[16:52:37] Krenair: thanks for the heads up
[17:06:36] Gerrit, Release-Engineering-Team, Patch-For-Review: Gerrit thread use GC thrashing - https://phabricator.wikimedia.org/T221026 (hashar) Out of 623k https requests in today access logs: | Requests | IP | DNS PTR |--|--|-- | 84110 | 172.16.1.221 | codesearch4.codesearch.eqiad.wmflabs. | 69921 | 2620:0...
[17:09:27] twentyafterfour, by the looks of #mediawiki-core they're deploying a revert of the patch that broke stuff so it should be fine to proceed
[17:09:54] cool thanks Krenair
[17:19:03] Gerrit, Release-Engineering-Team (Kanban), LDAP-Access-Requests, Patch-For-Review: Bless Brennen with Gerrit administrator rights - https://phabricator.wikimedia.org/T218858 (Dzahn) Open→Resolved a:Dzahn >>! In T218858#5118314, @hashar wrote: > Yes the account has to be added to the L...
[17:21:05] Gerrit, Security: Gerrit DoS - https://phabricator.wikimedia.org/T182756 (hashar) Resolved→Open + @thcipriani + @brennen Chad patch to raise the thread pool does not address the DoS vector :/ We currently have: ` [sshd] idleTimeout = 43200 s # 12 hours maxConnectionsPerUser = 32 thread...
[17:24:07] hashar, you may wish to reassign the task too?
[17:25:44] oh man
[17:25:46] it's public :/
[17:26:10] Gerrit, Security: Gerrit DoS - https://phabricator.wikimedia.org/T182756 (hashar) p:Unbreak!→Normal
[17:27:25] Krenair: thank you :)
[17:27:31] and it's dinner time!
[17:28:55] fwiw I can still see that bug even though I don’t think I should be able to…
[17:29:12] even after greg-g’s comment
[17:29:37] Lucas_WMDE, you're supposed to be able to
[17:29:43] you're on the CC list
[17:29:52] what he said ^
[17:29:55] huh
[17:29:58] no idea why
[17:30:00] but okay then ^^
[17:30:08] you added yourself in January 2018
[17:30:17] Lucas_Werkmeister_WMDE added a subscriber: Lucas_Werkmeister_WMDE.Jan 24 2018, 10:47
[17:30:19] looks like it, yeah
[17:30:23] could remove yourself or have someone else remove you if you really wanted
[17:30:37] nah, if the visibility works as expected that’s fine
[17:30:52] I just had no memory of the task so I thought something was wrong
[17:37:14] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [140.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[17:47:34] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[18:02:17] Project beta-code-update-eqiad build #243112: FAILURE in 1 min 19 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/243112/
[18:04:22] Project beta-code-update-eqiad build #243113: STILL FAILING in 1 min 21 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/243113/
[18:05:10] (PS5) Dduvall: pipeline: Directed graph execution model [integration/pipelinelib] - https://gerrit.wikimedia.org/r/502917 (https://phabricator.wikimedia.org/T210267)
[18:05:12] (PS6) Dduvall: pipeline: Builder and stage implementation [integration/pipelinelib] - https://gerrit.wikimedia.org/r/502918 (https://phabricator.wikimedia.org/T210267)
[18:05:14] (PS6) Dduvall: pipeline: Provide a rickety but useful system test [integration/pipelinelib] - https://gerrit.wikimedia.org/r/502919
[18:05:45] (CR) Dduvall: pipeline: Directed graph execution model (1 comment) [integration/pipelinelib] - https://gerrit.wikimedia.org/r/502917 (https://phabricator.wikimedia.org/T210267) (owner: Dduvall)
[18:05:52] (CR) PipelineBot: "pipeline-dashboard: service-pipeline-test" [integration/pipelinelib] - https://gerrit.wikimedia.org/r/502917 (https://phabricator.wikimedia.org/T210267) (owner: Dduvall)
[18:06:08] PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_jenkins CI slave scripts]
[18:06:28] (CR) PipelineBot: "pipeline-dashboard: service-pipeline-test" [integration/pipelinelib] - https://gerrit.wikimedia.org/r/502919 (owner: Dduvall)
[18:06:50] (CR) PipelineBot: "pipeline-dashboard: service-pipeline-test" [integration/pipelinelib] - https://gerrit.wikimedia.org/r/502918 (https://phabricator.wikimedia.org/T210267) (owner: Dduvall)
[18:14:19] Yippee, build fixed!
[18:14:19] Project beta-code-update-eqiad build #243114: FIXED in 1 min 18 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/243114/
[18:16:26] if anyone sees strange DNS recursor behaviour in deployment-prep please poke me or Andrew
[18:16:26] we're trying out the new cloud vps DNS recursor hosts
[18:24:09] Project-Admins: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706 (Aklapper) @EvanProdromou : Done
[18:37:54] RECOVERY - puppet last run on contint1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[18:46:12] Release-Engineering-Team (Kanban), Patch-For-Review, Release, Train Deployments: 1.34.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T220726 (mmodell)
[18:46:53] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.34.0-wmf.3 deployment blockers - https://phabricator.wikimedia.org/T220728 (mmodell)
[18:53:54] Gerrit, Release-Engineering-Team, Patch-For-Review: Gerrit thread use GC thrashing - https://phabricator.wikimedia.org/T221026 (mmodell) @hashar: yeah I think a read-only mirror might make sense, we could have code search and several other things reading from that instead of the master. I'd like to...
[18:54:12] twentyafterfour: Yay for kicking the 2FA can down the road for two weeks. ;-(
[18:55:42] Gerrit, Release-Engineering-Team, Patch-For-Review: Gerrit thread use GC thrashing - https://phabricator.wikimedia.org/T221026 (mmodell) according to https://www.mediawiki.org/wiki/Codesearch : >codesearch uses hound as the search implementation. It indexes the origin/master branch of all specified...
[18:56:03] James_F: yeah :-/
[18:56:09] I didn't want it to be forgotten though
[18:56:28] and since the revert wasn't merged in master then adding it as a future train blocker seemed the most appropriate thing to do
[18:56:37] No, indeed. I'll work with R.eedy to make sure it's fixed today.
[18:58:50] (PS1) Dduvall: experimental: Support LLB output format [blubber] - https://gerrit.wikimedia.org/r/504651
[19:00:20] (CR) PipelineBot: "pipeline-dashboard: service-pipeline-test" [blubber] - https://gerrit.wikimedia.org/r/504651 (owner: Dduvall)
[19:13:51] Release-Engineering-Team (Kanban), Patch-For-Review, Release, Train Deployments: 1.34.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T220726 (mmodell)
[19:24:59] Beta-Cluster-Infrastructure, Release Pipeline, serviceops, Core Platform Team Backlog (Later), and 2 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (Krenair) Is this related to T218609 and T200832?
[19:28:05] Beta-Cluster-Infrastructure: Puppet errors on deployment-maps04 due to node.js package problems - https://phabricator.wikimedia.org/T221277 (Krenair)
[19:38:50] Release-Engineering-Team (Kanban), Release Pipeline, Core Platform Team (Extension Management (TEC13)), Core Platform Team Kanban (Contractor - Doing), Patch-For-Review: Determine a standard way of installing MediaWiki lib/extension dependencies within ... - https://phabricator.wikimedia.org/T193824
[19:43:30] Gerrit, Release-Engineering-Team, Patch-For-Review: Gerrit thread use GC thrashing - https://phabricator.wikimedia.org/T221026 (thcipriani) >>! In T221026#5119712, @hashar wrote: > | Requests | IP | DNS PTR > |--|--|-- > | 69921 | 2620:0:861:102:10:64:16:8 | phab1001.eqiad.wmnet. I realized that pha...
[19:59:31] Beta-Cluster-Infrastructure: Figure out future for newly created deployment-prep jessie instances - https://phabricator.wikimedia.org/T218609 (Krenair) >>! In T218609#5034116, @Andrew wrote: > Jessie creation is now disabled in most projects (including deployment-prep). I'd prefer to leave it that way in or...
[20:14:06] Beta-Cluster-Infrastructure: deployment-snapshot01 puppet error due to nginx-apache2 conflict - https://phabricator.wikimedia.org/T221285 (Krenair)
[20:17:52] Beta-Cluster-Infrastructure, Discovery-Search, Beta-Cluster-reproducible, Patch-For-Review, Puppet: Elasticsearch puppet config changes broke puppet in various instances - https://phabricator.wikimedia.org/T205672 (Krenair) Open→Resolved I'm going to go ahead and assume tools-elastic*...
[20:21:13] (PS1) Hashar: Add WikibaseCirrusSearch to extension-gate [integration/config] - https://gerrit.wikimedia.org/r/504764 (https://phabricator.wikimedia.org/T204153)
[20:21:46] Beta-Cluster-Infrastructure, Patch-For-Review: Puppet failures on deployment-deploy01.deployment-prep.eqiad.wmflabs - https://phabricator.wikimedia.org/T216164 (Krenair) {T221285} is also about issues involving the services proxy's use of nginx
[20:24:22] Continuous-Integration-Config, Patch-For-Review: Wikibase CI: wmf-quibble-vendor-mysql-hhvm-docker job should include Scribunto - https://phabricator.wikimedia.org/T200976 (hashar)
[20:24:29] Continuous-Integration-Config, MediaWiki-extensions-Scribunto, Wikidata, Patch-For-Review, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞): [Task] Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050 (hashar)
[20:29:19] Beta-Cluster-Infrastructure: deployment-snapshot01 puppet error due to nginx-apache2 conflict - https://phabricator.wikimedia.org/T221285 (ArielGlenn) I thought the proxy services thing doesn't get applied in beta; there's likely a missing hiera setting someplace.
[20:41:43] Gerrit, Upstream: Allow searching for 'state:active', 'state:read_only', 'state:hidden' via web interface - https://phabricator.wikimedia.org/T180297 (Paladox)
[20:41:50] Gerrit, Release-Engineering-Team (Backlog), Patch-For-Review: Upgrade to Gerrit 2.16.7 - https://phabricator.wikimedia.org/T200739 (Paladox)
[20:42:58] Continuous-Integration-Config, MediaWiki-extensions-Scribunto, Wikidata, Patch-For-Review, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞): [Task] Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050 (hashar) Since this is causing outages, I guess it is really t...
[20:43:09] Gerrit, Upstream: Allow searching for 'state:active', 'state:read_only', 'state:hidden' via web interface - https://phabricator.wikimedia.org/T180297 (Paladox) PolyGerrit in 2.16 by default does 'state:active' and 'state:read_only'. But you can use the rest api to properly do this. (if you want to check...
[20:53:14] Beta-Cluster-Infrastructure: deployment-snapshot01 puppet error due to nginx-apache2 conflict - https://phabricator.wikimedia.org/T221285 (Krenair) I think T216164#4963388 explains why it still tries to install nginx despite the ensure: absent. It should be enough to just delete the default site.
[20:56:47] Beta-Cluster-Infrastructure: deployment-snapshot01 puppet error due to nginx-apache2 conflict - https://phabricator.wikimedia.org/T221285 (Krenair) Open→Resolved a:Krenair `krenair@deployment-snapshot01:~$ sudo rm /etc/nginx/sites-available/default krenair@deployment-snapshot01:~$ sudo puppet age...
[20:59:23] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.34.0-wmf.3 deployment blockers - https://phabricator.wikimedia.org/T220728 (Jdforrester-WMF)
[21:00:31] Beta-Cluster-Infrastructure: deployment-snapshot01 puppet error due to nginx-apache2 conflict - https://phabricator.wikimedia.org/T221285 (ArielGlenn) Ah wonderful, thanks a lot!
[21:16:15] bd808: "Hashar was the awesome human who came up with the initial implementation of mediawiki/vendor.git" thank you :]
[21:16:23] bd808: though vendor.git is really hmm ... cursing me to this day!
[21:16:28] (CR) Jforrester: [C: +1] Add WikibaseCirrusSearch to extension-gate [integration/config] - https://gerrit.wikimedia.org/r/504764 (https://phabricator.wikimedia.org/T204153) (owner: Hashar)
[21:30:41] Beta-Cluster-Infrastructure, Release Pipeline, serviceops, Core Platform Team Backlog (Later), and 2 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (Krenair) It's not just going to become a problem once T198901 is done, it's already a problem -...
[21:36:53] Beta-Cluster-Infrastructure, Mathoid, Operations, Core Platform Team Backlog (Watching / External), and 2 others: remove mathoid from scb - https://phabricator.wikimedia.org/T200832 (Krenair) >>! In T200832#5061718, @akosiaris wrote: >>>! In T200832#5051312, @Krenair wrote: >> deployment-mathoid...
[22:02:20] Phabricator, Performance: /maniphest/report/project/ : Maximum execution time of 10 seconds exceeded - https://phabricator.wikimedia.org/T125357 (Dzahn)
[22:02:23] Phabricator, Release-Engineering-Team (Kanban), Operations, serviceops, and 3 others: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832 (Dzahn)
[22:02:33] Phabricator, Release-Engineering-Team (Watching / External), Operations, serviceops, Patch-For-Review: Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568 (Dzahn) Open→Stalled blocked on T215335
[22:11:19] Continuous-Integration-Infrastructure, Release-Engineering-Team (Backlog), Operations: contint1001 store docker images on separate partition or disk - https://phabricator.wikimedia.org/T207707 (Dzahn) a:RobH just assigning for the question in the 2 comments above
[22:13:39] hashar: any tech decision made 3+ years ago is almost certain to be a pain today :)
[22:14:32] bd808: I do not disagree :] Though a lot of old decisions turned out to be correct/safe and are still sane today!
[22:15:18] I wish we could phase out vendor.git, but short of dramatically improving composer/packagist locking with signatures of some sort, there is no good hope :/
[22:15:27] anyway, thank you for the kind words!
[22:16:10] It's not like upstream are particularly great to deal with sometimes either
[22:18:46] One might think that after the great npm debacle the composer folks would pay more attention, but *shrug*
[22:19:13] package managers are hard, that's what I know
[22:22:03] But some are definitely better than others
[22:23:14] Gerrit, Release-Engineering-Team, Patch-For-Review: Gerrit thread use GC thrashing - https://phabricator.wikimedia.org/T221026 (thcipriani) I upgraded Gerrit to 2.15.12 in preparation for plugins still in development. I'd like to not change too many things at once, but I am a bit stuck Prior to the...
[22:37:33] (CR) Krinkle: "Can we measure how much time it adds to typical gate jobs? Over the past year or so, we've gone from 12 minutes max for the gate pipeline," [integration/config] - https://gerrit.wikimedia.org/r/504764 (https://phabricator.wikimedia.org/T204153) (owner: Hashar)
[22:39:05] Being able to merge no more than 2.5 patches an hour just isn't good enough.
[22:42:22] Krinkle: Random CI flakes take a lot of time with gate stack resets. Not sure if that takes more than the long gate time generally, but agreed. If MW was properly isolatable we could "just" run the integration tests in series, and each of the sets of unit tests in parallel, but it isn't and so we can't.
[22:42:40] The merge for core just now took 24 minutes without restarts or flakes
[22:42:49] of which one job took 22 minutes
[22:43:04] so it's not CI load being drained elsewhere either (where they don't start at once)
[22:43:16] Was that one job the HHVM quibble vendor one?
[22:43:24] Yeah, that's where most things happen.
[22:43:35] It's also the slowest because it's HHVM.
[22:43:36] The other flavours are already optimised by disabling things that don't need to be on all flavours.
[22:43:47] no, I thought so too, but it's not that big a difference.
[22:43:48] Dropping HHVM and php70 would speed things up a lot too.
[22:43:50] Oh, hmm.
[22:44:00] What gets disabled?
[22:44:24] there's three php70 jobs. two of which are not about php and can be swapped for php72 variants. there's 1 php70 job for mysql that indeed needs to be dropped.
[22:44:44] I've removed a few like that already, but haven't been able to remove this one yet with all the regexes and stuff being so hard to make it work.
[22:44:48] But we can't drop it until you in TC confirm that we can drop PHP70 and PHP71 support from MW. :-)
[22:45:07] for swat these can be dropped, no problem, we've done several already.
[22:45:15] just one left that doesn't need to be there. [22:45:56] I think wdio tests don't run in all flavours, and npm-test as well. [22:46:13] but might have changed again since. [22:46:26] composertest was moved to its own job instead. [22:46:51] phpunit tests for CirrusSearch take 2.5 minutes [22:47:01] low hanging fruit I hope [22:47:18] that's more than our parser tests take nowadays (thx to Tim and others) [22:48:17] Yeah, hashar suggested tagging tests as "integration" that are the only ones we run for gate, but I fear that a lot of extensions don't have the isolation to be able to say "these tests won't break unless we change something inside the repo". [22:51:35] You mean not running other tests at all on +2? [22:52:00] A good unit test is cheap and quick. the others we tag with "integration" - too rosy? [22:59:17] Krinkle: James_F: yeah the idea is that for some extensions (e.g. Scribunto) we know those tests are self contained [22:59:29] and in no way would be affected by changes to other extensions [22:59:55] if i send a patch to Wikibase, surely the Lua interpreter would still figure out that 2+2=4 [22:59:58] or something like that [23:00:31] I commented about it on the task to add Scribunto to extension-gate [23:02:14] I also thought about marking some extension tests that could run without any other extension dependencies [23:02:33] but yeah that is not magic :/ [23:02:51] I should brain dump those problems somewhere really [23:03:39] also a suggestion I received is to decouple the git clone / composer/npm install in a standalone job that would be run first [23:03:42] +1 [23:03:45] then reuse the result on sub jobs [23:04:21] Oh, you're not talking about skipping "unit tests" in favour of integration tests. But rather about not running tests in the shared extension gate for other extension repos for which the tests and code are isolated. [23:04:24] but our (jenkins, zuul v2.5) stack does not make that easy.
We would have to push those files to some shared storage of some sort [23:04:35] Krinkle: yes!! [23:04:51] (03CR) 10Jforrester: [C: 03+1] "In this particular case, these tests used to be in gate (inside the Wikibase extension), and dropped out of gate when the code was moved t" [integration/config] - 10https://gerrit.wikimedia.org/r/504764 (https://phabricator.wikimedia.org/T204153) (owner: 10Hashar) [23:04:55] Is that concept realistic in MediaWiki? [23:05:08] Wikibase hook can override LuaSandboxPlusOperator and make it do minus. [23:05:10] I don't know :/ [23:05:45] With proper dependency injection, we can probably make unit tests blind to that. [23:06:04] So if the unit test for LuaSandbox can be made standalone, that's useful and then we can skip that unit test in the gate. [23:06:16] Because it would not fail even if LuaSandboxPlusOperator hook breaks it. [23:06:21] So I'm on-board with that. [23:06:34] But we'd probably still have an integration test for all extensions in the gate that we don't skip. [23:07:00] then I don't know anything about the Scribunto extension tests :/ [23:07:49] hashar: Le toute, c'est merde.
[23:07:54] ahah [23:09:46] CirrusSearch is another example [23:10:46] it probably has a bunch of tests that we do not need to run when sending a patch to one of its reverse dependencies (such as GeoData or Wikibase) [23:11:15] and in extension-gate, most probably when sending a patch to AbuseFilter, we probably don't need to run most of CirrusSearch or Wikibase tests [23:11:19] but yeah hmm hard topic :/ [23:11:56] sometimes I have a feeling each test should explicitly express on which other extensions it depends [23:12:07] and have the test runner dynamically load/unload extensions before each testsuite [23:12:24] so that if you don't declare the dependency at the test level, it would fail [23:12:49] and by defining the requirements for each test or testsuite, we can then know which ones to run for a change to a given extension [23:13:10] that is where my brain eventually short circuits :-\\\ [23:14:10] anyway. Time to fall asleep
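The idea floated at the end of the discussion (each test suite declares which other extensions it depends on, and the runner uses those declarations to decide what to run for a given change) could be sketched roughly as below. This is only an illustration under stated assumptions: the `Suite` type, the `select_suites` helper, the suite names and the dependency declarations are all hypothetical and do not reflect how Quibble, Zuul or the MediaWiki test runner actually work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suite:
    """A test suite with explicitly declared extension dependencies (hypothetical model)."""
    name: str
    extension: str                      # extension that owns the suite
    requires: frozenset = frozenset()   # other extensions the suite says it needs


def select_suites(changed_extension, suites):
    """Return the suites worth running when `changed_extension` is patched.

    A suite is selected if it belongs to the changed extension, or if it
    declares a dependency on it. Self-contained suites of other extensions
    are skipped.
    """
    return [
        s for s in suites
        if s.extension == changed_extension or changed_extension in s.requires
    ]


# Made-up declarations, for illustration only.
SUITES = [
    Suite("ScribuntoLuaEngine", "Scribunto"),
    Suite("WikibaseLuaBindings", "Wikibase", frozenset({"Scribunto"})),
    Suite("CirrusSearchQueryBuilder", "CirrusSearch"),
    Suite("GeoDataSearchIntegration", "GeoData", frozenset({"CirrusSearch"})),
    Suite("AbuseFilterRules", "AbuseFilter"),
]

if __name__ == "__main__":
    for ext in ("Wikibase", "CirrusSearch", "AbuseFilter"):
        print(ext, "->", [s.name for s in select_suites(ext, SUITES)])
```

With these made-up declarations, a patch to Wikibase would skip the self-contained Scribunto suite, while a patch to CirrusSearch would still run the GeoData suite that declares a dependency on it. The sketch only trusts the declarations; enforcing them (for instance by loading only the declared extensions so an undeclared dependency fails) is the hard part discussed above and is not attempted here.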