[00:07:20] https://gerrit.wikimedia.org/r/#/c/421449/ - reviewers rolled back
[00:08:59] :)
[00:09:01] thanks
[00:26:43] chad@Chads-MBP ~/gerrit-workspace/gerrit (wmf/stable-2.14) $ git diff d84c5fd3d3a340ab456e337eb497818d6e312fb8..HEAD --name-only
[00:26:43] gerrit-server/src/main/java/com/google/gerrit/server/change/ChangeKindCacheImpl.java
[00:26:43] plugins/delete-project
[00:26:43] plugins/its-base
[00:26:43] plugins/lfs
[00:26:44] plugins/reviewers
[00:26:44] plugins/webhooks
[00:26:54] paladox: That'll be our diff ^ :)
[00:30:25] :)
[00:31:15] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[01:10:40] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<33.33%)
[05:59:02] no_justification: Hm.. not entirely clear to me what https://github.com/wikimedia/puppet/commit/bcd8678f9c41ed3b413dd2102e9643986f123fda was supposed to do.
[05:59:22] Does it relate to actual deletion, or does it relate to the read-only/archiving thing we do currently and making those hidden by default?
[05:59:39] "preserve" is not a term I've seen elsewhere.
[06:05:14] Driving right now, I'll respond shortly
[06:05:21] * Krinkle is worried
[06:31:35] Krinkle: OK I'm home now. Soooo, the idea is that when we do a project deletion, we can preserve the git repo while still cleaning up old changes and such.
[06:31:59] My motivation was to enable us to delete things more often
[06:32:10] Our repo lists are gigantic and ugly
[06:32:34] But I'm not 100% sold on my own idea
[06:32:40] I'd like to see it in practice
[06:32:45] no_justification: Ah so this isn't about read-only archiving, this is about actual deletion, which would... delete stuff from gerrit database like patch sets, and comments, and ACL, but preserve actual repo over gitiles/git protocol?
[06:33:11] Well, they wouldn't be visible in Gitiles or over git commands either
[06:33:25] So it'd be like an archive in case we wanted the code later
[06:33:41] Restoration of it would be manual, I believe
[06:33:45] Right
[06:33:54] So that's what the plugin provides
[06:33:56] now 'hideProjectOnPreserve'
[06:34:04] What is that.
[06:35:02] So there's a --preserve option to the delete command. Right now it just keeps you from deleting the actual git repo
[06:35:42] The config changes the behavior so in addition to preserving it, it reparents the repo to one with a restrictive ACL
[06:36:04] You could accomplish this manually as well, ofc
[06:36:40] But...the more I explain this the less convinced I am
[06:36:46] Hence, reverted for now
[06:36:58] We barely delete as it is.
[06:37:25] Idk. I just wanna come up with some way to better manage our pile of junk code nobody cares about
[06:38:49] Aye, I'd support a way to hide them from project lists. I'd also support stripping of ACL overhead and leftovers, and even automation of all this.
[06:39:24] I would however like if they were still visible on Gerrit as well as permalink etc.
[06:39:57] I don't think it would stand in the way for anything, and seems enabling/empowering not to require intervention to unhide or anything.
[06:40:26] As long as it's not in common "show everything" things and not in code search, I'm happy :)
[06:42:18] Granted, marking as hidden doesn't solve my OCD... Admins still see them! Haha
[06:43:18] I guess with reparenting we could set an exclusive Read bit that denies for admins too... But who knows what overrides for read there are in Gerrit itself....
[06:45:19] Part of our problem here is deletion isn't a first class citizen in Gerrit
[06:45:29] It's a plugin
[06:49:13] Anyway it's bedtime.
[07:00:40] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:36:26] PROBLEM - Free space - all mounts on integration-slave-jessie-1001 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1001.diskspace._mnt.byte_percentfree (No valid datapoints found)integration.integration-slave-jessie-1001.diskspace._srv.byte_percentfree (<10.00%)
[08:45:07] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:20:03] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[09:59:46] Beta-Cluster-Infrastructure, Patch-For-Review, Privacy, User-MarcoAurelio: Disable the collection of private information on abusefilter log for Beta Cluster wikis - https://phabricator.wikimedia.org/T188862#4022014 (MarcoAurelio) The patch that implemented this got reverted per my message above....
[10:35:52] Beta-Cluster-Infrastructure, Privacy, User-MarcoAurelio: Disable the collection of private information on abusefilter log for Beta Cluster wikis - https://phabricator.wikimedia.org/T188862#4075275 (MarcoAurelio)
[10:41:45] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q4, Discovery, Wikimedia-Portals: Migrate Jenkins job wikimedia-portals-build to Docker and to use an entry point (eg: npm builddeploy) - https://phabricator.wikimedia.org/T190073#4075291 (hashar...
[10:42:39] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q3: Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512#4075292 (hashar)
[10:42:53] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q3: Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512#3855599 (hashar)
[10:43:13] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q3: Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512#3855599 (hashar)
[10:43:15] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), MW-1.31-release-notes (WMF-deploy-2018-03-20 (1.31.0-wmf.26)), Patch-For-Review, Technical-Debt: Phaseout CI mediawiki config / extensions_load.txt to load exte... - https://phabricator.wikimedia.org/T189567#4075296
[10:43:26] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q3, Epic: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512#3855599 (hashar)
[10:44:10] Continuous-Integration-Infrastructure (shipyard), MediaWiki-Maintenance-scripts: MediaWiki PHP based built-in server does not output log requests for index.php queries - https://phabricator.wikimedia.org/T190503#4075279 (hashar) + shipyard, I would like to use the PHP built-in server for the CI Qunit job...
[10:50:10] Urbanecm: in case you did not see this, for future reference https://phabricator.wikimedia.org/T187761#4072672
[10:50:15] (Abandoned) Phedenskog: Run WebPageTest tests from Singapore to verify the new cache pop. [integration/config] - https://gerrit.wikimedia.org/r/421266 (https://phabricator.wikimedia.org/T168416) (owner: Phedenskog)
[10:50:31] (you are the one creating a lot of patches similar to that one)
[10:54:06] PROBLEM - Host deployment-videoscaler01 is DOWN: CRITICAL - Host Unreachable (10.68.19.130)
[10:54:53] PROBLEM - Host deployment-tmh01 is DOWN: CRITICAL - Host Unreachable (10.68.16.211)
[11:30:52] Release-Engineering-Team (Kanban), Advanced-Search, TCB-Team, Patch-For-Review, User-zeljkofilipin: Cannot find module nodemw - https://phabricator.wikimedia.org/T190307#4075404 (thiemowmde)
[11:54:37] Release-Engineering-Team (Kanban), Advanced-Search, TCB-Team, Patch-For-Review, User-zeljkofilipin: Cannot find module nodemw - https://phabricator.wikimedia.org/T190307#4075441 (thiemowmde) I tried a whole bunch of things in https://gerrit.wikimedia.org/r/421264 as well as https://gerrit.wik...
[13:42:34] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, Patch-For-Review, User-zeljkofilipin: Update page object pattern in Selenium tests - https://phabricator.wikimedia.org/T185094#4075709 (zeljkofilipin) From [[ https://gerrit.wikimedia.org/r/#/c/412956/ | 412956 ]]: > @Krinkle > Mar 22 5:07...
[13:46:25] RECOVERY - Free space - all mounts on integration-slave-jessie-1001 is OK: OK: integration.integration-slave-jessie-1001.diskspace._mnt.byte_percentfree (No valid datapoints found)
[13:52:22] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, Patch-For-Review, User-zeljkofilipin: Update page object pattern in Selenium tests - https://phabricator.wikimedia.org/T185094#4075761 (zeljkofilipin) >> @Krinkle >> I'm aware the Babel option is one of the upstream-documented ways to use W...
[13:53:03] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, Patch-For-Review, User-zeljkofilipin: Update page object pattern in Selenium tests - https://phabricator.wikimedia.org/T185094#4075762 (zeljkofilipin)
[13:54:52] Release-Engineering-Team (Next), Wikimedia-Logstash, Wikimedia-log-errors: Fatalmonitor on logstash still includes deprecated channel:wfLogDBError - https://phabricator.wikimedia.org/T165675#4075779 (jcrespo) p:Triage>High This potentially make sql injections being ignored.
[14:09:04] Release-Engineering-Team (Kanban), User-zeljkofilipin: Retrospective for T139740 Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T188740#4075825 (zeljkofilipin)
[14:14:48] Release-Engineering-Team (Kanban), User-zeljkofilipin: Retrospective for T139740 Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T188740#4075871 (zeljkofilipin) Wiki (needs formatting): https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/20180320_Selenium_Retros...
[14:15:23] Release-Engineering-Team (Kanban), User-zeljkofilipin: Retrospective for T139740 Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T188740#4075876 (zeljkofilipin)
[14:16:48] (PS2) Umherirrender: Use composer unit tests for BlueSpiceAbout [integration/config] - https://gerrit.wikimedia.org/r/417970
[14:53:16] PROBLEM - Host deployment-puppetdb01 is DOWN: CRITICAL - Host Unreachable (10.68.23.76)
[15:17:58] (PS1) Hashar: It is a mess but runs qunit locally [integration/quibble] - https://gerrit.wikimedia.org/r/421548
[15:18:19] thcipriani: qunit in quibble ^ though that is a messy change
[15:18:24] (CR) jerkins-bot: [V: -1] It is a mess but runs qunit locally [integration/quibble] - https://gerrit.wikimedia.org/r/421548 (owner: Hashar)
[15:18:44] and i found a nice way to prefix various outputs
[15:21:34] Release-Engineering-Team (Kanban), User-zeljkofilipin: Retrospective for T139740 Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T188740#4076030 (zeljkofilipin)
[15:21:54] stderr_relayer eh...
[15:23:39] Release-Engineering-Team (Kanban), Advanced-Search, TCB-Team, Patch-For-Review, User-zeljkofilipin: Cannot find module nodemw - https://phabricator.wikimedia.org/T190307#4076032 (zeljkofilipin) a:zeljkofilipin>None @thiemowmde and @Tonina_Zhelyazkova_WMDE are working on this.
[15:29:17] Release-Engineering-Team (Watching / External), Operations, Parsoid: Provide an archive endpoint for older Parsoid debs (on releases.wikimedia.org or elsewhere) - https://phabricator.wikimedia.org/T150672#4076058 (ssastry)
[15:30:37] Release-Engineering-Team (Watching / External), Operations, Parsoid: Provide an archive endpoint for older Parsoid debs (on releases.wikimedia.org or elsewhere) - https://phabricator.wikimedia.org/T150672#4076061 (Dzahn) a:Dzahn
[15:47:28] Release-Engineering-Team (Kanban), Advanced-Search, TCB-Team, Patch-For-Review, User-zeljkofilipin: Cannot find module nodemw - https://phabricator.wikimedia.org/T190307#4076130 (zeljkofilipin) >>! In T190307#4072249, @Tonina_Zhelyazkova_WMDE wrote: > I personally think there's nothing incorr...
[15:49:55] Release-Engineering-Team (Kanban), Advanced-Search, TCB-Team, Patch-For-Review, User-zeljkofilipin: Cannot find module nodemw - https://phabricator.wikimedia.org/T190307#4076133 (zeljkofilipin) >>! In T190307#4072249, @Tonina_Zhelyazkova_WMDE wrote: > The failures in [[ https://gerrit.wikime...
[15:55:25] Release-Engineering-Team (Kanban), User-zeljkofilipin: Find a few people interested in reviewing Selenium patches - https://phabricator.wikimedia.org/T188744#4076153 (zeljkofilipin)
[15:56:07] Release-Engineering-Team (Kanban), Advanced-Search, TCB-Team, Patch-For-Review, User-zeljkofilipin: Cannot find module nodemw - https://phabricator.wikimedia.org/T190307#4076160 (zeljkofilipin) >>! In T190307#4075441, @thiemowmde wrote: > * The fact that core can remove a dependency that is n...
[16:10:58] Hmm I get this
[16:11:08] 16:10 #wikimedia-codereview Cannot join channel (+i) - you must be invited
[16:11:54] Yeah. We killed that channel.
[16:12:08] Channel was redirected to #wikimedia-tech
[16:13:40] Oh
[16:32:26] (PS8) Hashar: Add support for sqlite/mysql databases [integration/quibble] - https://gerrit.wikimedia.org/r/419787 (https://phabricator.wikimedia.org/T166145)
[16:37:39] (CR) Hashar: [C: 2] "That is good enough for now" [integration/quibble] - https://gerrit.wikimedia.org/r/419787 (https://phabricator.wikimedia.org/T166145) (owner: Hashar)
[16:38:02] (Merged) jenkins-bot: Add support for sqlite/mysql databases [integration/quibble] - https://gerrit.wikimedia.org/r/419787 (https://phabricator.wikimedia.org/T166145) (owner: Hashar)
[16:45:00] aha, the rebase bug is pg lol
[16:50:52] PROBLEM - Free space - all mounts on deployment-ores01 is CRITICAL: CRITICAL: deployment-prep.deployment-ores01.diskspace._srv.byte_percentfree (No valid datapoints found)deployment-prep.deployment-ores01.diskspace.root.byte_percentfree (<100.00%)
[16:53:06] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.31.0-wmf.26 deployment blockers - https://phabricator.wikimedia.org/T183965#4076319 (demon) Open>Resolved
[17:03:11] no_justification i think we should run a manual gc on the puppet repo, it takes a while to publish changes on there. (please) :)
[17:03:42] Doing
[17:03:50] Also, we should re-enable auto-gc intervals again
[17:03:58] I disabled it way back when there was this jgit bug
[17:03:58] thanks
[17:04:01] yeh
[17:04:01] But it's long since fixed
[17:04:05] no_justification i have a task for this :)
[17:04:09] i've asked upstream
[17:04:17] who said they run it all the time.
[17:04:30] Yeah
[17:04:34] T190045
[17:04:34] T190045: Experiment switching gc back on in gerrit - https://phabricator.wikimedia.org/T190045
[17:10:05] I think the gc bug may have been fixed
[17:10:17] i remember seeing diffs for changes to gc
[17:10:27] a while ago
[17:20:06] no_justification they are planning 2.15 for stable release next week :) https://groups.google.com/forum/#!topic/repo-discuss/ie8luafzzog
[17:29:50] It slightly amuses me that the patch to move MW requirements to PHP7+ ( https://gerrit.wikimedia.org/r/#/c/405216/ ) fails CI not because of phpunit, which is fine, but the two qunit jobs, whose mw-install-mysql.sh scripts still run on PHP5 not 7.
[17:32:52] gah
[17:35:31] Continuous-Integration-Config, MediaWiki-General-or-Unknown, PHP 7.0 support: Make Wikimedia CI run PHP in either PHP 7.0+ or HHVM - https://phabricator.wikimedia.org/T190547#4076530 (Jdforrester-WMF) p:Triage>Normal
[17:35:54] Continuous-Integration-Config, MediaWiki-General-or-Unknown, PHP 7.0 support: Move mediawiki-core-qunit-selenium-jessie and mediawiki-extensions-qunit-jessie to run mw-install-mysql.sh on PHP7/HHVM - https://phabricator.wikimedia.org/T190548#4076541 (Jdforrester-WMF) p:Triage>Normal
[17:37:06] Continuous-Integration-Config, MediaWiki-General-or-Unknown, PHP 7.0 support: Make Wikimedia CI run PHP in either PHP 7.0+ or HHVM - https://phabricator.wikimedia.org/T190547#4076554 (Jdforrester-WMF)
[17:37:08] Continuous-Integration-Config, Release-Engineering-Team (Someday), Test-Coverage: Switch MediaWiki coverage job from PHP 5 to PHP 7 - https://phabricator.wikimedia.org/T147778#4076555 (Jdforrester-WMF)
[17:37:48] (PS2) Hashar: Add webhost and qunit support [integration/quibble] - https://gerrit.wikimedia.org/r/421548
[17:38:50] Continuous-Integration-Infrastructure (shipyard), Quibble, Patch-For-Review: Start mysql in quibble container - https://phabricator.wikimedia.org/T166145#4076556 (hashar) Open>Resolved a:hashar
[17:39:10] twentyafterfour no_justification we found a better way to add our customisations for phab's php ini :)
[17:39:14] using conf.d dir
[17:40:01] /etc/php/7.2/apache2/conf.d/php.ini (with only custom wmf changes)
[17:40:04] so not the full one
[17:40:45] ^ like and suggested
[17:40:56] instead of adding almost 2000 lines of php.ini which is mostly default
[17:41:04] just adding what we actually customize.. yep
[17:44:23] That ^
[17:44:25] +100000
[17:47:17] apergos: How goes switching the dumps to HHVM/PHP7 boxen?
[17:47:56] https://phabricator.wikimedia.org/T184258#4075189
[17:48:21] steps are: full testing in beta, then deploy on production testbed, then deploy on each prod host in turn
[17:48:50] Right. Do you feel we'll get it done before the end of April?
[17:48:52] I have it working but only with a locally patched php_memcached module. the right fix is waiting for a build which is likely to happen in
[17:48:58] a couple of weeks.
[17:48:58] no_justification :)
[17:49:03] Cool.
[17:49:06] https://gerrit.wikimedia.org/r/#/c/410245/29/modules/phabricator/templates/php72.ini.erb
[17:49:08] I sure hope so
[17:49:38] Beyond dumps, there's CI not working yet, and the mwscript stuff no_justification's been battling with.
[17:49:46] So we're "close".
[17:51:17] For certain values of close, sure
[17:51:46] * James_F grins.
[17:53:54] anyways that's the ticket to follow for now, then there are the snapshot1001 -> php7 ones (maybe linked off that ticket) and the 'move dumps to php7/stretch' (also maybe linked on that ticket
[17:53:56] )
[17:55:58] (PS1) Hashar: Port mw-phpunit.sh to python [integration/quibble] - https://gerrit.wikimedia.org/r/421592
[18:01:44] Release-Engineering-Team (Watching / External), Operations, Release Pipeline: Update Debian package of Blubber - https://phabricator.wikimedia.org/T190551#4076636 (dduvall)
[18:02:09] Release-Engineering-Team (Watching / External), Operations, Release Pipeline: Update Debian package of Blubber - https://phabricator.wikimedia.org/T190551#4076646 (dduvall) p:Triage>Normal
[18:02:50] Time: 18.55 seconds, Memory: 336.00MB
[18:02:51] hey phpunit without the database tests is quite fast :]
[18:05:55] (PS2) Hashar: Port mw-phpunit.sh to python [integration/quibble] - https://gerrit.wikimedia.org/r/421592
[18:09:02] no_justification https://gerrit.wikimedia.org/r/#/c/421593/ :)
[18:10:51] im wondering do we want to do it on week days
[18:10:58] where we can spot any problems with gc?
[18:11:03] instead of weekend
[18:11:14] (PS3) Hashar: Port mw-phpunit.sh to python [integration/quibble] - https://gerrit.wikimedia.org/r/421592
[18:14:56] Continuous-Integration-Infrastructure (shipyard), MediaWiki-Parser, Quibble: Parsertest fails: Namespace takes precedence over interwiki link (T53680) - https://phabricator.wikimedia.org/T190554#4076699 (hashar)
[18:32:54] (PS1) Hashar: Allow skipping zuul and npm/composer [integration/quibble] - https://gerrit.wikimedia.org/r/421600
[18:34:02] (CR) Legoktm: [C: 2] Use File::getDeclarationName to get the function name [tools/codesniffer] - https://gerrit.wikimedia.org/r/421439 (owner: Umherirrender)
[18:34:36] PROBLEM - App Server Main HTTP Response on deployment-mediawiki07 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 hphp_invoke - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 287 bytes in 0.012 second response time
[18:35:01] (Merged) jenkins-bot: Use File::getDeclarationName to get the function name [tools/codesniffer] - https://gerrit.wikimedia.org/r/421439 (owner: Umherirrender)
[18:35:58] PROBLEM - Puppet errors on deployment-etcd-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[18:36:03] (CR) jenkins-bot: Use File::getDeclarationName to get the function name [tools/codesniffer] - https://gerrit.wikimedia.org/r/421439 (owner: Umherirrender)
[18:54:29] PROBLEM - Puppet errors on deployment-secureredirexperiment is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[18:55:34] Release-Engineering-Team (Kanban), Wikimedia-Apache-configuration, Patch-For-Review: Cleanup remaining WikipediaMobileFirefoxOS references - https://phabricator.wikimedia.org/T187850#4076882 (demon) Bump. Can we get this resolved for the remaining nodes?
[19:01:50] in https://integration.wikimedia.org/ci/job/search-mjolnir-tox-docker/236/consoleFull i'm getting "Native memory allocation (malloc) failed to allocate 68157440 bytes for committing reserved memory." from the jvm. Any way to find out how much mem it's trying for and not getting?
[19:03:14] also is that the system running out, or is the docker container provided a certain memory allocation via docker?
[19:15:25] we don't run docker with any special memory restrictions so it seems like it's probably the machine running out. https://github.com/wikimedia/integration-config/blob/master/jjb/macro-docker.yaml#L67-L74
[19:15:28] paladox: indeed, conf.d is a much better way to php.ini ;)
[19:15:31] thanks
[19:15:32] :)
[19:15:38] twentyafterfour see -operations
[19:15:43] it's being merged :)
[19:16:05] sweet
[19:16:06] so once we are on stretch we can remove the php ini for good :)
[19:21:07] Release-Engineering-Team (Kanban), Wikimedia-Apache-configuration, Patch-For-Review: Cleanup remaining WikipediaMobileFirefoxOS references - https://phabricator.wikimedia.org/T187850#4076945 (bd808) a:demon>None The cleanup command @chasemp used was `rm -fR /usr/local/lib/mediawiki-config &&...
[19:24:35] Project-Admins, Collaboration-Team-Triage (Collab-Team-This-Quarter): Create project 'In-context-help-and-onboarding - https://phabricator.wikimedia.org/T190561#4076957 (jmatazzoni)
[19:34:06] Project-Admins, Collaboration-Team-Triage (Collab-Team-This-Quarter): Create project 'In-context-help-and-onboarding - https://phabricator.wikimedia.org/T190561#4076992 (jmatazzoni)
[19:35:01] Project-Admins, Collaboration-Team-Triage (Collab-Team-This-Quarter): Create project 'In-context-help-and-onboarding - https://phabricator.wikimedia.org/T190561#4076957 (jmatazzoni) Hi @Aklapper. If this could be done sooner rather than later that would be helpful. Thanks!
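On the JVM question at 19:01:50: the error message itself already states the size of the one request that failed. A minimal Python sketch of the unit conversion follows; the byte count is copied from the quoted error, and the helper name `bytes_to_mib` is made up for illustration. Note that this is only the size of the single failed allocation, not the total memory the JVM wanted.

```python
# Byte count copied from the JVM error quoted in the log at 19:01:50.
FAILED_ALLOCATION_BYTES = 68157440


def bytes_to_mib(num_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Size of the single allocation the JVM could not commit.
    print(f"{bytes_to_mib(FAILED_ALLOCATION_BYTES):.0f} MiB")  # prints "65 MiB"
```

So the failed request was fairly small (65 MiB), which fits the reply at 19:15:25 that the machine itself was probably running out of memory.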
[19:49:08] Right now there are three VMs in a 'shutoff' state in deployment-prep: deployment-videoscaler01, deployment-puppetdb01, deployment-tmh01
[19:49:13] are those all on purpose somehow?
[19:49:20] Or the result of a nova failure?
[19:51:05] There are also 16 vms with puppet failures right now. Entropy is winning!
[19:51:06] i thought those were deleted
[19:51:41] moritzm i think deleted them
[19:52:04] they are shutoff but not deleted
[19:53:17] Phabricator, Release-Engineering-Team (Kanban), Operations, Patch-For-Review, and 2 others: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#4077084 (Paladox)
[19:53:20] Phabricator, Release-Engineering-Team (Someday), Operations, Patch-For-Review: Add support for stretch in the phabricator puppet class - https://phabricator.wikimedia.org/T187127#4077083 (Paladox) Open>Resolved
[19:53:48] Phabricator, Release-Engineering-Team (Someday), Operations: Add support for stretch in the phabricator puppet class - https://phabricator.wikimedia.org/T187127#3965428 (Paladox)
[19:55:15] greg-g: I wonder if you could add a weekly look at http://shinken.wmflabs.org/problems?search=deployment to your weekly meeting? I can open another long-running bug about failures there but it would be great if someone-other-than-me noticed them.
[19:56:37] Phabricator, Release-Engineering-Team (Kanban), Phlogiston: Phlogiston reports don't have new data since mid-February - https://phabricator.wikimedia.org/T188149#4077102 (JAufrecht) I believe I have it correctly importing on the dev server, handling both forms of edge transaction. A total of three t...
[19:58:19] Phabricator, Release-Engineering-Team, Operations: Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568#4077108 (Paladox)
[19:58:22] (PS1) Umherirrender: Replace PHP_INT_MAX by $phpcsFile->numTokens [tools/codesniffer] - https://gerrit.wikimedia.org/r/421614
[19:58:51] Phabricator, Release-Engineering-Team, Operations: Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568#4077123 (Paladox)
[19:58:57] Phabricator, Release-Engineering-Team (Kanban), Operations, Patch-For-Review, and 2 others: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3948368 (Paladox)
[19:59:12] Phabricator, Release-Engineering-Team, Operations: Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568#4077108 (Paladox) p:Triage>High Inheriting status from parent task
[19:59:28] Phabricator, Release-Engineering-Team, Operations: Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568#4077126 (mmodell)
[19:59:31] Phabricator, Release-Engineering-Team (Kanban), Availability, Patch-For-Review, WorkType-NewFunctionality: Deploy phabricator to phab2001.codfw.wmnet - https://phabricator.wikimedia.org/T137928#4077127 (mmodell)
[20:00:59] Phabricator, Release-Engineering-Team, Operations: Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568#4077108 (mmodell)
[20:01:18] Release-Engineering-Team (Kanban), Wikimedia-Apache-configuration, Patch-For-Review: Cleanup remaining WikipediaMobileFirefoxOS references - https://phabricator.wikimedia.org/T187850#4077131 (Marostegui) @bd808 I don't think we use mediawiki-config for anything on those sanitarium hosts for anything,...
[20:02:37] Beta-Cluster-Infrastructure, Puppet, Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#2192875 (Andrew) As of today there are 16 shinken alerts (most puppet but at least one disk warning) on this project, and three VMs that are shut down but not...
[20:07:46] Release-Engineering-Team (Kanban), Wikimedia-Apache-configuration, Patch-For-Review: Cleanup remaining WikipediaMobileFirefoxOS references - https://phabricator.wikimedia.org/T187850#4077149 (bd808) >>! In T187850#4077131, @Marostegui wrote: > @bd808 I don't think we use mediawiki-config for anything...
[20:10:07] RECOVERY - Puppet errors on integration-slave-jessie-1004 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:11:18] (PS1) Umherirrender: Use File::getMethodProperties to get visibility [tools/codesniffer] - https://gerrit.wikimedia.org/r/421615
[20:16:45] Phabricator, Release-Engineering-Team (Kanban), Availability, Patch-For-Review, WorkType-NewFunctionality: Deploy phabricator to phab2001.codfw.wmnet - https://phabricator.wikimedia.org/T137928#4077168 (mmodell)
[20:17:44] Release-Engineering-Team (Kanban), Wikimedia-Apache-configuration, Patch-For-Review: Cleanup remaining WikipediaMobileFirefoxOS references - https://phabricator.wikimedia.org/T187850#4077173 (Marostegui) Ah right, if it will get the directory back but just without the module, I'm sure that won't brea...
[20:17:53] Phabricator, Release-Engineering-Team (Kanban), Availability, Patch-For-Review, WorkType-NewFunctionality: Deploy phabricator to phab2001.codfw.wmnet - https://phabricator.wikimedia.org/T137928#2840264 (mmodell) I sent a test mail from phab2001 to @paladox, which was received.
[20:20:04] 10Release-Engineering-Team (Watching / External), 10Operations, 10Patch-For-Review: Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#4077192 (10mmodell) [20:20:11] 10Phabricator, 10Release-Engineering-Team (Kanban), 10Availability, 10Patch-For-Review, 10WorkType-NewFunctionality: Deploy phabricator to phab2001.codfw.wmnet - https://phabricator.wikimedia.org/T137928#4077193 (10mmodell) [20:20:14] no_justification: is your fancy new gerrit logo on commons somewhere? A naive search is not finding it [20:20:15] 10Phabricator, 10Release-Engineering-Team (Someday): Implement phabricator database clustering support - https://phabricator.wikimedia.org/T112776#4077191 (10mmodell) [20:20:30] Prolly not [20:20:41] (wasn't mine, you should know I can't art hah) [20:20:45] 10Release-Engineering-Team (Watching / External), 10Operations, 10Patch-For-Review: Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#2990470 (10mmodell) [20:20:48] 10Phabricator, 10Release-Engineering-Team (Kanban), 10Availability, 10Patch-For-Review, 10WorkType-NewFunctionality: Deploy phabricator to phab2001.codfw.wmnet - https://phabricator.wikimedia.org/T137928#2842561 (10mmodell) [20:20:50] 10Phabricator, 10Release-Engineering-Team (Someday), 10Availability, 10WorkType-NewFunctionality: Configure phabricator clustering for daemons and repositories - https://phabricator.wikimedia.org/T143175#4077196 (10mmodell) [20:21:09] no_justification: I wanted to put it on https://www.mediawiki.org/wiki/Developer_access [20:21:17] 'tis in puppet [20:21:21] 10Release-Engineering-Team (Watching / External), 10Operations, 10Patch-For-Review: Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#2990470 (10mmodell) [20:21:24] 10Phabricator, 
10Release-Engineering-Team (Kanban), 10Availability, 10Patch-For-Review, 10WorkType-NewFunctionality: Deploy phabricator to phab2001.codfw.wmnet - https://phabricator.wikimedia.org/T137928#2842827 (10mmodell) [20:21:26] 10Phabricator, 10Release-Engineering-Team (Next), 10Patch-For-Review: Switch phabricator production to codfw - https://phabricator.wikimedia.org/T164810#4077211 (10mmodell) [20:21:29] 10Phabricator, 10RelEng-Archive-FY201718-Q1, 10Operations, 10Patch-For-Review: setup/install phab1001.eqiad.wmnet - https://phabricator.wikimedia.org/T163938#4077212 (10mmodell) [20:21:57] 10Phabricator, 10Release-Engineering-Team (Next), 10Patch-For-Review: Switch phabricator production to codfw - https://phabricator.wikimedia.org/T164810#3246107 (10mmodell) [20:21:59] 10Phabricator, 10Release-Engineering-Team (Kanban), 10Availability, 10Patch-For-Review, 10WorkType-NewFunctionality: Deploy phabricator to phab2001.codfw.wmnet - https://phabricator.wikimedia.org/T137928#2843161 (10mmodell) [20:22:20] 10Release-Engineering-Team (Kanban): Prepare a disaster recovery plan for failing over from phab1001 to phab2001 (or phab2001 to 1001) - https://phabricator.wikimedia.org/T190572#4077217 (10mmodell) p:05Triage>03Normal [20:24:07] 10Phabricator, 10Release-Engineering-Team (Kanban), 10Availability, 10Patch-For-Review, 10WorkType-NewFunctionality: Deploy phabricator to phab2001.codfw.wmnet - https://phabricator.wikimedia.org/T137928#4077228 (10mmodell) [20:25:28] 10Release-Engineering-Team (Kanban): Prepare a disaster recovery plan for failing over from phab1001 to phab2001 (or phab2001 to 1001) - https://phabricator.wikimedia.org/T190572#4077217 (10mmodell) [20:26:21] 10Phabricator, 10Release-Engineering-Team (Next), 10Patch-For-Review: Switch phabricator production to codfw - https://phabricator.wikimedia.org/T164810#4077238 (10Paladox) p:05Low>03Normal Changing priority to match the other tasks [20:31:00] RECOVERY - Puppet errors on integration-slave-jessie-1001 is OK: OK: 
Less than 1.00% above the threshold [0.0] [20:31:42] Release-Engineering-Team (Kanban): Prepare a disaster recovery plan for failing over from phab1001 to phab2001 (or phab2001 to 1001) - https://phabricator.wikimedia.org/T190572#4077248 (mmodell) [20:41:19] (PS1) Umherirrender: Use File::findExtendedClassName to get extends name [tools/codesniffer] - https://gerrit.wikimedia.org/r/421622 [20:41:50] andrewbogott: that is something we're trying to resolve in a long conversation with ops (where "that" == "future of beta"). tl;dr: we can't keep up with the churn of ops/puppet in any reasonable manner and still do everything else we're supposed to do so we're talking about what to do. Sorry for the non-answer for now :( [20:42:25] greg-g: that makes sense. I assumed that all the breakages are because of upstream changes :) [20:42:54] (PS2) Umherirrender: Use File::findExtendedClassName to get extends name [tools/codesniffer] - https://gerrit.wikimedia.org/r/421622 [20:42:56] :) indeed [20:43:05] (CR) Umherirrender: "Patch Set 2: Fixed typo" [tools/codesniffer] - https://gerrit.wikimedia.org/r/421622 (owner: Umherirrender) [20:45:53] qn .. when does mw1.31 come out? [20:46:14] LTS i mean .. https://www.mediawiki.org/wiki/Release_notes [20:46:36] just trying to figure out VE <-> MW lts compatibility and what parsoid should do wrt breakage. [20:49:01] Usually around May [20:49:21] May/Nov has been our 6mo cadence lately [20:49:50] thanks. 
[21:02:22] no_justification: seems like Gerrit is reindexing with a single worker :D [21:02:34] and 1480 tasks to be done doh [21:02:43] (from: gerrit show-queue --by-queue --wide ) [21:03:06] Queue: Index-Batch [21:03:06] 1483 tasks, 1 worker threads [21:03:15] Yep [21:03:18] One thread [21:03:21] On purpose [21:03:27] oky :] [21:03:28] It overloads the db [21:03:40] and seems the timeout for SSH-Interactive-Worker is still super long [21:03:43] Project beta-code-update-eqiad build #198807: FAILURE in 42 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/198807/ [21:03:45] I think we could increase it to 2 no_justification as the db is no longer reindexing, what do you think? :) [21:03:53] there are some git-receive-pack idling since this european morning [21:04:06] (well no longer as in it has done it all, but it should not be reindexing everything again) [21:04:08] I'm fine with it being long [21:04:14] Long as it's not as long as it was [21:04:21] ok [21:04:37] And a full reindex happens sometimes... [21:04:41] I've seen it before [21:04:54] hasharAway: are you having problems or just noticing things? [21:04:56] no_justification: ahh you bumped the number of workers for SSH-Interactive-Worker neat [21:05:08] actually having troubles updating mediawiki/core [21:05:18] (over https) [21:05:20] Well I let it use system defaults which bumped it but yes! [21:05:30] Different thread pool for http... [21:05:39] remote: internal server error [21:05:39] fatal: protocol error: bad pack header [21:05:59] what about gc mediawiki/core? 
[21:06:10] (i mean server side) like we did for puppet today [21:06:18] no_justification: you got me at ssh vs http pools :D [21:06:50] [2018-03-23 21:05:33,508] [HTTP-18468] WARN /r : Internal error during upload-pack from /srv/gerrit/git/mediawiki/core.git [21:06:50] org.eclipse.jgit.errors.MissingObjectException: Missing commit ffda98c828dd7e7082fcc447539eb9530178caf1 [21:06:51] :( [21:06:54] I just GCd everything an hour or two ago [21:07:06] ... [21:07:13] But... I do mw core like once a month [21:07:15] So... [21:07:19] oh [21:07:21] I thought we said to not gc anymore? [21:07:30] That was when there were bugs [21:07:35] Said bugs long since fixed [21:07:39] ^^ [21:07:55] well apparently it is not :D https://phabricator.wikimedia.org/rMWffda98c828dd7e7082fcc447539eb9530178caf1 is gone :D [21:08:22] PROBLEM - Puppet errors on deployment-mx02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:08:23] Cuz wmf.22 is dead [21:08:28] I delete old branches, remember? [21:08:40] so gc gets rid of the commit [21:08:55] then why does Gerrit still know about it somehow? [21:09:03] Yes. Because it's unreferenced [21:09:17] I bet it's cuz your local clone requests that head [21:09:28] You should fetch with --prune [21:09:41] git remote prune origin # magic [21:09:43] yeah that is it [21:09:57] so I guess the client sends all the heads it knows about with the sha1 [21:10:05] and Gerrit whines because it is missing the sha1 [21:10:17] Yep. [21:10:36] and thus Gerrit is all fine \o/ [21:13:34] "git is hard" [21:13:42] Project beta-code-update-eqiad build #198808: STILL FAILING in 42 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/198808/ [21:14:17] hasharAway: were you looking locally or on deployment-tin? [21:14:33] and I am off for some sleep. 
Tomorrow I will watch the departure of the largest passenger ship ever ( https://en.wikipedia.org/wiki/MS_Symphony_of_the_Seas ) [21:14:50] greg-g: last time was wednesday i think [21:15:40] hasharAway: Generally, I'd add --prune to your fetches :) [21:15:55] I ask because it looks like that beta-code-update failure is the same cause [21:15:59] I actually set that in my ~/.gitconfig [21:16:00] 21:13:00 INFO:mwcore:cwd: /srv/mediawiki-staging/php-master [21:16:03] 21:13:00 INFO:mwcore:running: git pull [21:16:05] 21:13:03 remote: internal server error [21:16:07] 21:13:03 fatal: protocol error: bad pack header [21:19:47] !log gjg@deployment-tin:/srv/mediawiki-staging/php-master$ git remote prune origin [21:19:49] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:19:59] * [pruned] origin/wmf/1.31.0-wmf.22 [21:20:01] :) [21:20:07] no_justification: you broke everyone! :) [21:20:27] we could leave a global message when cloning [21:20:34] to always run git remote prune origin [21:21:18] Well, it only affects wmf.* branches really [21:21:33] And only wmf branches with commits unique to that branch at that [21:21:39] So...kinda infrequent [21:21:39] but since those are included with every clone... it affects everyone [21:22:10] It only affects you if you have the branch locally checked out [21:22:17] why would beta? [21:22:21] Phabricator (Upstream), Browser-Support-Internet-Explorer, Upstream: Numeric anchor in phabricator to link to a comment of a task does not work in IE11 - https://phabricator.wikimedia.org/T76629#808048 (gh87) I clicked the anchor link to a comment on IE11. It works right now. I think the same goes fo... [21:22:22] * greg-g is confused on that part [21:22:30] It probably fetches all heads [21:22:38] idk. [21:22:55] Anyway, this is more a git thing than a gerrit thing... [21:23:09] You could replicate this with github easily enough [21:23:21] yeah [21:23:57] Yippee, build fixed! 
[21:23:58] Project beta-code-update-eqiad build #198809: FIXED in 57 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/198809/ [21:24:29] I wonder if the "fetch sha1 heads that I want" setting we'd looked at before for perf purposes would be useful... [21:25:13] cf: T103990 [21:25:13] T103990: Gerrit upload-pack send ALL references causing massive network I/O on common operations - https://phabricator.wikimedia.org/T103990 [21:26:17] for deployment-prep, I guess the update script would need to --prune as well [21:26:22] Yeah [21:26:26] We should generally use that [21:26:27] it should be straightforward to add it to [21:26:43] git config remote.origin.prune true [21:26:46] That works ^ [21:26:47] T103990 I haven't looked at it in ages [21:26:59] one would need to test it out [21:27:27] also I think I once read a short RFC to enhance the git advertising protocol and allow filtering out some references (typically for Gerrit refs/changes/* ) [21:27:49] That was the idea with this [21:28:36] there is some info on https://bugs.chromium.org/p/gerrit/issues/detail?id=175 eventually [21:28:43] got wontfixed :/ [21:30:27] uploadpack.hideRefs refs/changes/ [21:30:28] uploadpack.hideRefs refs/cache-automerge/ [21:30:28] uploadpack.allowtipsha1inwant = true [21:30:29] in theory [21:30:58] no_justification we could install https://gerrit-review.googlesource.com/admin/repos/plugins/motd [21:30:59] with the proper fix being Google patches to git to get reviewed and merged by upstream [21:31:08] anyway. I am off for the week-end! [21:31:10] and have git prune [21:41:19] Release-Engineering-Team (Kanban): Prepare a disaster recovery plan for failing over from phab1001 to phab2001 (or phab2001 to 1001) - https://phabricator.wikimedia.org/T190572#4077423 (Dzahn) Consider storing the information on wikitech wiki. Since there is wikitech-static which is a copy of that and kept... 
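Editor's note: the pruning fix discussed above (git remote prune origin, git config remote.origin.prune true) can be tried out locally. The following is only a minimal sketch under stated assumptions: the temporary directory, the throwaway bare repository standing in for Gerrit, and the branch name wmf/demo are all hypothetical and not taken from the log. It shows a remote-tracking ref going stale after the branch is deleted on the server and being removed once pruning is enabled. It does not reproduce the server-side gc or the "bad pack header" error itself.

```shell
#!/bin/sh
# Sketch only: paths and the branch name "wmf/demo" are made up for illustration.
set -eu

work=$(mktemp -d)

# A bare repository stands in for the server.
git init --quiet --bare "$work/origin.git"
git --git-dir="$work/origin.git" symbolic-ref HEAD refs/heads/master

# Seed it with a master branch and a short-lived release-style branch.
git clone --quiet "$work/origin.git" "$work/seed" 2>/dev/null
cd "$work/seed"
git -c user.name=demo -c user.email=demo@example.org \
    commit --quiet --allow-empty -m "initial"
git push --quiet origin HEAD:refs/heads/master HEAD:refs/heads/wmf/demo

# A second clone picks up origin/wmf/demo as a remote-tracking ref.
git clone --quiet "$work/origin.git" "$work/client"

# The branch is then deleted on the server, as with old wmf.* branches.
git push --quiet origin --delete wmf/demo

cd "$work/client"

# Fetching without pruning leaves the stale remote-tracking ref in place.
git fetch --quiet --no-prune origin
before=$(git for-each-ref refs/remotes/origin/wmf | wc -l | tr -d ' ')

# With the setting quoted in the log, the next fetch drops refs gone upstream.
git config remote.origin.prune true
git fetch --quiet origin
after=$(git for-each-ref refs/remotes/origin/wmf | wc -l | tr -d ' ')

echo "stale wmf refs before prune: $before, after: $after"
```

Setting the option per clone, as here, only touches that one repository; a user-level setting in ~/.gitconfig, which one participant mentions using, would apply to every clone on the machine.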
[21:46:09] Phabricator, Release-Engineering-Team, Operations: Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568#4077453 (Dzahn) a:Dzahn [21:55:04] Ah, we can just set fetch.prune [21:55:10] Rather than remote.*.prune [22:02:45] Continuous-Integration-Infrastructure, Release Pipeline: Update integration docker agents to stretch - https://phabricator.wikimedia.org/T190584#4077517 (thcipriani) [22:04:23] Continuous-Integration-Infrastructure, Release Pipeline: Package docker-ce for stretch - https://phabricator.wikimedia.org/T190585#4077531 (thcipriani) [22:06:43] Continuous-Integration-Infrastructure, Release Pipeline: Package minikube for stretch - https://phabricator.wikimedia.org/T190586#4077541 (thcipriani) [22:07:48] Continuous-Integration-Infrastructure, Release Pipeline: Package docker-ce for stretch - https://phabricator.wikimedia.org/T190585#4077553 (thcipriani) [22:07:50] Continuous-Integration-Infrastructure, Release Pipeline: Update integration docker agents to stretch - https://phabricator.wikimedia.org/T190584#4077552 (thcipriani) [22:08:08] Continuous-Integration-Infrastructure, Release Pipeline: Update integration docker agents to stretch - https://phabricator.wikimedia.org/T190584#4077517 (thcipriani) [22:08:10] Continuous-Integration-Infrastructure, Release Pipeline: Package minikube for stretch - https://phabricator.wikimedia.org/T190586#4077554 (thcipriani) [22:13:40] PROBLEM - Host integration-slave-k8s-1012 is DOWN: CRITICAL - Host Unreachable (10.68.18.132) [22:28:04] no_justification wondering did you upload gerrit to archiva? 
:) [22:33:39] Nope didn't get to it [22:35:03] ok [22:36:37] Release-Engineering-Team (Kanban), Release Pipeline: install helm on integration agents - https://phabricator.wikimedia.org/T188934#4077629 (thcipriani) [22:45:42] Phabricator, Release-Engineering-Team (Next), Patch-For-Review: Switch phabricator production to codfw - https://phabricator.wikimedia.org/T164810#4077639 (mmodell) @Marostegui: Can you comment on how we should handle cross-dc queries for phabricator? More specifically, will there be problems when we... [22:46:50] paladox: I can upload them, but I don't wanna push it out this late on a friday [22:46:53] We can do it first thing Monday [22:47:02] ok [22:47:05] thanks :) [22:49:04] Phabricator, Release-Engineering-Team (Next), Patch-For-Review: Switch phabricator production to codfw - https://phabricator.wikimedia.org/T164810#4077655 (Paladox) or @jcrespo ^^ please? [22:49:51] Phabricator, Release-Engineering-Team (Next), DBA: Switch phabricator production to codfw - https://phabricator.wikimedia.org/T164810#4077667 (mmodell) [22:50:18] Release-Engineering-Team (Next), User-greg: Write retrospective of Jan 2018 team offsite - https://phabricator.wikimedia.org/T189927#4077668 (greg) Open>Resolved a:greg https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Offsites/2018-01-Sonoma/Outputs I don't have time to manu... 
[23:46:07] Phabricator, Browser-Support-Internet-Explorer: Scrolling Workboard pages on Internet Explorer; also technical issues with the top horizontal scrollbar - https://phabricator.wikimedia.org/T190597#4077792 (gh87) [23:50:29] Phabricator, Browser-Support-Internet-Explorer: Scrolling Workboard pages on Internet Explorer; also technical issues with the top horizontal scrollbar - https://phabricator.wikimedia.org/T190597#4077806 (gh87) [23:55:40] Phabricator, Browser-Support-Internet-Explorer: Scrolling Workboard pages on Internet Explorer; also technical issues with the upper horizontal scrollbar - https://phabricator.wikimedia.org/T190597#4077808 (gh87)