[00:02:46] @suppress? [00:10:16] I don't want to @suppress without knowing the reason... maybe there's something wrong I am doing somewhere? [00:10:39] it just fails to see a random class, while seeing other classes from the same extension... [00:16:46] I'll take a better look once I'm off my phone [00:17:50] thanks! [01:24:59] Hm.. unable to create an account on beta enwiki [01:25:06] keep getting 'session hijacking' no matter how many times I submit it [02:39:13] 10Release-Engineering-Team, 10Scap, 10Operations: mwdebug1001 and mwdebug1002 are reliably the last two hosts to finish scap-cdb-rebuild - https://phabricator.wikimedia.org/T203625 (10Legoktm) [03:01:08] Project mediawiki-core-code-coverage-docker build #3743: 04FAILURE in 1 min 7 sec: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage-docker/3743/ [04:49:03] 10Continuous-Integration-Infrastructure, 10Wikimedia-production-error (Shared Build Failure): Jenkins jobs for MediaWiki failing with 'npm: shasum check failed' - https://phabricator.wikimedia.org/T203506 (10Legoktm) https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage-docker/3743/console anot...
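The 'npm: shasum check failed' errors tracked above (T203506, T201638) mean a downloaded tarball's SHA-1 no longer matches what the registry metadata advertised, typically a corrupted cache entry or a truncated download. The check npm performs can be sketched in shell; the function name and paths here are illustrative, not npm's actual implementation:

```shell
#!/bin/sh
# Sketch of the integrity check behind "shasum check failed":
# compare a tarball's actual SHA-1 against the checksum the
# registry advertised for it. Names and paths are examples.
verify_shasum() {
    tarball="$1"
    expected="$2"
    actual=$(sha1sum "$tarball" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "ok: $tarball"
        return 0
    else
        echo "shasum check failed: expected $expected, got $actual" >&2
        return 1
    fi
}
```

Locally, `npm cache verify` (npm 5+) rescans the cache and garbage-collects corrupted entries, which is often enough to clear this class of failure.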
[04:49:13] 10Continuous-Integration-Infrastructure, 10Wikimedia-production-error (Shared Build Failure): Jenkins jobs for MediaWiki failing with 'npm: shasum check failed' - https://phabricator.wikimedia.org/T203506 (10Legoktm) [04:49:16] 10Continuous-Integration-Config, 10Quibble: Quibble: shasum check failed - https://phabricator.wikimedia.org/T201638 (10Legoktm) [04:53:10] SMalyshev: figured it out, left a comment on your patch [05:05:33] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Mail, 10Operations, and 2 others: Ensure Jenkins mail configuration supports outbound smtp server failover - https://phabricator.wikimedia.org/T203607 (10Legoktm) [05:10:17] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Mail, 10Operations, and 2 others: Ensure Jenkins mail configuration supports outbound smtp server failover - https://phabricator.wikimedia.org/T203607 (10Legoktm) > What does the current outbound smtp config look like in Jenkins? I bel... [05:22:02] (03CR) 10Legoktm: [C: 04-2] "Err, this should only be added to the make-wmf-branch list if it is planned to be deployed to Wikimedia wikis, which I don't think there i" [tools/release] - 10https://gerrit.wikimedia.org/r/458309 (owner: 10D3r1ck01) [05:34:48] 10Continuous-Integration-Config, 10MediaWiki-Core-Tests, 10phan-taint-check-plugin: Configure CI to run phan-taint-check-plugin for MediaWiki core - https://phabricator.wikimedia.org/T203630 (10Legoktm) [05:37:39] (03CR) 10Legoktm: [C: 04-1] "Patch itself looks good (+1), we're just waiting on the extension patch to be ready :)" [integration/config] - 10https://gerrit.wikimedia.org/r/457109 (https://phabricator.wikimedia.org/T203364) (owner: 10MGChecker) [05:53:09] 10Continuous-Integration-Config, 10Wikimedia-General-or-Unknown, 10phan-taint-check-plugin, 10MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), 10Patch-For-Review: Enable phan-taint-check-plugin on all Wikimedia-deployed repositories where it is 
curr... - https://phabricator.wikimedia.org/T201219 [05:58:47] Yippee, build fixed! [05:58:48] Project mediawiki-core-code-coverage-docker build #3744: 09FIXED in 1 hr 10 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage-docker/3744/ [06:35:35] PROBLEM - Puppet errors on deployment-deploy02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [07:10:33] RECOVERY - Puppet errors on deployment-deploy02 is OK: OK: Less than 1.00% above the threshold [0.0] [07:39:39] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Readers-Web-Backlog: Popups and RelatedArticles daily jobs absent/unusable - https://phabricator.wikimedia.org/T203591 (10zeljkofilipin) a:03zeljkofilipin [07:40:12] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Readers-Web-Backlog, 10User-zeljkofilipin: Popups and RelatedArticles daily jobs absent/unusable - https://phabricator.wikimedia.org/T203591 (10zeljkofilipin) [07:41:30] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Readers-Web-Backlog, 10User-zeljkofilipin: Popups and RelatedArticles daily jobs absent/unusable - https://phabricator.wikimedia.org/T203591 (10zeljkofilipin) The only remaining thing to do is to review and merge [[ https://ger... [07:55:24] zeljkof: Hello :] Have you managed to release the new mediawiki-wdio npm package ? :] [07:56:00] hashar: there's no need for it! [07:56:06] doh [07:56:08] how so? [07:56:36] Timo's javascript-fu is black belt, he suggested to solve the problem in a different way https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Popups/+/449485 [07:57:02] hashar: this patch uses existing Api instead of EditPage that would need to be added to the package [07:57:19] and EditPage is just a wrapper using the api? [07:57:38] apparently, haven't looked in a while [07:59:07] 00:11:53.546 WARNING: the "moveTo" command will be deprecated soon. 
If you have further questions, reach out in the WebdriverIO Gitter support channel (https://gitter.im/webdriverio/webdriverio). [07:59:08] 00:11:53.546 Note: This command is not part of the W3C WebDriver spec and won't be supported in future versions of the driver. It is recommended to use the actions command to emulate pointer events. [07:59:10] but that is unrelated :] [07:59:22] hashar: we probably need to update wdio package [07:59:31] I've noticed there is a newer version [07:59:36] +2ed [07:59:38] that is a deprecation warning in wdio [07:59:46] seems some spec has to be updated to use whatever new call [07:59:54] ah, popups is using something deprecated from wdio? [07:59:59] yeah [08:00:09] well, it's up to them [08:00:09] moveTo() is apparently to move the mouse pointer toward an element [08:00:17] which is not part of the WebDriver spec [08:00:29] anyway, the patch is merging [08:00:34] gotta reboot contint1001 now :] [08:05:46] hashar: huh quibble-vendor-mysql-hhvm-docker https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/None/console : EXCEPTION [08:05:53] from https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Popups/+/449485 [08:08:27] same here: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/458316 [08:08:53] dcausse: he did say this: [08:08:56] > gotta reboot contint1001 now :] [08:08:58] is this the reboot? [08:09:00] oh ok [08:09:16] so I guess that might be the cause, but I don't really know [08:09:19] rechecking, thanks :) [08:13:43] zeljkof: yeah because we have restarted contint1001 / Zuul [08:13:49] not sure why it gives an EXCEPTION though [08:21:04] Yippee, build fixed!
[08:21:04] Project beta-scap-eqiad build #221229: 09FIXED in 15 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221229/ [08:22:16] zeljkof: Popups patch merged https://gerrit.wikimedia.org/r/#/c/449485/ [08:22:20] so I guess the daily job will work now [08:22:32] also yesterday I reverted the addition of the junit files processing [08:22:40] we will want to add that in a later patch [08:23:14] let's see what happens on https://integration.wikimedia.org/ci/job/selenium-daily-beta-Popups/11/console [08:25:14] (03CR) 10Hashar: [C: 032] spicerack: use the backport version of debian-glue [integration/config] - 10https://gerrit.wikimedia.org/r/458303 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:26:54] (03Merged) 10jenkins-bot: spicerack: use the backport version of debian-glue [integration/config] - 10https://gerrit.wikimedia.org/r/458303 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:31:15] hashar: argh 00:03:36.422 Error: reporter "wdio-junit-reporter" is not installed. Error: Error: Cannot find module 'wdio-junit-reporter' [08:31:17] fixing [08:46:16] :] [08:46:44] it passed codereview+2 since that runs from quibble/mediawiki/core [08:46:47] which has the junit reporter [08:46:48] bah [09:02:39] hashar: I want to set https://phabricator.wikimedia.org/T125976 to UBN, thoughts? [09:03:03] regarding running wikidata dispatching in beta, it appears when the deployment hosts in beta were recreated (a month ago) they were not recreated in the same way :( [09:03:14] so dispatching hasn't been running on beta for a month [09:03:23] thoughts?
[09:05:37] 10Beta-Cluster-Infrastructure, 10Icinga, 10Wikidata: Create an alarm for wikidata beta dispatching - https://phabricator.wikimedia.org/T203641 (10Addshore) [09:06:07] 10Beta-Cluster-Infrastructure, 10Icinga, 10Wikidata: Create an alarm for wikidata beta dispatching - https://phabricator.wikimedia.org/T203641 (10Addshore) [09:06:53] it could be an easy fix I think, there should just be some role to apply to a host for the wikidata dispatching [09:15:58] 10Release-Engineering-Team (Kanban), 10MediaWiki-Core-Tests, 10MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), 10Patch-For-Review, 10User-zeljkofilipin: Run tests daily targeting beta cluster for all repositories with Selenium tests - https://phabricator.wikimedia.org/T188742 (10zeljkofilip... [09:16:00] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Readers-Web-Backlog, 10User-zeljkofilipin: Popups and RelatedArticles daily jobs absent/unusable - https://phabricator.wikimedia.org/T203591 (10zeljkofilipin) [09:19:27] (03CR) 10Hashar: "Sorry I have forgot to deploy that change. It is live now!" [integration/config] - 10https://gerrit.wikimedia.org/r/458303 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [09:31:15] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Readers-Web-Backlog, 10User-zeljkofilipin: Popups and RelatedArticles daily jobs absent/unusable - https://phabricator.wikimedia.org/T203591 (10zeljkofilipin) I'm unable to make a commit in Popups. ``` $ git status On branch T... 
[09:31:49] 10Continuous-Integration-Config, 10Operations: rspec-puppet fails with Could not find the daemon directory (tested [/etc/sv,/var/lib/service]) - https://phabricator.wikimedia.org/T203645 (10hashar) [09:32:12] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Readers-Web-Backlog, 10User-zeljkofilipin: Popups and RelatedArticles daily jobs absent/unusable - https://phabricator.wikimedia.org/T203591 (10zeljkofilipin) @Niedzielski looks like you did NVM related changes to Popups recent... [09:33:21] 10Continuous-Integration-Config, 10Operations: rspec-puppet fails with Could not find the daemon directory (tested [/etc/sv,/var/lib/service]) - https://phabricator.wikimedia.org/T203645 (10hashar) [09:54:31] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10ArticlePlaceholder, 10Wikidata, and 2 others: ArticlePlaceholder should use MediaWiki qunit runner - https://phabricator.wikimedia.org/T180171 (10MarcoAurelio) @hashar So the reason why https://gerrit.wikimedia.org/r... [10:12:39] (03CR) 10D3r1ck01: "> Patch Set 1: Code-Review-2" [tools/release] - 10https://gerrit.wikimedia.org/r/458309 (owner: 10D3r1ck01) [10:12:55] (03Abandoned) 10D3r1ck01: Add SendGrid extension to release tool [tools/release] - 10https://gerrit.wikimedia.org/r/458309 (owner: 10D3r1ck01) [10:22:38] Also, hi zeljkof ! :D [10:23:01] how close to having the videos of browser test failures in jenkins CI are we? :D [10:28:06] 10Release-Engineering-Team (Kanban), 10MediaWiki-Core-Tests, 10Patch-For-Review, 10User-zeljkofilipin: Video recording for Selenium tests in Node.js - https://phabricator.wikimedia.org/T179188 (10Addshore) We have some more seemingly random failures going on with https://gerrit.wikimedia.org/r/c/mediawiki/... 
[10:36:52] addshore: it's working fine https://gerrit.wikimedia.org/r/c/mediawiki/core/+/422933 [10:37:03] just pending some changes that Timo requested [10:37:23] aaah, so if I set something as Depend-On that core change then the extension job will generate videos? :D [10:37:35] If so that is amazing and I'll do that if it continues to fail :D [10:38:00] addshore: yes, it should work, try it [10:38:10] will do, thanks! [10:48:11] PROBLEM - Free space - all mounts on integration-slave-docker-1026 is CRITICAL: CRITICAL: integration.integration-slave-docker-1026.diskspace.root.byte_percentfree (<22.22%) [10:55:11] maintenance-disconnect-full-disks build 845 integration-slave-docker-1026 (/: 100%): OFFLINE due to disk space [11:00:11] maintenance-disconnect-full-disks build 846 integration-slave-docker-1026: OFFLINE due to disk space [11:05:12] maintenance-disconnect-full-disks build 847 integration-slave-docker-1012 (/srv: 95%): OFFLINE due to disk space [11:05:13] maintenance-disconnect-full-disks build 847 integration-slave-docker-1026: OFFLINE due to disk space [11:08:13] RECOVERY - Free space - all mounts on integration-slave-docker-1026 is OK: OK: All targets OK [11:10:11] maintenance-disconnect-full-disks build 848 integration-slave-docker-1012: OFFLINE due to disk space [11:10:12] maintenance-disconnect-full-disks build 848 integration-slave-docker-1026: OFFLINE due to disk space [11:10:32] 10Continuous-Integration-Infrastructure, 10Math, 10Operations: quibble-vendor-mysql-hhvm-docker no space left on device, write - https://phabricator.wikimedia.org/T203649 (10Physikerwelt) [11:15:11] maintenance-disconnect-full-disks build 849 integration-slave-docker-1012: OFFLINE due to disk space [11:15:11] maintenance-disconnect-full-disks build 849 integration-slave-docker-1026: OFFLINE due to disk space [11:20:12] maintenance-disconnect-full-disks build 850 integration-slave-docker-1012: OFFLINE due to disk space [11:20:13] maintenance-disconnect-full-disks build 850 
integration-slave-docker-1026: OFFLINE due to disk space [11:25:13] maintenance-disconnect-full-disks build 851 integration-slave-docker-1012: OFFLINE due to disk space [11:25:13] maintenance-disconnect-full-disks build 851 integration-slave-docker-1026: OFFLINE due to disk space [11:30:11] maintenance-disconnect-full-disks build 852 integration-slave-docker-1012: OFFLINE due to disk space [11:30:12] maintenance-disconnect-full-disks build 852 integration-slave-docker-1026: OFFLINE due to disk space [11:34:50] 10Continuous-Integration-Config, 10MediaWiki-Core-Tests, 10phan-taint-check-plugin: Configure CI to run phan-taint-check-plugin for MediaWiki core - https://phabricator.wikimedia.org/T203630 (10Bawolff) Yes. There are still quite a few false positives to sort out, but its starting to get more manageable [11:35:11] maintenance-disconnect-full-disks build 853 integration-slave-docker-1012: OFFLINE due to disk space [11:35:11] maintenance-disconnect-full-disks build 853 integration-slave-docker-1026: OFFLINE due to disk space [11:40:11] maintenance-disconnect-full-disks build 854 integration-slave-docker-1012: OFFLINE due to disk space [11:40:11] maintenance-disconnect-full-disks build 854 integration-slave-docker-1026: OFFLINE due to disk space [11:45:11] maintenance-disconnect-full-disks build 855 integration-slave-docker-1012: OFFLINE due to disk space [11:45:11] maintenance-disconnect-full-disks build 855 integration-slave-docker-1026: OFFLINE due to disk space [11:50:13] maintenance-disconnect-full-disks build 856 integration-slave-docker-1012: OFFLINE due to disk space [11:50:14] maintenance-disconnect-full-disks build 856 integration-slave-docker-1026: OFFLINE due to disk space [11:55:11] maintenance-disconnect-full-disks build 857 integration-slave-docker-1012: OFFLINE due to disk space [11:55:12] maintenance-disconnect-full-disks build 857 integration-slave-docker-1026: OFFLINE due to disk space [12:00:11] maintenance-disconnect-full-disks build 858 
integration-slave-docker-1012: OFFLINE due to disk space [12:00:11] maintenance-disconnect-full-disks build 858 integration-slave-docker-1026: OFFLINE due to disk space [12:05:11] maintenance-disconnect-full-disks build 859 integration-slave-docker-1012: OFFLINE due to disk space [12:05:12] maintenance-disconnect-full-disks build 859 integration-slave-docker-1026: OFFLINE due to disk space [12:08:47] !log cleaned integration-slave-docker-1012 and integration-slave-docker-1026 [12:08:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [12:10:11] maintenance-disconnect-full-disks build 860 integration-slave-docker-1012: OFFLINE due to disk space [12:15:13] maintenance-disconnect-full-disks build 861 integration-slave-docker-1012: OFFLINE due to disk space [12:20:12] maintenance-disconnect-full-disks build 862 integration-slave-docker-1012: OFFLINE due to disk space [12:22:12] Disconnected by SYSTEM : maintenance-disconnect-full-disks build 847 (/srv: 95%) [12:22:23] but /srv has 3% usage after clean up ... [12:23:12] (03CR) 10Hashar: [C: 032] Add line continuation in command line to README [integration/config] - 10https://gerrit.wikimedia.org/r/457380 (owner: 10Brian Wolff) [12:24:56] (03Merged) 10jenkins-bot: Add line continuation in command line to README [integration/config] - 10https://gerrit.wikimedia.org/r/457380 (owner: 10Brian Wolff) [13:07:10] 10Release-Engineering-Team, 10Scap: scap timeout checking index.php/api.php mwdebug1001 / mwdebug1002 - https://phabricator.wikimedia.org/T203664 (10hashar) [13:10:03] 10Continuous-Integration-Config, 10Wikidata: Voting integration of extensions requires breaking changes to be forced - https://phabricator.wikimedia.org/T203666 (10Pablo-WMDE) [13:11:29] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.32.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T191066 (10hashar) There was again {T203566} which is "just" a cache issue as I understand it. 
It only happens when the new version is rolled and disappears after that onc... [13:16:56] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.32.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T191066 (10hashar) There are some spikes of memcached errors, that is mc2** hosts being rebooted in codfw. [13:19:16] 10Project-Admins: Create a Component project for ScienceSource - https://phabricator.wikimedia.org/T203667 (10Jkbr) [13:22:10] 10Continuous-Integration-Config, 10Wikidata, 10wikidata-tech-focus, 10User-Addshore: Voting integration of extensions requires breaking changes to be forced - https://phabricator.wikimedia.org/T203666 (10Addshore) [13:22:41] zeljkof: it worked a treat and allowed us to solve the problem in a few minutes :D [13:23:38] oh no, zeljkof hashar I guess the browser test videos are what killed those slaves though? [13:23:57] 10Continuous-Integration-Config, 10Wikidata, 10wikidata-tech-focus, 10User-Addshore: Voting integration of extensions requires breaking changes to be forced - https://phabricator.wikimedia.org/T203666 (10Pablo-WMDE) [13:23:59] https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/15230/ which had videos was on 1026 [13:24:52] though the videos are pretty small [13:26:36] addshore: yeah the videos are fairly small [13:26:51] there are several disk space issues [13:26:57] the docker images are on the / partition [13:27:08] and that keeps filling up as we create new versions of CI containers [13:27:23] eventually with time, all slaves have a copy of all docker containers used by any job [13:27:30] for the slave 1026 [13:27:34] it has several executors [13:27:44] so eg you can have several copies of the same job running concurrently [13:28:00] each creating a new workspace dir that has a full clone of mediawiki left behind in $WORKSPACE/src [13:28:15] I have a patch floating to clean up the workspace once the build is complete [13:28:22] I should test it and deploy it I
guess [13:53:49] 10Release-Engineering-Team, 10Scap: scap timeout checking index.php/api.php mwdebug1001 / mwdebug1002 - https://phabricator.wikimedia.org/T203664 (10thcipriani) I don't think it'd be the HHVM bytecode cache for promoting all wikis since 1.32.0-wmf.20 has been on wikis since Tuesday. I do wonder if it's relate... [14:05:47] (03CR) 10Thcipriani: [C: 032] "deployed yesterday evening" [integration/config] - 10https://gerrit.wikimedia.org/r/458327 (owner: 10Thcipriani) [14:09:14] (03Merged) 10jenkins-bot: Maintenance: call .toString() explicitly [integration/config] - 10https://gerrit.wikimedia.org/r/458327 (owner: 10Thcipriani) [14:11:33] (03Abandoned) 10Thcipriani: Maintenance: calculate diskFull size per executor [integration/config] - 10https://gerrit.wikimedia.org/r/458064 (owner: 10Thcipriani) [14:13:08] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Readers-Web-Backlog, 10Patch-For-Review, 10User-zeljkofilipin: Popups and RelatedArticles daily jobs absent/unusable - https://phabricator.wikimedia.org/T203591 (10Niedzielski) > Binary files a/resources/dist/index.js.json an... [14:19:10] thcipriani: hey, sorry to bother again but disabling LFS didn't work either [14:23:01] I found a "workaround" for now.
ores itself can be cloned from phabricator instead [14:25:12] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.32.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T191066 (10greg) [14:25:23] Amir1: hrm I still see lfs as enabled for scoring/ores/ores http://tyler.zone/ores-lfs-options.png [14:25:47] :/ [14:25:47] I wonder if there is some caching that should be flushed or permission to reload [14:26:16] let me make a patch, that might help [14:27:37] thcipriani: fun thing is that the mirroring works in phabricator: https://phabricator.wikimedia.org/source/ores/ [14:27:49] (it's the updated version) [14:28:17] my understanding, which is still fuzzy, is that the mirroring to gerrit works via phabricator somehow? [14:28:52] or something [14:29:34] mirroring from github -> gerrit has worked for source/ores/ores before, correct? [14:30:04] yup [14:30:13] let me try something [14:30:26] can I be added to https://phabricator.wikimedia.org/project/view/2624/ ? [14:30:34] I can add myself but just to be sure [14:31:25] done [14:32:28] Thanks [14:34:03] Yes, it works through github [14:34:11] that explains a lot [14:34:16] *phabricator [14:46:37] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.32.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T191066 (10greg) [14:51:25] Reedy: could we pick up where we left off yesterday on https://phabricator.wikimedia.org/T191438 ? [14:51:41] Sure [14:51:56] the ssh key issue should be fixed now, that user needed to be added to the horizon project [14:52:15] ssh works when testing from a shell on contint [14:54:11] https://integration.wikimedia.org/ci/computer/compiler1001.puppet-diffs.eqiad.wmflabs/ [14:54:31] Looks like jenkins has set them up [14:55:40] ah! excellent [14:56:37] I dunno how likely it is to make jobs run on them...
[14:56:51] But we could mark the old two offline to force them to be run [14:57:30] looking at recent builds it looks like at least compiler1002 has run a few [14:57:58] so that's good. yeah I think it makes sense to mark the old two offline [15:00:15] Marked them offline [15:03:36] sweet! [15:06:31] They need removing properly when confirmed we don't need them anymore [15:07:21] ok, will follow up on that soon [15:08:05] also the reverse proxy running on compiler02 has to be moved over before deleting those instances [15:18:55] 10Release-Engineering-Team, 10Cloud-VPS, 10Operations, 10puppet-compiler, and 3 others: Upgrade Puppet compilers to Stretch - https://phabricator.wikimedia.org/T191438 (10herron) Stretch compilers `compiler100[12].puppet-diffs.eqiad.wmflabs` are now live in the operations-puppet-catalog-compiler Jenkins pr... [15:23:00] 10Release-Engineering-Team (Kanban), 10Scap, 10Operations: mwscript rebuildLocalisationCache.php takes 40 minutes on HHVM (rather than ~5 on PHP 5) - https://phabricator.wikimedia.org/T191921 (10thcipriani) >>! In T191921#4558120, @Krinkle wrote: > [...] > Also remember that enabling JIT will speed things up...
[15:28:04] 10Release-Engineering-Team (Kanban), 10Scap, 10Operations: Scap should use Eval.Jit=1 when calling rebuildLocalisationCache.php via HHVM - https://phabricator.wikimedia.org/T203680 (10thcipriani) p:05Triage>03High [15:51:23] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Readers-Web-Backlog, 10MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), and 2 others: Popups daily jobs currently unusable - https://phabricator.wikimedia.org/T203591 (10Jdlrobson) [15:53:01] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Readers-Web-Backlog, 10MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), and 2 others: Popups daily jobs currently unusable - https://phabricator.wikimedia.org/T203591 (10Jdlrobson) > @zeljkofilipin, are you using... [16:00:13] maintenance-disconnect-full-disks build 906 integration-slave-docker-1026 (/: 99%): OFFLINE due to disk space [16:01:12] 10MediaWiki-Releasing, 10AbuseFilter, 10MW-1.32-release: Bundle AbuseFilter extension with MW 1.32 - https://phabricator.wikimedia.org/T191740 (10Daimona) Looking at WMF config, these are my thoughts about default configuration: # Regarding available actions (AbuseFilterActions), I think we should remove bl... 
[16:04:11] PROBLEM - Free space - all mounts on integration-slave-docker-1026 is CRITICAL: CRITICAL: integration.integration-slave-docker-1026.diskspace.root.byte_percentfree (<22.22%) [16:05:12] maintenance-disconnect-full-disks build 907 integration-slave-docker-1026: OFFLINE due to disk space [16:10:12] maintenance-disconnect-full-disks build 908 integration-slave-docker-1026: OFFLINE due to disk space [16:14:11] RECOVERY - Free space - all mounts on integration-slave-docker-1026 is OK: OK: All targets OK [16:15:14] maintenance-disconnect-full-disks build 909 integration-slave-docker-1026: OFFLINE due to disk space [16:20:12] maintenance-disconnect-full-disks build 910 integration-slave-docker-1026: OFFLINE due to disk space [16:25:12] maintenance-disconnect-full-disks build 911 integration-slave-docker-1026: OFFLINE due to disk space [16:29:05] 10Release-Engineering-Team (Kanban), 10ORES, 10Scoring-platform-team: Create gerrit mirrors for all github-based ORES repos - https://phabricator.wikimedia.org/T192042 (10Ladsgroup) The phabricator repos themselves are broken: ``` amsa@C235:~$ git clone http://phabricator.wikimedia.org/source/editquality.git... 
[16:30:13] maintenance-disconnect-full-disks build 912 integration-slave-docker-1026: OFFLINE due to disk space [16:32:36] 10Release-Engineering-Team, 10Readers-Web-Backlog, 10Epic, 10MobileFrontend (MobileFrontend.js): [EPIC] Generate compiled assets from continuous integration - https://phabricator.wikimedia.org/T158980 (10Jdlrobson) [16:33:57] !log mark integration-slave-docker-1026 back online after diskspace recovery [16:34:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:45:10] PROBLEM - Free space - all mounts on integration-slave-docker-1026 is CRITICAL: CRITICAL: integration.integration-slave-docker-1026.diskspace.root.byte_percentfree (<22.22%) [16:45:29] maintenance-disconnect-full-disks build 915 integration-slave-docker-1026 (/: 98%): OFFLINE due to disk space [16:50:13] maintenance-disconnect-full-disks build 916 integration-slave-docker-1026: OFFLINE due to disk space [16:50:48] thcipriani: if you have a moment, I just tried to deploy a repo with git-fat artifacts from deployment.eqiad.wmnet and it failed. Looking at the remote server none of the git-fat artifacts were pulled down :( [16:50:54] (via scap3) [16:51:11] ebernhardson: which repo?
thcipriani: search/mjolnir [16:51:25] err, search/mjolnir/deploy [16:51:52] thcipriani: I haven't rolled back yet, so stat1005.eqiad.wmnet (first server group deployed to) is still in the "wrong" state [16:51:52] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Wikimedia-production-error (Shared Build Failure): mediawiki-quibble docker jobs fails due to disk full - https://phabricator.wikimedia.org/T202457 (10Krinkle) ebernhardson: I'm looking at the deploy log, I don't see git-fat as having run [16:53:03] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), 10Patch-For-Review, and 2 others: Popups daily jobs currently unusable - https://phabricator.wikimedia.org/T203591 (10Jdlrobson) [16:53:26] thcipriani: yeah, I didn't see git-fat in there either. scap/scap.cfg has `git_binary_manager: git-fat` though [16:53:27] ebernhardson: (scap deploy-log -v in /srv/deployment/search/mjolnir/deploy) [16:53:32] hrm [16:54:20] 10Continuous-Integration-Infrastructure, 10Math, 10Operations: quibble-vendor-mysql-hhvm-docker no space left on device, write - https://phabricator.wikimedia.org/T203649 (10Umherirrender) Sounds like handled in T202457 [16:54:43] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Wikimedia-production-error (Shared Build Failure): mediawiki-quibble docker jobs fails due to disk full - https://phabricator.wikimedia.org/T202457 (10Krinkle) [16:54:47] 10Continuous-Integration-Infrastructure, 10Math, 10Operations: quibble-vendor-mysql-hhvm-docker no space left on device, write - https://phabricator.wikimedia.org/T203649 (10Krinkle) [16:54:58] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Wikimedia-production-error (Shared Build Failure): mediawiki-quibble docker jobs fails due to disk full - https://phabricator.wikimedia.org/T202457 (10Krinkle) [16:55:00]
10Continuous-Integration-Infrastructure, 10Math, 10Operations: quibble-vendor-mysql-hhvm-docker no space left on device, write - https://phabricator.wikimedia.org/T203649 (10Krinkle) [16:55:05] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Wikimedia-production-error (Shared Build Failure): mediawiki-quibble docker jobs fails due to disk full - https://phabricator.wikimedia.org/T202457 (10Krinkle) 05duplicate>03Open [16:55:10] maintenance-disconnect-full-disks build 917 integration-slave-docker-1026: OFFLINE due to disk space [16:55:12] RECOVERY - Free space - all mounts on integration-slave-docker-1026 is OK: OK: All targets OK [16:55:20] thcipriani: It seems 1026 is is the most common one to go full. [16:55:27] thcipriani: I wonder if there is something specific about that one? [16:55:44] If the space is just going to workspaces, and nothing is leaking or growing, maybe it just has too many executors / not enough disk. [16:57:54] 10Continuous-Integration-Config, 10MediaWiki-ResourceLoader, 10Performance-Team: Run `maintenance/resources/manageForeignResources.php verify` as a test on MediaWiki core - https://phabricator.wikimedia.org/T203694 (10Jdforrester-WMF) [17:00:10] maintenance-disconnect-full-disks build 918 integration-slave-docker-1026: OFFLINE due to disk space [17:00:55] ebernhardson: did you try to sync this revision a few times? [17:01:26] 10Release-Engineering-Team (Kanban), 10MediaWiki-General-or-Unknown, 10MW-1.32-release-notes (WMF-deploy-2018-07-10 (1.32.0-wmf.12)), 10Patch-For-Review, and 2 others: `npm audit` for mediawiki/core found 24 vulnerabilities - https://phabricator.wikimedia.org/T194280 (10Jdforrester-WMF) State after https:/... 
thcipriani: only once today [17:01:51] thcipriani: doh, actually i ctrl-c'd the first after ~5s [17:01:57] realized i forgot to submodule update, then re-ran [17:02:02] I see: Revision directory already exists (use --force to override) [17:02:07] so that's why git-fat didn't run [17:02:13] if you try: scap deploy --force [17:02:25] it might solve the issue [17:02:29] ok lemme try [17:02:34] * thcipriani watches [17:03:14] saw git-fat happen anyway [17:05:12] maintenance-disconnect-full-disks build 919 integration-slave-docker-1026: OFFLINE due to disk space [17:05:16] thcipriani: yup, looks like it worked. I should remember to try --force more often [17:05:39] ebernhardson: awesome! glad it worked :) [17:06:02] 10Release-Engineering-Team (Kanban), 10MediaWiki-Core-Tests, 10MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), 10Patch-For-Review, 10User-zeljkofilipin: Run tests daily targeting beta cluster for all repositories with Selenium tests - https://phabricator.wikimedia.org/T188742 (10Jdlrobson) [17:06:07] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), 10Patch-For-Review, and 2 others: Popups daily jobs currently unusable - https://phabricator.wikimedia.org/T203591 (10Jdlrobson) 05Open>03Resolved w00t! Thank y... [17:06:35] 10Continuous-Integration-Config, 10MediaWiki-ResourceLoader, 10Performance-Team: Run `maintenance/resources/manageForeignResources.php verify` as a test on MediaWiki core - https://phabricator.wikimedia.org/T203694 (10Krinkle) Yeah, sounds good to me! * The urls are expected to be immutable. The main value... [17:08:01] Krinkle: yeah, 1026 does fall off quite a bit and self-recover. Not sure what's happening with that. I think it's one of the new instances (like 1025) with 5 executors. [17:08:19] This is definitely a recent issue though (< 1 month).
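On the git-fat deploy worked around above: when scap skips the fetch step ("Revision directory already exists"), large files can be left as git-fat placeholder stubs rather than real artifacts. Assuming the standard `#$# git-fat` stub header, a quick way to spot unexpanded stubs on a target host might look like this (a sketch, not part of scap):

```shell
#!/bin/sh
# Report files that are still git-fat placeholder stubs rather
# than materialized artifacts. A stub is a tiny text file whose
# first bytes are the git-fat magic marker.
find_fat_stubs() {
    dir="$1"
    # Stubs are well under 100 bytes; only inspect small files.
    find "$dir" -type f -size -100c | while read -r f; do
        if head -c 11 "$f" 2>/dev/null | grep -q '^#\$# git-fat'; then
            echo "unexpanded stub: $f"
        fi
    done
}
```

Running this against the deploy directory on stat1005 would have flagged the unpulled artifacts before the `scap deploy --force` re-run.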
[17:08:29] Would be good to not become the new status quo :) [17:08:41] I think this instance is < 1 month old :) [17:08:54] Thanks, I wasn't sure actually. [17:09:11] The new offline bot is pretty cool, that'll help with all other kinds of issues as well. [17:09:36] Although it's still recovery, not prevention. It still fails 5 builds each time it happens. [17:09:46] yeah :( [17:10:03] I think the instance was part of https://phabricator.wikimedia.org/T202160 [17:10:21] Yeah, it was chosen based on ram/CPU, not disk space [17:10:56] might ultimately need an instance type more tuned to our use [17:11:04] I don't want to mess with the statistics marxarelli is collecting, but it seems like lowering # executors for a day or two might be worth trying [17:12:05] I wonder if we have an overall metric for builds failing/passing. That's gonna be noisy to some extent, but could be useful to see overall. [17:12:53] I don't think that was one of the metrics for the instance. The offline bot might already be skewing numbers in that regard [17:13:04] s/bot/job/ [17:13:42] It should be reducing the number of failed jobs. Maybe. By offlining the node earlier. [17:13:55] Unless it only offlines when all current executors are full and all of them would fail. [17:14:12] right [17:14:20] I don't actually know if that's what's happening or not [17:14:26] but that would be my worry [17:14:47] it doesn't kill the jobs though, offline in Jenkins means "don't start new jobs here" [17:15:04] they die naturally with space left errors, or succeed if it was offlined at the right time. [17:15:10] ah, ok [17:15:18] I wasn't sure [17:16:36] ah, 1026 has 8(!) executors [17:17:18] with 4 jobs running it's jumped from 60% usage on / to 80% [17:18:33] hashar: https://phabricator.wikimedia.org/T203583 is corrupting pages in a way we cannot revert, I thought the branch was reverted to previous group, but seems not yet? 
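The alert text quoted later in the log is `diskspace.root.byte_percentfree (<11.11%)`. A rough sketch of the kind of check behind the OFFLINE/RECOVERY messages (our assumption only; the actual maintenance-disconnect-full-disks job runs in Jenkins and offlines/reconnects nodes via the Jenkins API, which is not shown here):

```shell
#!/bin/sh
# Sketch only: flag a node when free space on / drops below roughly 11%,
# mirroring the <11.11% byte_percentfree alert threshold seen in the log.
used=$(df --output=pcent / | tail -1 | tr -dc '0-9')   # used space on /, percent
free=$((100 - used))
if [ "$free" -lt 11 ]; then
    echo "OFFLINE due to disk space"
else
    echo "OK: All targets OK"
fi
```

As discussed above, offlining only stops new builds from being scheduled; builds already running on the node still finish, or die with out-of-space errors.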
[17:19:21] yeah, i tuned it for cpu/mem efficiency [17:19:49] thcipriani: do we know what's taking up disk space mainly? and can we do some better clean up perhaps? [17:19:55] Krinkle: I wasn't clear on revert worthiness, didn't know that bit about unrecoverable corruption. [17:20:01] can that be made clear on task? [17:20:45] marxarelli: I would imagine it's the mediawiki checkouts for each job. It's just a workdir since we use /srv/git as a reference repo, but still a single workdir is pretty huge [17:23:00] greg-g: done [17:24:23] marxarelli: also each running quibble container takes up a ton of space, I guess, looking at docker ps -s [17:24:53] which is probably why it self-recovers even though we cleanup workspaces on subsequent job runs [17:25:00] thcipriani: ah, that's more likely the problem i would think, since it's the / partition that's so much smaller on that type [17:25:33] subsequent job runs that aren't scheduled since the maintenance disk-space job disconnects instances when they run out of space [17:26:12] and /var/lib/docker is where the container overlayfs resides, so that's on the / partition [17:26:20] docker-registry.wikimedia.org/releng/quibble-jessie-hhvm:0.0.23 -> 1.2GB (virtual 2.21GB) [17:26:55] hashar: there? [17:27:03] greg-g: yes [17:27:09] Krinkle: are you able to review/merge the fix from daniel? [17:27:17] hashar: https://phabricator.wikimedia.org/T203583 [17:27:49] greg-g: unfortunately, I can't. I'm totally behind on this area. Hoping someone from his team is around, and/or people(s) involved with the refactor that introduced the bug last week. [17:30:17] greg-g: I can't review a parser patch sorry :\ [17:30:38] yeah, we might need to roll back [17:30:45] brad is reviewing, hopefully a backport shortly [17:36:52] The problem is undetectable even if we consider mutating immutable text storage, because substitution is one-way, the fact that it was a subst isn't stored. 
Probably the first time subst: is broken in production since the feature's introduction 10 years ago. [17:37:18] greg-g: Another ping re the Labor Day / following Mondays issue. Would you mind me going in and fixing this on the wiki page? [17:37:46] RoanKattouw: no minding from me, sorry, meeting-ful day so far today [17:37:57] !log integration-slave-docker-1026 sudo docker kill eae9ba3a1459 -> stopped a container that had been running for 5 hours [17:37:59] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [17:38:00] No worries, I figured you'd be unlikely to have time to do it yourself [17:39:30] commons will need to re-review 100s/1000s of files since yesterday, and the bots aren't stopping. can we rollback now? [17:39:41] hashar: ^ please do the needful [17:39:51] I guess rollback group1+2 [17:40:16] sorry to be persistent, but I'm gonna be among the commons admins re-reviewing the files and every minute is adding more to the backlog :) [17:40:57] thcipriani: can you do a quick rollback while hashar is afk? [17:41:17] greg-g: yep [17:41:19] ty [17:41:21] I don't know what the damage is on other wikis, likely all sorts of random bot gadget and template based workflows. We'll want to announce on Tech News as well for people to know, because it's just a silent issue. It changes what you save. [17:41:35] sorry I got kids over excited and shouting around [17:41:42] I guess we can keep group0 for testing? [17:41:45] * Krinkle posts on phab instead. [17:41:49] Thanks! [17:41:55] hashar: correct [17:42:04] or just keep test0 [17:42:07] thcipriani: thanks. 
[17:42:55] PROBLEM - Free space - all mounts on integration-slave-docker-1025 is CRITICAL: CRITICAL: integration.integration-slave-docker-1025.diskspace.root.byte_percentfree (<11.11%) [17:51:06] ^ thcipriani, i'm looking at `docker ps -s` and `docker system df` and it seems that most space is taken up by images that could be removed [17:51:12] not running containers [17:51:26] OK, Monday schedules are fixed [17:51:28] this is on integration-slave-docker-1025 at least [17:53:02] marxarelli: I could believe it, although running images I think are what's causing self-recovery for instances out of space [17:53:53] i see [17:56:34] seems that we might need a lot more storage for /var/lib/docker on integration instances. images are cached there, anything written to / within running containers is stored there... [17:57:16] hashar and i had discussed the possibility of allocating a second lvm volume for /var/lib/docker. that or we could request any instance type with loads more / space [17:57:39] *an* instance type [17:58:23] hrm, not sure how the lvm volume thing works now, doesn't it just split up / to create /srv? [17:59:10] it would split up `second-local-disk` which is currently used for /srv [18:00:42] the docker containers on / are definitely an issue [18:02:08] for /srv filling up, I need to get the workspaces cleaned up after build. https://gerrit.wikimedia.org/r/c/integration/config/+/457918 might do it but I haven't tested it [18:02:38] thcipriani: for instance, on integration-slave-docker-1026 /dev/mapper/vd-second--local--disk is 140G, 14G currently allocated [18:03:02] by contrast, /dev/vda3 (/) is 19G, 12G currently allocated [18:03:51] so, if docker were configured to use the lvm storage driver, we could give it a pretty huge chunk of the vg [18:04:14] seems like it might fix the issues we're seeing. [18:04:31] why was it important for /srv to be a separate partition in the past? [18:04:36] I wonder. 
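The inspection commands mentioned above, plus the cleanup they point at, would look roughly like this (the prune filters are our assumptions; whether removing cached images is actually safe for these CI nodes was not settled in the discussion):

```shell
# What marxarelli was looking at: per-container disk use and an overall breakdown.
docker ps -s          # SIZE column: writable-layer bytes per running container
docker system df      # totals for images, containers, local volumes, build cache

# Possible cleanup (assumption: the next build simply re-pulls any image it needs):
docker image prune -a --filter "until=72h"   # drop unused images older than 3 days
docker container prune                        # remove stopped containers
```

Since `/var/lib/docker` lives on the small `/` partition here, both the cached images and anything a running container writes count against the same ~19G.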
[18:05:34] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.32.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T191066 (10Jdforrester-WMF) I'm not sure T203661 should block the train. Cosmetic error which is irritating but not broken, and is self-correcting the next time the page... [18:05:35] thcipriani: note that the size of /dev/mapper/vd-second--local--disk varies depending on instance type [18:05:54] makes sense [18:06:06] um, well, i would think /srv was put on an lvm volume just to get more space [18:06:24] and to prevent job storage requirements from filling / [18:06:33] probably! :) [18:06:40] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.32.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T191066 (10greg) ack, thanks for the feedback, I was being overly cautious. [18:06:52] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.32.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T191066 (10greg) [18:07:07] but why /srv and not /var [18:07:30] oh. that i don't know. /var would be a better choice [18:07:58] but maybe it's tricky since puppet configures the lvm volume [18:08:33] it would have to stop the world, create the volume, mount it elsewhere, copy /var contents over, remount it at /var [18:08:48] maybe that's just too tricky to get right in puppet [18:08:59] hmm, it seems recently `--tmpfs /tmp` was added to tox ci jobs, but that doesn't set the execute bit on /tmp so my jni library fails to load right :( [18:09:33] 10Release-Engineering-Team (Kanban), 10Analytics-Tech-community-metrics, 10Code-Health: Develop canonical/single record of origin, machine readable list of all repos deployed to WMF sites. - https://phabricator.wikimedia.org/T190891 (10Jrbranaa) @Aklapper, I've been spending a bit of time on this and one of... 
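For reference, the "second lvm volume for /var/lib/docker" idea sketched in the discussion might look like the following sequence (entirely hypothetical: the volume size and LV name are made up, only the `vd` volume group name comes from the log, and as noted above the real setup would have to be expressed in puppet rather than run by hand):

```shell
# Hypothetical manual sequence; puppet would need to orchestrate the same steps.
systemctl stop docker                       # stop the world
lvcreate -L 60G -n docker-local-disk vd     # carve a new LV from the "vd" VG
mkfs.ext4 /dev/vd/docker-local-disk
mount /dev/vd/docker-local-disk /mnt
cp -a /var/lib/docker/. /mnt/               # copy existing images/containers over
umount /mnt
echo '/dev/vd/docker-local-disk /var/lib/docker ext4 defaults 0 2' >> /etc/fstab
mount /var/lib/docker
systemctl start docker
```

This mirrors the "stop the world, create the volume, copy contents over, remount" sequence thcipriani describes for /var, which is exactly the part that is tricky to get right in puppet.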
[18:10:16] 10Continuous-Integration-Config, 10Operations, 10Patch-For-Review: rspec-puppet fails with Could not find the daemon directory (tested [/etc/sv,/var/lib/service]) - https://phabricator.wikimedia.org/T203645 (10hashar) I went with a monkey patch in rspec-puppet https://github.com/rodjek/rspec-puppet/pull/720 [18:14:04] hashar: see ebernhardson's comment ^. when was --tmpfs introduced? i don't see it in integration/config master [18:14:58] oh. https://gerrit.wikimedia.org/r/c/integration/config/+/457070 [18:15:29] marxarelli: i'm still testing, it looks like the dir gets mounted as `tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noexec,relatime)`, that noexec means jvm can't copy a .so into /tmp and load it [18:16:07] i might be able to repoint the jvm temp directory, still looking. Alternatively /tmp could be mounted with full arguments instead of the `--tmpfs` which would allow enabling exec [18:16:59] the change was deployed but not yet merged. i'm not sure we want `--tmpfs` being used until we have an instance type with more memory backing the nodes [18:17:15] (03CR) 10EBernhardson: "looks like this broke mjolnir jobs. /tmp gets mounted as noexec and as part of loading JNI libraries the JVM copies .so files from source" [integration/config] - 10https://gerrit.wikimedia.org/r/457070 (https://phabricator.wikimedia.org/T203181) (owner: 10Legoktm) [18:17:20] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.32.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T191066 (10Krinkle) [18:17:29] ebernhardson: you should comment on https://gerrit.wikimedia.org/r/c/integration/config/+/457070 if the change is causing problems so hashar can roll it back [18:17:47] and hashar, i think we'll need more memory per node before we can roll that out, yes? 
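The two workarounds ebernhardson mentions can be sketched as follows (the exact docker invocation used by the CI job wrapper, and the file paths, are assumptions on our part):

```shell
# Option 1: pass mount options with --tmpfs instead of the bare `--tmpfs /tmp`.
# Docker's default tmpfs options include noexec, which is what breaks JNI loading;
# adding `exec` restores the execute bit:
docker run --tmpfs /tmp:rw,exec,nosuid,size=512m some-ci-image

# Option 2: leave /tmp noexec and repoint the JVM's temp directory (where it
# extracts bundled .so files before dlopen'ing them) at an executable location:
java -Djava.io.tmpdir=/workspace/tmp SomeMainClass
```

Option 1 keeps the tmpfs speedup but, as noted above, means /tmp contents count against the node's RAM, which is why measuring /tmp usage per job first is the cautious path.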
[18:18:19] marxarelli: not sure [18:18:48] kunal is progressively updating jobs to avoid mass havoc [18:19:13] i guess we can refresh the tox mjolnir job using the version of master [18:19:14] i think it would be wise to measure /tmp usage during/after job execution first [18:19:20] to know what the memory impact will be [18:19:47] anyway gotta read some stories then have dinners [18:19:54] safest for now is to roll back that specific job [18:21:18] hashar: thanks, sorry to always be doing odd things that break [18:26:58] Could a phabricator admin give me edit rights for https://phabricator.wikimedia.org/transactions/editengine/maniphest.task/view/46/ ? I'd like to re-use this form and link it from the wikimedia-log-errors sidebar (currently it doesn't use the form) [18:28:27] 10Release-Engineering-Team (Kanban): Add Code Stewardship review - https://phabricator.wikimedia.org/T203698 (10Jrbranaa) p:05Triage>03Normal [18:29:13] 10Release-Engineering-Team (Kanban): Refresh the Production Deployment Review Process (aka Review Queue) - https://phabricator.wikimedia.org/T203697 (10Jrbranaa) a:03Jrbranaa [18:31:48] 10Release-Engineering-Team (Kanban): Add Services (and other non-extensions) to the deployment review process - https://phabricator.wikimedia.org/T203701 (10Jrbranaa) p:05Triage>03Normal [18:35:40] (03PS1) 10Krinkle: Only run patch-coverage on master branch (as originally intended) [integration/config] - 10https://gerrit.wikimedia.org/r/458553 [18:36:00] 10Release-Engineering-Team: TEC13:O1.1:Q1 Goal - Investigate and propose record of origin (ROO) for deployed code (currently Developers/Maintainers page) - https://phabricator.wikimedia.org/T199253 (10Jrbranaa) [18:36:10] 10Release-Engineering-Team (Kanban): Refresh the Production Deployment Review Process (aka Review Queue) - https://phabricator.wikimedia.org/T203697 (10Jrbranaa) [18:36:12] (03CR) 10Jforrester: [C: 031] "Oops." 
[integration/config] - 10https://gerrit.wikimedia.org/r/458553 (owner: 10Krinkle) [18:36:30] (03CR) 10Krinkle: "Noticed it running at https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Flow/+/458549/" [integration/config] - 10https://gerrit.wikimedia.org/r/458553 (owner: 10Krinkle) [18:37:12] 10Release-Engineering-Team (Kanban): TEC13:O1.1:Q1 Goal - Investigate and propose record of origin (ROO) for deployed code (currently Developers/Maintainers page) - https://phabricator.wikimedia.org/T199253 (10Jrbranaa) a:03Jrbranaa [18:38:24] 10Release-Engineering-Team (Kanban): TEC13:O1.1:Q1 Goal - Investigate and propose record of origin (ROO) for deployed code (currently Developers/Maintainers page) - https://phabricator.wikimedia.org/T199253 (10Jrbranaa) [18:38:25] 10Release-Engineering-Team (Kanban): Refresh the Production Deployment Review Process (aka Review Queue) - https://phabricator.wikimedia.org/T203697 (10Jrbranaa) [18:46:34] 10Release-Engineering-Team (Kanban): Refresh the Production Deployment Review Process (aka Review Queue) - https://phabricator.wikimedia.org/T203697 (10Jrbranaa) [18:47:51] ebernhardson: :/ well, glad we did a slow rollout. 
Lemme revert it now [18:48:15] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.32.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T191066 (10Jdforrester-WMF) [18:48:56] !log reverted tmpfs change for search-mjolnir-tox-docker [18:48:57] 10Release-Engineering-Team (Kanban): TEC13:O1.1:Q1 Goal - Investigate and propose record of origin (ROO) for deployed code (currently Developers/Maintainers page) - https://phabricator.wikimedia.org/T199253 (10Jrbranaa) [18:48:59] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [18:49:55] (03CR) 10Legoktm: [C: 04-1] "> Patch Set 6:" [integration/config] - 10https://gerrit.wikimedia.org/r/457070 (https://phabricator.wikimedia.org/T203181) (owner: 10Legoktm) [18:51:32] (03CR) 10Legoktm: [C: 031] Only run patch-coverage on master branch (as originally intended) [integration/config] - 10https://gerrit.wikimedia.org/r/458553 (owner: 10Krinkle) [18:51:50] (03CR) 10Krinkle: [C: 032] Only run patch-coverage on master branch (as originally intended) [integration/config] - 10https://gerrit.wikimedia.org/r/458553 (owner: 10Krinkle) [18:53:18] (03Merged) 10jenkins-bot: Only run patch-coverage on master branch (as originally intended) [integration/config] - 10https://gerrit.wikimedia.org/r/458553 (owner: 10Krinkle) [18:54:51] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/458553 [18:54:54] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [19:12:00] 10Continuous-Integration-Config, 10Math, 10MediaWiki-Installer, 10Quibble, and 2 others: MathHooks: table creation yields warnings on quibble - https://phabricator.wikimedia.org/T202266 (10Physikerwelt) p:05High>03Normal [19:28:28] 10Continuous-Integration-Config, 10Wiki-Loves-Monuments-Database, 10Patch-For-Review: Add Shell linting to heritage repo - https://phabricator.wikimedia.org/T175906 (10Lokal_Profil) 05Open>03Resolved Boldly resolving [19:30:30] Krinkle: twentyafterfour 
made you an admin, so you can edit that form, FYI [19:31:34] thcipriani: thx! [19:31:42] twentyafterfour: thanks :) [19:31:56] * thcipriani doffs cap [19:32:17] 10Continuous-Integration-Config, 10Wiki-Loves-Monuments-Database, 10Patch-For-Review: Add Shell linting to heritage repo - https://phabricator.wikimedia.org/T175906 (10JeanFred) Yeah, I did not find how to integrate Shellcheck but bashate is good enough for now :) Thanks! [20:12:14] I was wondering what the solution to that (form edit) was, I guess admin. More admins (we trust) is good I suppose! [20:16:05] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Mail, 10Operations, and 2 others: Ensure Jenkins mail configuration supports outbound smtp server failover - https://phabricator.wikimedia.org/T203607 (10herron) Thanks! Looks like it would indeed be affected by mx1001 downtime. We sh... [20:27:46] 10Release-Engineering-Team, 10Cloud-VPS, 10Operations, 10puppet-compiler, and 3 others: Upgrade Puppet compilers to Stretch - https://phabricator.wikimedia.org/T191438 (10Krenair) >>! In T191438#4563158, @herron wrote: > #cloud-vps is there anything else involved in moving a web proxy from one project to a... [21:12:20] 10Continuous-Integration-Infrastructure, 10MediaWiki-Cache, 10Patch-For-Review, 10Technical-Debt: BagOStuff should detect obsolete serialization or an unserialization resulting in a "wrong" object - https://phabricator.wikimedia.org/T156541 (10hashar) [21:13:31] 10Continuous-Integration-Infrastructure, 10MediaWiki-Cache, 10Patch-For-Review, 10Technical-Debt: BagOStuff should detect obsolete serialization or an unserialization resulting in a "wrong" object - https://phabricator.wikimedia.org/T156541 (10hashar) That has hit us when deploying 1.32.0-wmf.20 (T203566). [21:14:11] this is what the blue header with white texts looks like: https://phabricator.wikimedia.org/F25681227 [21:34:02] (03CR) 10Hashar: "Maybe we can just mount with the exec flag?" 
[integration/config] - 10https://gerrit.wikimedia.org/r/457070 (https://phabricator.wikimedia.org/T203181) (owner: 10Legoktm) [21:35:15] paladox: logo is lost in that color mix (not allowed per brand guidelines) [21:35:44] at least it looks better :) (though we could stick black on the logo) [21:36:12] use color picker on foundation site [21:36:24] to find one that doesn't conflict with brand guidelines [21:37:39] heh [21:41:31] (03CR) 10Krinkle: "Yeah, making it executable should be fine. But, I also agree with Lego that we probably do need a way to not use ram for all of /tmp in al" [integration/config] - 10https://gerrit.wikimedia.org/r/457070 (https://phabricator.wikimedia.org/T203181) (owner: 10Legoktm) [21:42:23] mutante: :| [21:43:26] well.. i asked specifically and they said "people who wrote style guide disagree" so not even trolling a lot [21:46:34] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), 10Patch-For-Review, and 2 others: Popups daily jobs currently unusable - https://phabricator.wikimedia.org/T203591 (10hashar) I have edited the package.json and ran... [21:46:57] 10Release-Engineering-Team (Kanban), 10MediaWiki-Core-Tests, 10MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), 10Patch-For-Review, 10User-zeljkofilipin: Run tests daily targeting beta cluster for all repositories with Selenium tests - https://phabricator.wikimedia.org/T188742 (10hashar) ht... 
[21:47:13] 10Gerrit, 10Patch-For-Review: Place holder task for Gerrit 2.16 upgrade - https://phabricator.wikimedia.org/T200739 (10Paladox) [22:00:11] PROBLEM - Free space - all mounts on integration-slave-docker-1026 is CRITICAL: CRITICAL: integration.integration-slave-docker-1026.diskspace.root.byte_percentfree (<11.11%) [22:00:13] maintenance-disconnect-full-disks build 978 integration-slave-docker-1026 (/: 96%): OFFLINE due to disk space [22:01:46] greg-g: indeed, the form system is kind of phacinating. The form to edit the task creation forms is itself also an editable form. [22:02:07] https://phabricator.wikimedia.org/transactions/editengine/maniphest.task/view/46/ -> Configure Form -> https://phabricator.wikimedia.org/transactions/editengine/maniphest.task/edit/46/ [22:02:24] but Configure Form (again) -> https://phabricator.wikimedia.org/transactions/editengine/transactions.editengine.config/view/5/ [22:02:45] So I looked there if there is an "Edit Policy" field to make visible, but there isn't. [22:02:57] It doesn't have that feature. So it's implicitly admin only I guess. [22:05:11] maintenance-disconnect-full-disks build 979 integration-slave-docker-1026: OFFLINE due to disk space [22:10:06] (03CR) 10Hashar: "Sorry, I broke it when switching to the -docker job :-(" [integration/config] - 10https://gerrit.wikimedia.org/r/458553 (owner: 10Krinkle) [22:10:11] RECOVERY - Free space - all mounts on integration-slave-docker-1026 is OK: OK: All targets OK [22:18:45] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Mail, 10Operations, and 2 others: Ensure Jenkins mail configuration supports outbound smtp server failover - https://phabricator.wikimedia.org/T203607 (10hashar) IIRC, a while ago (like in 2012) it was configured to use localhost for re... 
[22:21:30] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Mail, 10Operations, and 2 others: Ensure Jenkins mail configuration supports outbound smtp server failover - https://phabricator.wikimedia.org/T203607 (10hashar) Sorry I forgot, @herron is there a smarthost on all of our servers or does... [23:24:36] (03PS6) 10Krinkle: Remove `composer dump-autoload --optimize` [integration/jenkins] - 10https://gerrit.wikimedia.org/r/394907 (https://phabricator.wikimedia.org/T181940) (owner: 10Reedy) [23:24:47] (03CR) 10Krinkle: [C: 032] Remove `composer dump-autoload --optimize` [integration/jenkins] - 10https://gerrit.wikimedia.org/r/394907 (https://phabricator.wikimedia.org/T181940) (owner: 10Reedy) [23:25:37] (03Merged) 10jenkins-bot: Remove `composer dump-autoload --optimize` [integration/jenkins] - 10https://gerrit.wikimedia.org/r/394907 (https://phabricator.wikimedia.org/T181940) (owner: 10Reedy)