[00:50:35] PROBLEM - Puppet staleness on deployment-prometheus01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[00:52:46] 10Gerrit, 10Phabricator: Disable Tjlsangria Gerrit and Phabricator accounts - https://phabricator.wikimedia.org/T147165#2683347 (10Legoktm)
[04:06:03] Yippee, build fixed!
[04:06:03] Project selenium-MultimediaViewer » safari,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #160: 09FIXED in 10 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=safari,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/160/
[09:25:20] hashar: o/
[09:25:32] do you have a minute to help me with Jenkins and the puppet compiler?
[09:25:55] the slave seems not reachable and I am a bit ignorant about the magic to restore it :)
[09:32:27] elukey: sure in half an hour or so
[09:32:54] thank you!
[09:42:24] PROBLEM - Host deployment-db2 is DOWN: CRITICAL - Host Unreachable (10.68.17.94)
[09:43:22] PROBLEM - Host deployment-db1 is DOWN: CRITICAL - Host Unreachable (10.68.16.193)
[09:47:01] elukey: still in audio. Can you meanwhile file a bug for the puppet compiler? Phabricator should have a #puppet-compiler tag
[09:47:08] + continuous-integration-infrastructure
[09:50:03] hashar: ah yes but I wanted to ask you if there was some daemon to restart on the host to make Jenkins work again, maybe it doesn't need a phab task
[09:50:13] it seems that Jenkins is not able to re-launch the slave
[09:54:34] well
[09:54:39] if in doubt file a task :D
[09:54:53] they are cheap and spam all interested people
[09:55:00] at least puppet passes on the instance \o/
[09:55:24] zeljkof: https://integration.wikimedia.org/ci/computer/compiler02.puppet3-diffs.eqiad.wmflabs/ is offline
[09:55:27] sounds familiar?
[09:55:40] we had a slave locked last week
[09:56:08] yeah, same issue
[09:56:13] ssh slave plugin being crazy
[09:56:27] I just provisioned deployment-poolcounter03 with jessie for T123734 though can't login yet, I suspect because puppet is broken?
[09:56:52] godog: labs instance creation is broken afaik
[09:57:13] hashar: ah! is there a task to follow?
[09:57:18] a weird condition of LDAP being badly provisioned, puppet master being off, SSL cert not up-to-date etc
[09:57:38] yeah there must be a task. Something like new jessie instance labs
[09:57:40] perhaps
[09:57:50] godog: maybe it is fixable via salt?
[09:58:30] hashar: perhaps, no idea what the problem is atm
[09:58:43] deployment-salt02.deployment-prep.eqiad.wmflabs
[09:58:45] might save you
[09:58:56] but no time to go into the rabbit hole now, I'll let it be for now
[10:02:39] elukey: ok the Jenkins slave is back :)
[10:02:42] it seems that setting the host offline and trying to bring it back in jenkins did something
[10:02:45] elukey: had to kill a bunch of Java threads
[10:02:51] ahhhh you did something!
[10:02:52] :P
[10:02:59] there is some weird java deadlock in the code somewhere
[10:03:09] but on the compiler or on gallium?
[10:03:11] and one can see the java threads in https://integration.wikimedia.org/ci/monitoring?part=threads
[10:03:18] then randomly kill threads that are BLOCKED and have a related name
[10:03:33] that is on the Jenkins master side
[10:03:53] it happened on another slave last week
[10:04:02] ah okok, thanks a lot!
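(What hashar describes at 10:02-10:03, marking the node offline, bringing it back, and killing blocked threads on the master, can also be driven over the Jenkins HTTP API. A rough sketch; the credentials and any CSRF-crumb requirement are assumptions, not taken from the log:)

    # mark the wedged slave offline, then ask Jenkins to relaunch its SSH agent
    JENKINS=https://integration.wikimedia.org/ci
    NODE=compiler02.puppet3-diffs.eqiad.wmflabs
    curl -u "$USER:$API_TOKEN" -X POST \
        "$JENKINS/computer/$NODE/toggleOffline" --data-urlencode 'offlineMessage=ssh slave plugin deadlocked'
    curl -u "$USER:$API_TOKEN" -X POST "$JENKINS/computer/$NODE/launchSlaveAgent"
    # threads stuck in BLOCKED on the master remain visible at $JENKINS/monitoring?part=threads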
[10:04:24] and I have zero ideas how to debug/write java :D
[10:04:54] godog: 2016-10-03T10:03:22.873729+00:00 deployment-poolcounter03 puppet-agent[523]: Could not request certificate: getaddrinfo: Name or service not known
[10:05:07] godog: got it by asking for the console output on Wikitech
[10:05:13] should be available in Horizon as well
[10:05:26] hashar: yeah, I got the same from horizon
[10:05:34] most probably salt works
[10:05:45] so one can sed /etc/puppet.conf and or /etc/resolv.conf to fix it up
[10:07:44] salt not available bah
[10:08:31] sure I can hammer my way through, I'll report it instead as I got something not working through no fault of mine I think
[10:11:48] hashar: a slave offline again (just saw that)
[10:11:50] ?
[10:12:49] zeljkof: yeah same issue as last week. Jenkins ssh plugin being deadlocked
[10:12:57] argh
[10:13:05] I have just killed threads :D
[10:13:10] we really gotta upgrade Jenkins
[10:13:17] jenkins is going crazy again
[10:13:19] anyway laundry duty
[10:13:24] err
[10:13:25] is it?
[10:13:33] no, I mean, locking slaves
[10:13:47] upgrade before or after migration?
[10:13:52] after
[10:13:56] PROBLEM - Host deployment-poolcounter03 is DOWN: CRITICAL - Host Unreachable (10.68.19.250)
[10:14:00] I don't want to mess with several things at the same time :D
[10:14:47] that's me btw, trying to recreate poolcounter04 now
[10:21:56] !log add role::prometheus::node_exporter to classes in hiera:deployment-prep T144502
[10:22:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:28:29] PROBLEM - Puppet run on deployment-poolcounter04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[13:33:47] 10Beta-Cluster-Infrastructure, 06Operations, 10Traffic, 13Patch-For-Review: Upload cache in beta is broken - https://phabricator.wikimedia.org/T147116#2684318 (10BBlack) It would probably be better to upgrade the deployment-prep upload cache to varnish4.
[13:37:54] !log marked integration-slave-trusty-1014 offline. Can't run jobs / gets stuck somehow
[13:37:58] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:39:20] !log integration-slave-trusty-1014 upgrading packages, cleaning up and rebooting it
[13:39:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:41:00] !log Tip of the day: to reboot an instance and bypass molly-guard: /sbin/reboot
[13:41:05] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:43:07] poolcounter failures are me btw, fixing
[13:43:22] !log Added integration-slave-trusty-1014 back in the pool
[13:43:26] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:58:30] RECOVERY - Puppet run on deployment-poolcounter04 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:00:53] 10Beta-Cluster-Infrastructure, 06Operations, 10Traffic, 13Patch-For-Review: Upload cache in beta is broken - https://phabricator.wikimedia.org/T147116#2684352 (10AlexMonk-WMF) >>! In T147116#2684318, @BBlack wrote: > It would probably be better to upgrade the deployment-prep upload cache to varnish4. Okay...
[14:06:07] 10Beta-Cluster-Infrastructure, 06Operations, 10Traffic, 13Patch-For-Review: Upload cache in beta is broken - https://phabricator.wikimedia.org/T147116#2684357 (10BBlack) I wish :) The basic flow we're using on prod nodes is here, but some of that's inapplicable to deployment-prep: https://wikitech.wikimed...
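(The 13:37-13:43 !log entries above amount to roughly the following routine on the slave itself, once it has been marked offline in Jenkins. A sketch; the exact upgrade and cleanup steps are an assumption, only the molly-guard tip comes from the log:)

    # refresh a misbehaving Trusty slave, then put it back in the pool from the Jenkins UI
    sudo apt-get update && sudo apt-get -y dist-upgrade
    sudo apt-get -y autoremove --purge && sudo apt-get clean
    # tip of the day: calling /sbin/reboot directly bypasses molly-guard's hostname prompt
    sudo /sbin/reboot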
[14:07:47] (03PS6) 10Hashar: Rename builders tox and doxygen [integration/config] - 10https://gerrit.wikimedia.org/r/313308 (owner: 10Paladox)
[14:15:18] (03CR) 10Hashar: [C: 032] "Rebased, did an additional replacement." [integration/config] - 10https://gerrit.wikimedia.org/r/313308 (owner: 10Paladox)
[14:16:06] (03Merged) 10jenkins-bot: Rename builders tox and doxygen [integration/config] - 10https://gerrit.wikimedia.org/r/313308 (owner: 10Paladox)
[14:27:12] (03CR) 10Hashar: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/313306 (owner: 10Paladox)
[14:30:57] hashar thank you for merging :)
[14:41:22] (03PS3) 10Hashar: Replace deprecated logrotate with build-discarder [integration/config] - 10https://gerrit.wikimedia.org/r/313306 (owner: 10Paladox)
[14:41:52] (03CR) 10Hashar: [C: 04-1] "Some jobs lack the BuildDiscarderProperty :(" [integration/config] - 10https://gerrit.wikimedia.org/r/313306 (owner: 10Paladox)
[14:42:02] paladox: will look at the build discarder one :D
[14:42:05] it is not ready yet
[14:42:09] Oh
[14:42:27] I am looking at it
[14:44:23] paladox: yeah the fix is easy. Some jobs have "properties" which completely override the value from the default
[14:44:29] JJB does not inherit/merge properties
[14:44:33] Oh
[14:44:36] Thanks
[14:45:57] hashar guessing that is a bug in zuul?
[14:50:11] (03PS4) 10Hashar: Replace deprecated logrotate with build-discarder [integration/config] - 10https://gerrit.wikimedia.org/r/313306 (owner: 10Paladox)
[14:51:25] (03CR) 10Hashar: [C: 031] "I have added the discardbuild property on jobs that had a more specific properties section. JJB does not merge the values from the defaul" [integration/config] - 10https://gerrit.wikimedia.org/r/313306 (owner: 10Paladox)
[14:51:26] That seems to be a bug, if it doesn't inherit and merge
[14:51:40] JJB does not inherit/merge properties
[14:52:01] Oh
[14:52:03] the thing is sometimes you want to override, other times you want to merge :]
[14:52:08] Yep
[14:52:10] anyway
[14:52:15] that is updating most jobs so gotta babysit it
[14:52:24] Yep, updating most jobs
[15:08:42] 06Release-Engineering-Team, 10MediaWiki-General-or-Unknown, 06Operations, 10Traffic, and 5 others: Make sure we're not relying on HTTP_PROXY headers - https://phabricator.wikimedia.org/T140658#2471564 (10BBlack) Is there more to do here on the MW-Core side of things?
[15:14:21] Beta people: Can anyone clarify for me what determines which hosts have base::firewall applied and which don't?
[15:16:38] hashar hi, would you be able to review https://gerrit.wikimedia.org/r/#/c/313213/ and https://gerrit.wikimedia.org/r/#/c/313230/ please
[15:16:39] ?
[15:16:46] first link is adding a php7 pipeline
[15:17:04] andrewbogott, doesn't it depend on whether they have a role that includes it?
[15:17:08] second is for reusing php lint code
[15:17:25] Krenair: yes — my question is why some do and some don't
[15:17:44] (It might be a perfectly reasonable mirror of prod, or it might be haphazard, I can't tell.)
[15:17:45] (03PS3) 10Paladox: Reuse phplint code in job-template.yaml [integration/config] - 10https://gerrit.wikimedia.org/r/313230
[15:17:52] (03PS2) 10Paladox: Add php7 pipeline for zuul [integration/config] - 10https://gerrit.wikimedia.org/r/313213 (https://phabricator.wikimedia.org/T144872)
[15:17:58] some don't have roles that include it?
[15:18:17] are you looking into a specific one?
[15:19:55] hm...
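(One way to answer andrewbogott's question above is to look for role classes that pull in base::firewall in an operations/puppet checkout; a sketch, with the checkout path as a placeholder. Hosts would get the class either via one of those roles or via a direct class import in their node definition:)

    # list role manifests that include base::firewall
    cd ~/src/operations/puppet
    grep -rlE 'include (::)?base::firewall' modules/role/manifests/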
[15:20:10] (03PS2) 10Paladox: Add composer-php70 as a experimental test to mediawiki/core [integration/config] - 10https://gerrit.wikimedia.org/r/309556 (https://phabricator.wikimedia.org/T144961)
[15:20:21] mostly I'm trying to figure out if it makes sense to enforce a "only roles at top-level node definition" and base::firewall is a prime offender
[15:20:37] but it's equally so in prod and in beta so I'll see about changing it in prod first and see who complains :)
[15:20:54] (03PS5) 10Paladox: Update the mediawiki core tests to also test against php7 [integration/config] - 10https://gerrit.wikimedia.org/r/313223 (https://phabricator.wikimedia.org/T144964)
[15:21:50] andrewbogott: Krenair: I think we used to have ferm rules applied on beta cluster due to the role classes
[15:21:54] and required base::firewall on them
[15:22:03] maybe try to drop it on an instance and see what happens? :(
[15:22:04] (03CR) 10Paladox: "@hashar https://phabricator.wikimedia.org/T137770 was resolved and I think arcanist is available for trusty now." [integration/config] - 10https://gerrit.wikimedia.org/r/295976 (owner: 1020after4)
[15:22:12] gotta rush out though sorry :(
[15:22:23] hasharAway: I don't think it's necessarily incorrect to have it applied; just trying to understand.
[15:23:44] PROBLEM - Puppet run on deployment-cache-upload04 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[15:25:17] 10Beta-Cluster-Infrastructure, 06Operations, 10Traffic, 13Patch-For-Review: Upload cache in beta is broken - https://phabricator.wikimedia.org/T147116#2684616 (10AlexMonk-WMF) 05Open>03Resolved a:03AlexMonk-WMF >>! In T147116#2684357, @BBlack wrote: > I wish :) Yeah I knew you were gonna say that....
[15:26:21] 10Beta-Cluster-Infrastructure, 06Operations, 10Traffic, 13Patch-For-Review: Upload cache in beta is broken - https://phabricator.wikimedia.org/T147116#2684621 (10BBlack) I think we can abandon the patch. We're assuming we're past the point of reverting to varnish3 for the upload caches at this point, just...
[15:28:34] (03PS5) 10Paladox: [DonationInterface] Switch jenkins tests to extension-unittests-composer-non-voting [integration/config] - 10https://gerrit.wikimedia.org/r/307543
[15:31:11] (03PS6) 10Paladox: [DonationInterface] Switch jenkins tests to extension-unittests-composer-non-voting [integration/config] - 10https://gerrit.wikimedia.org/r/307543
[15:31:53] PROBLEM - Puppet run on deployment-db03 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[15:34:13] (03CR) 10Paladox: [DonationInterface] Switch jenkins tests to extension-unittests-composer-non-voting (032 comments) [integration/config] - 10https://gerrit.wikimedia.org/r/307543 (owner: 10Paladox)
[15:35:13] ejegg hi, I'm wondering if you could fix this test https://integration.wikimedia.org/ci/job/mwext-testextension-hhvm-composer/7054/console please?
[15:35:34] So that we can change it from non-voting to voting in https://gerrit.wikimedia.org/r/#/c/307543/
[15:35:35] please
[15:35:36] ?
[15:36:30] Hi! I can definitely take a look at it
[15:37:36] ejegg thank you :)
[15:38:43] RECOVERY - Puppet run on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:40:55] ejegg I think it is not loading this https://github.com/wikimedia/mediawiki-extensions-DonationInterface/blob/30c1339790649bed1410e4a0e4306298bae88a5b/tests/phpunit/TestConfiguration.php#L53 file
[15:41:12] so it doesn't know it is an alias to TestingGlobalCollectAdapter
[15:41:55] right, it must not be. I just need to deal with an icinga alert about one of the last old queues, then I'll take a look at that define
[15:42:43] Ok, thanks
[15:48:00] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team, 13Patch-For-Review: Beta puppetmaster cherry-pick process - https://phabricator.wikimedia.org/T135427#2684681 (10thcipriani) Fixed up https://gerrit.wikimedia.org/r/310719 based on some review from @Volans. Additional review is welcome! I think this...
[15:49:32] ejegg maybe related https://phabricator.wikimedia.org/T142121 (when you've finished what you're doing :))
[15:53:12] ejegg I tried it on REL1_27 and it resulted in https://integration.wikimedia.org/ci/job/mwext-testextension-hhvm-composer/7056/console success
[15:53:24] so looks like a change either in the extension or MW 1.28 broke it
[15:53:45] I am going to try something by moving the test files into a subfolder to evade the internal change
[16:04:58] ejegg i found a fix https://integration.wikimedia.org/ci/job/mwext-testextension-hhvm-composer/7059/console
[16:04:59] :)
[16:05:04] https://gerrit.wikimedia.org/r/#/c/313822/1
[16:05:14] (need to update commit msg) But was testing a fix
[16:05:19] But other errors now show
[16:05:48] oh cool, thanks for looking into it!
[16:06:20] You're welcome
[16:06:33] ejegg it shows Undefined index: pageLanguage now
[16:11:52] RECOVERY - Puppet run on deployment-db03 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:13:39] 03Scap3: scap version flag - https://phabricator.wikimedia.org/T147155#2684779 (10mmodell) gotta watch out for scap2 commands which use --version to refer to a mediawiki version.
[16:13:47] ejegg oh it passes https://integration.wikimedia.org/ci/job/mwext-testextension-hhvm-composer/7060/console now
[16:13:55] After another "check experimental" :)
[16:14:48] I updated https://gerrit.wikimedia.org/r/#/c/313822/ commit msg now :)
[16:14:51] ejegg ^^ :)
[16:18:41] (03CR) 10Paladox: [DonationInterface] Switch jenkins tests to extension-unittests-composer-non-voting (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/307543 (owner: 10Paladox)
[16:19:35] PROBLEM - Puppet run on deployment-prometheus01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[16:25:34] RECOVERY - Puppet staleness on deployment-prometheus01 is OK: OK: Less than 1.00% above the threshold [3600.0]
[16:35:05] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team: Delete deployment-db1 and deployment-db2 - https://phabricator.wikimedia.org/T147110#2684845 (10dduvall) a:03dduvall
[16:38:45] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team, 10DBA, 13Patch-For-Review, 07WorkType-Maintenance: Upgrade mariadb in deployment-prep from Precise/MariaDB 5.5 to Jessie/MariaDB 5.10 - https://phabricator.wikimedia.org/T138778#2684858 (10dduvall)
[16:38:47] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team: Delete deployment-db1 and deployment-db2 - https://phabricator.wikimedia.org/T147110#2684857 (10dduvall) 05Open>03Resolved
[16:49:36] RECOVERY - Puppet run on deployment-prometheus01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:19:32] PROBLEM - Puppet run on deployment-apertium01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[17:21:24] 10Beta-Cluster-Infrastructure, 07Puppet: deployment-apertium01 puppet failing due to missing packages on trusty - https://phabricator.wikimedia.org/T147210#2685129 (10AlexMonk-WMF)
[17:39:47] 10Beta-Cluster-Infrastructure, 07Puppet: deployment-apertium01 puppet failing due to missing packages on trusty - https://phabricator.wikimedia.org/T147210#2685207 (10KartikMistry) This is happening due to cherry-pick of https://gerrit.wikimedia.org/r/#/c/308679/ which is for testing before deployment in Produ...
[18:24:21] PROBLEM - Host integration-puppetmaster is DOWN: CRITICAL - Host Unreachable (10.68.16.42)
[18:37:52] PROBLEM - Puppet run on deployment-redis01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[18:55:59] PROBLEM - Puppet run on deployment-eventlogging03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[19:02:13] (03PS1) 10Aude: Update Wikidata branch - wmf/1.28.0-wmf.21 [tools/release] - 10https://gerrit.wikimedia.org/r/313861
[19:12:52] RECOVERY - Puppet run on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:30:58] RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:59:11] anyone here understand jenkins? i mean, in a really intimate way?
[19:59:11] jenkins thinks there's a conflict in some "cross-repo dependencies" in https://gerrit.wikimedia.org/r/313876 -- i didn't even know jenkins had cross-repo dependency management
[19:59:12] i'd love to know what repos got crossed with mine
[19:59:12] and how to uncross them
[20:02:09] hasharAway ^^
[20:03:26] yeah hmm ?
[20:03:31] cscott: looking
[20:04:25] cscott: somehow it can't merge https://gerrit.wikimedia.org/r/#/c/313876/ against the tip of the branch. Let me check the logs
[20:04:57] * hashar logs in to scandium.eqiad.wmnet and inspects /var/log/zuul/*.log
[20:05:01] it also happened with the previous patch to this repo, IIRC. i had to force-push to solve.
[20:05:52] GitCommandError: 'git clone -v ssh://jenkins-bot@gerrit.wikimedia.org:29418/mediawiki/extensions/Collection/OfflineContentGenerator /srv/ssd/zuul/git/mediawiki/extensions/Collection/OfflineContentGenerator' returned with exit code 128
[20:05:52] stderr: 'fatal: destination path '/srv/ssd/zuul/git/mediawiki/extensions/Collection/OfflineContentGenerator' already exists and is not an empty directory.
[20:05:52] :(
[20:07:05] paladox: there is no point in hitting recheck
[20:07:13] oh sorry
[20:07:39] so there's just some old files that need to be cleaned up?
[20:07:49] names conflict
[20:08:07] mediawiki/extensions/Collection having a directory named OfflineContentGenerator/
[20:08:28] which clashes when trying to clone mediawiki/extensions/Collection/OfflineContentGenerator
[20:09:38] hm, that was never a problem in the past
[20:09:55] but in fact extensions/Collection/OfflineContentGenerator appears to be completely unused, so that could be an easy workaround
[20:10:40] Is that another bug in zuul, that it doesn't support the same-named repo even if it does Repo-name/sub-repo
[20:10:41] ?
[20:11:36] awight hi, could you merge https://gerrit.wikimedia.org/r/#/c/313822/ please?
[20:11:53] Fixes the unit tests so i can make the composer job voting instead of non-voting?
[20:14:06] cscott: I am retrying
[20:14:36] cscott: basically I have created the repo manually using git init && git fetch && git symbolic-ref refs/remotes/origin/HEAD
[20:14:52] it is definitely an issue in Zuul
[20:14:59] but it happens once per quarter or so
[20:15:58] cscott: there are some jscs issues, not sure how much you care about them
[20:16:31] cscott: also I have overhauled the OCG Grafana board with a few more metrics https://grafana.wikimedia.org/dashboard/db/ocg
[20:16:55] i can fix the jscs issues
[20:17:15] the ocg grafana board is very nice! i was trying to build a kibana dashboard today.
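(The manual workaround hashar mentions at 20:14, "git init && git fetch && git symbolic-ref", would look roughly like this on the Zuul merger host. A sketch; the remote URL comes from the error quoted above, but the choice of master as the default branch is an assumption:)

    # recreate the clashing repo in place so Zuul no longer tries to clone into a non-empty directory
    cd /srv/ssd/zuul/git/mediawiki/extensions/Collection/OfflineContentGenerator
    git init
    git fetch ssh://jenkins-bot@gerrit.wikimedia.org:29418/mediawiki/extensions/Collection/OfflineContentGenerator \
        '+refs/heads/*:refs/remotes/origin/*'
    git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/master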
[20:17:28] paladox: awesome, much appreciated.
[20:18:01] cscott: the queue size change is a copy-paste from the graph I built on one of the mw jobrunner boards
[20:18:18] and I have looked at OCG source to figure out some other potentially interesting metrics
[20:18:27] but really, I have no idea whether they make sense :D
[20:18:36] that looks nice though
[20:18:58] awight you're welcome :)
[20:20:35] (03PS7) 10Paladox: [DonationInterface] Switch jenkins tests to extension-unittests-composer [integration/config] - 10https://gerrit.wikimedia.org/r/307543
[20:20:47] awight i've updated ^^ now :)
[20:20:51] thanks for merging too
[20:20:52] :)
[20:22:45] paladox: btw, my availability is about to take a nose-dive from already unimpressive heights :) -- if you want my team's attention for any of the CI issues you've been helping with, try #wikimedia-fundraising
[20:28:51] (03CR) 10Awight: [C: 04-1] "I don't think we're quite ready--the mediawiki-extensions-hhvm test still fails:" [integration/config] - 10https://gerrit.wikimedia.org/r/307543 (owner: 10Paladox)
[20:29:18] I am reviewing this one
[20:31:42] PROBLEM - Puppet run on deployment-ms-be01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[20:33:25] (03CR) 10Paladox: "@Awight this switches the test to extension-unittests-composer" [integration/config] - 10https://gerrit.wikimedia.org/r/307543 (owner: 10Paladox)
[20:33:43] hashar which one?
[20:33:59] (03CR) 10Hashar: [C: 04-1] "I am not sure whether we should use composer to ship dependencies. For extensions deployed at wikimedia, we ship dependencies solely via " [integration/config] - 10https://gerrit.wikimedia.org/r/307543 (owner: 10Paladox)
[20:34:26] awight: paladox so for Donation Interface, I think we should remove the generic job. I don't think it has much use
[20:34:48] hashar oh, so we don't want to use the generic test
[20:34:51] I mean the job "mwext-testextension-hhvm-non-voting"
[20:34:56] Oh
[20:35:03] Yeh
[20:35:17] it uses mediawiki/vendor @ master
[20:35:20] Already done in https://gerrit.wikimedia.org/r/#/c/307543/7/zuul/layout.yaml
[20:35:22] which is missing dependencies for sure
[20:35:32] should use mediawiki/vendor@fundraising/REL1_27
[20:35:37] Yep,
[20:35:38] which is what the other job is doing
[20:35:53] So I should remove - name: extension-unittests-non-voting
[20:36:06] Since this mwext-donationinterfacecore-REL1_27-testextension-zend55 is already doing it?
[20:39:34] paladox: hashar: ah this brings up something else--we realized last week that we won't migrate any fundraising boxes to HHVM any time this year. We only need to test on Zend PHP 5.3 and 5.5
[20:40:03] Oh, yep
[20:40:24] (03PS1) 10Paladox: [DonationInterface] Remove test extension-unittests-non-voting [integration/config] - 10https://gerrit.wikimedia.org/r/313890
[20:40:26] hashar awight ^^
[20:40:49] (03CR) 10Hashar: "Second thought!" [integration/config] - 10https://gerrit.wikimedia.org/r/307543 (owner: 10Paladox)
[20:41:13] awight: :(
[20:41:23] (03CR) 10jenkins-bot: [V: 04-1] [DonationInterface] Remove test extension-unittests-non-voting [integration/config] - 10https://gerrit.wikimedia.org/r/313890 (owner: 10Paladox)
[20:41:36] definitely aim at phasing out Zend 5.3 though. Precise is gone in May or June 2017.
[20:42:00] :)
[20:42:19] hashar it seems better to do https://gerrit.wikimedia.org/r/307543
[20:42:27] awight: we should sit down with the FR team at some point and rethink the jobs running for your repos
[20:42:29] Since it tests against master too :0
[20:42:38] +1
[20:42:46] hashar: I would love to
[20:43:04] awight: I think Ejegg and Tyler paired about it
[20:43:11] same TZ, might be a good lead
[20:44:09] or bring it to a list :]
[20:44:12] I am going to bed
[20:53:30] PROBLEM - Puppet run on integration-slave-trusty-1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[21:08:19] 10Beta-Cluster-Infrastructure, 06Labs: Replace all class imports on Labs with role imports - https://phabricator.wikimedia.org/T147233#2686048 (10Andrew)
[21:10:10] twentyafterfour: the beta::deployaccess class says up top 'remove this if https://phabricator.wikimedia.org/T121721 is fixed.'
[21:10:13] It is — can I really remove it?
[21:11:42] RECOVERY - Puppet run on deployment-ms-be01 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:24:37] PROBLEM - Puppet run on deployment-kafka05 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[21:28:31] RECOVERY - Puppet run on integration-slave-trusty-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:30:50] andrewbogott: I think so
[21:31:16] twentyafterfour: if I pull all the references out of ldap, will you be around for a bit to notice if things go haywire?
[21:31:25] andrewbogott: one caveat, the deploy-service user is a local user on deployment-tin/mira
[21:31:33] and that's what we have the exception for
[21:31:48] does that matter for the fix?
[21:32:07] thcipriani: …understanding what you're saying would require me to understand that class a lot more than I understand it now. I'm just going by the fact that it says it can be removed.
[21:32:57] this is for mwdeploy
[21:33:02] oh hmm
[21:33:10] sure, I just don't know what the fix was, so I can't say. If the fix relies on users being known to ldap, then it won't work in this instance.
[21:33:21] (I'm pretty sure)
[21:33:36] can't we create service users in ldap via wikitech?
[21:34:09] andrewbogott: what references are in ldap that you mentioned above?
[21:34:57] twentyafterfour: 14 hosts in deployment-prep have that class included, via the wikitech UI (which is ldap)
[21:35:15] ohh
[21:35:39] So if I'm going to remove the class I need to remove the class from those node definitions first.
[21:35:47] else their puppet runs will fail
[21:35:49] it's a little late in my deploy window, but i'm about to deploy an updated OCG (which among other things will stop our en.wiktionary.org DoS)
[21:36:09] it looks like you can remove it since https://gerrit.wikimedia.org/r/#/c/286852/2/modules/scap/manifests/target.pp seems to implement the same thing
[21:36:38] ah, yup, that should work :)
[21:36:39] andrewbogott: ^ from my reading of that patch it should be ok to remove beta::deployaccess
[21:36:41] !log starting OCG deploy
[21:36:45] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[21:36:59] twentyafterfour: ok, we'll see what happens :)
[21:37:15] andrewbogott: cool
[21:37:26] um, did the ssh host key for deployment-tin.eqiad.wmflabs change?
[21:37:42] cscott: I think so, somewhat recently
[21:37:48] The fingerprint for the ECDSA key sent by the remote host is SHA256:2GreQk3ZeEOCHBzfQA0fzxSUrS8LOlNb7L0TZyU0pLY.
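(If the key change is expected, i.e. the instance really was rebuilt as suggested here, the stale entry can be dropped and the new fingerprint accepted on the next connection; a minimal sketch:)

    # remove the stale known_hosts entry for the recreated instance, then reconnect and verify the new fingerprint
    ssh-keygen -R deployment-tin.eqiad.wmflabs
    ssh deployment-tin.eqiad.wmflabs true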
[21:38:08] twentyafterfour: done
[21:38:08] it was recreated recently
[21:38:49] twentyafterfour: can I get a +1 on https://gerrit.wikimedia.org/r/#/c/313903/ ?
[21:39:22] andrewbogott: done
[21:39:28] thanks
[21:40:18] Project beta-scap-eqiad build #122775: 04FAILURE in 2 min 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/122775/
[21:42:42] !log updated OCG to version 0bf27e3452dfdc770317f15793e93e6e89c7865a
[21:42:45] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[21:44:01] hrm, do the mediawiki servers have the scap::target class?
[21:45:39] it would appear not
[21:45:43] 10Beta-Cluster-Infrastructure, 06Labs, 13Patch-For-Review: Replace all class imports on Labs with role imports - https://phabricator.wikimedia.org/T147233#2686205 (10Andrew)
[21:45:46] > pam_access(sshd:account): access denied for user `mwdeploy' from `deployment-tin.deployment-prep.eqiad.wmflabs
[21:46:08] which is causing the beta-scap-eqiad failure
[21:46:49] Project beta-scap-eqiad build #122776: 04STILL FAILING in 1 min 58 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/122776/
[21:47:08] andrewbogott: twentyafterfour ^ may need to revert and figure a slightly different solution.
[21:47:35] ok… let me know
[21:48:40] Project beta-scap-eqiad build #122777: 04STILL FAILING in 1 min 49 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/122777/
[21:48:51] that bot is really upset
[21:49:00] heh, yeah
[21:49:17] andrewbogott: let's revert that patch for now, and we'll work on something that will make that less upset :)
[21:49:39] ok
[21:49:49] that fix only works for scap3 on beta, but now mediawiki update on beta is broken :\
[21:50:03] (since mediawiki is not on scap3 yet)
[21:50:05] so, do you happen to know the set of VMs that need the class applied?
[21:52:05] could figure it out, they're listed in /etc/dsh/group/{mediawiki-*,scap-*}
[21:53:13] but scap::target is a define that is used to setup specific services on a target so we can't just apply it to the broken nodes, seemingly
[21:53:14] 10Beta-Cluster-Infrastructure, 06Labs, 13Patch-For-Review: Replace all class imports on Labs with role imports - https://phabricator.wikimedia.org/T147233#2686255 (10Andrew)
[21:55:45] thcipriani: the class is back available in puppet now
[21:56:12] My agenda here is T147233, trying to not have instances apply non-role classes directly.
[21:56:33] So if you want to wrap that class up in a role, or rewrite it as a role, I can review and merge.
[21:56:50] Project beta-scap-eqiad build #122778: 04STILL FAILING in 1 min 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/122778/
[21:57:13] oh! I have to reapply it to the machines, eh?
[21:58:00] can we define a scap::target for mediawiki?
[21:58:12] well, ideally we would be applying via the horizon UI, but that would require it to be a role
[21:58:48] eh, well, currently scap::target would try to install mediawiki using the scap3 provider...
[21:59:27] thcipriani: what if the /srv/deployment/mediawiki was just a dummy repo for now (we will need it soon enough once we start building the combined mediawiki repo)
[21:59:51] (03PS1) 10Paladox: [XenForoAuth] Add jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/313924
[22:00:34] twentyafterfour: I don't know, probably would be fine.
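(The affected hosts thcipriani points to at 21:52 can be listed straight from the dsh group files on the deployment server; a sketch, where the exact file names matched by those globs are not given in the log:)

    # on deployment-tin: the beta hosts that scap currently targets, per the dsh group files
    cat /etc/dsh/group/mediawiki-* /etc/dsh/group/scap-* 2>/dev/null | sort -u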
[22:01:35] weeellll
[22:04:36] RECOVERY - Puppet run on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:05:54] !log reapplied beta::deployaccess to mediawiki servers
[22:05:58] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[22:06:08] beta-scap-eqiad should recover shortly
[22:06:45] Project beta-scap-eqiad build #122779: 04STILL FAILING in 1 min 52 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/122779/
[22:07:20] ahem
[22:16:39] thcipriani, twentyafterfour, do we need to re-open https://phabricator.wikimedia.org/T121721 or is something else happening now?
[22:16:42] Project beta-scap-eqiad build #122780: 04STILL FAILING in 1 min 51 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/122780/
[22:17:07] andrewbogott: I have a patch in process that should allow the removal of beta::deployaccess
[22:17:16] ah, great, thank you.
[22:20:42] 06Release-Engineering-Team, 10Phabricator, 13Patch-For-Review, 07Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2686313 (10jcrespo) I have done #3 of T146673#2675083. The stopwords handling requires a patch to ada...
[22:23:33] 06Release-Engineering-Team, 10Phabricator, 13Patch-For-Review, 07Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2686315 (10Paladox) @jcrespo yeh,but https://gerrit.wikimedia.org/r/313235 should improve things more...
[22:26:51] Project beta-scap-eqiad build #122781: 04STILL FAILING in 1 min 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/122781/
[22:28:40] Project beta-scap-eqiad build #122782: 04STILL FAILING in 1 min 47 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/122782/
[22:30:51] andrewbogott: https://gerrit.wikimedia.org/r/#/c/313927/ would that work for what you're trying to do?
[22:31:09] (just wraps what's there in a role instead of a standalone thing)
[22:31:26] thcipriani: Sure.
[22:32:09] want me to go ahead and merge it? You'll need to change the references again :(
[22:32:35] yeah, I can do that, not that many hosts.
[22:32:56] merge at will.
[22:33:51] done — thanks!
[22:34:53] okie doke, I'll add it to the mw hosts in deployment-prep and try it out.
[22:35:36] 06Release-Engineering-Team, 10Phabricator, 13Patch-For-Review, 07Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2686338 (10Paladox) @jcrespo Hi, I'm wondering would you be able to do the patch that create an table...
[22:36:40] 10Beta-Cluster-Infrastructure, 06Labs, 13Patch-For-Review: Replace all class imports on Labs with role imports - https://phabricator.wikimedia.org/T147233#2686339 (10Andrew)
[22:36:49] Project beta-scap-eqiad build #122783: 04STILL FAILING in 2 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/122783/
[22:40:00] !log manual rebase on deployment-puppetmaster:/var/lib/git/operations/puppet
[22:40:04] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[22:42:28] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[22:46:46] Yippee, build fixed!
[22:46:47] Project beta-scap-eqiad build #122784: 09FIXED in 1 min 55 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/122784/
[22:47:08] yippee
[22:47:27] PROBLEM - Puppet run on deployment-mediawiki05 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[22:48:07] PROBLEM - App Server Main HTTP Response on deployment-mediawiki05 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 50638 bytes in 0.045 second response time
[22:48:57] PROBLEM - App Server Main HTTP Response on deployment-mediawiki04 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 50638 bytes in 0.042 second response time
[22:50:07] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Wikipedia' not found on 'https://en.wikipedia.beta.wmflabs.org:443/wiki/Main_Page?debug=true' - 929 bytes in 0.054 second response time
[22:50:18] is anyone exploring what's wrong with the beta cluster greg-g ? Or should I look into that?
[22:50:19] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - string 'Wikipedia' not found on 'https://en.m.wikipedia.beta.wmflabs.org:443/wiki/Main_Page?debug=true' - 2236 bytes in 0.055 second response time
[22:51:48] I'm not sure what's wrong, there is one message in logstash that's blown up: Fatal error: Cls: Expected string or object in /srv/mediawiki/php-master/includes/libs/rdbms/loadbalancer/LoadBalancer.php on line 218
[22:52:05] PROBLEM - App Server Main HTTP Response on deployment-mediawiki06 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 50638 bytes in 0.042 second response time
[22:53:29] Probably poke Aaron
[22:55:28] just did so in -operations
[22:57:28] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:02:26] RECOVERY - Puppet run on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:04:47] hey, sorry, was in a 3 hour long meeting/training
[23:20:04] Project beta-update-databases-eqiad build #11806: 04FAILURE in 3.8 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/11806/
[23:34:49] 10Beta-Cluster-Infrastructure, 10MediaWiki-General-or-Unknown: LoadBalancer fatals on Beta cluster rendering pages inaccessible - https://phabricator.wikimedia.org/T147240#2686432 (10Jdlrobson)
[23:50:13] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T145220#2686487 (10Jdforrester-WMF)
[23:50:25] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T145220#2623702 (10Jdforrester-WMF)
[23:50:27] 10Beta-Cluster-Infrastructure, 10MediaWiki-General-or-Unknown: LoadBalancer fatals on Beta cluster rendering pages inaccessible - https://phabricator.wikimedia.org/T147240#2686492 (10Krenair) p:05Triage>03Unbreak! Might be something to do with https://gerrit.wikimedia.org/r/#/c/310757/ ? Not sure if this a...
[23:51:54] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T145220#2686499 (10Krenair)
[23:51:56] 10Beta-Cluster-Infrastructure, 10MediaWiki-General-or-Unknown: LoadBalancer fatals on Beta cluster rendering pages inaccessible - https://phabricator.wikimedia.org/T147240#2686498 (10Krenair)