[01:23:14] https://integration.wikimedia.org/ci/job/mediawiki-extensions-hhvm/76076/console [01:23:14] stderr: 'fatal: unable to access 'https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Thanks/': gnutls_handshake() failed: A TLS packet with unexpected length was received.' [01:35:29] Krinkle: hmm, weird [01:39:37] Krinkle: rebuilding the job didn't have that problem [01:39:42] I guess it was a fluke [02:27:30] PROBLEM - Puppet run on deployment-sca02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [03:07:29] RECOVERY - Puppet run on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0] [04:19:31] Project selenium-MultimediaViewer » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #143: 04FAILURE in 23 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/143/ [05:23:39] 10Browser-Tests-Infrastructure, 10MobileFrontend, 06Reading-Web-Backlog, 15User-zeljkofilipin: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2642754 (10zeljkofilipin) a:03zeljkofilipin [05:24:28] 10Browser-Tests-Infrastructure, 10MobileFrontend, 06Reading-Web-Backlog, 15User-zeljkofilipin: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2641459 (10zeljkofilipin) I vaguely remember seeing a similar error while testing... [05:26:16] PROBLEM - Puppet run on deployment-ores-redis is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [05:34:55] PROBLEM - Puppet staleness on deployment-db03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [06:01:13] RECOVERY - Puppet run on deployment-ores-redis is OK: OK: Less than 1.00% above the threshold [0.0] [07:49:11] 10Deployment-Systems, 10MediaWiki-extensions-WikimediaMaintenance, 06Operations, 13Patch-For-Review: WikimediaMaintenance refreshMessageBlobs: wmf-config/wikitech.php requires non existing /etc/mediawiki/WikitechPrivateSettings.php - https://phabricator.wikimedia.org/T140889#2642949 (10elukey) [08:27:06] (03CR) 10Zfilipin: [C: 032] [ArticleFeedbackv5] Remove the rake test [integration/config] - 10https://gerrit.wikimedia.org/r/310567 (https://phabricator.wikimedia.org/T145792) (owner: 10Paladox) [08:28:07] (03Merged) 10jenkins-bot: [ArticleFeedbackv5] Remove the rake test [integration/config] - 10https://gerrit.wikimedia.org/r/310567 (https://phabricator.wikimedia.org/T145792) (owner: 10Paladox) [08:31:50] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 13Patch-For-Review, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2642970 (10hashar) So looks like the root... [08:50:05] hashar: o/ [08:50:49] should we wait to reimage the jobrunner since we'd need to terminate/re-create it ? 
[08:50:56] I checked and it is doing some stuff [08:51:09] but probably redis can keep the jobs for a while [08:51:22] it shouldn't take too much to replace deployment-jobrunner01.deployment-prep.eqiad.wmflabs [08:51:55] you can try :] [08:52:02] not sure whether we got any prod jobrunner moved to jessie [08:52:06] but that might just work [08:52:17] tmh01 / video scaling would be probably a bit more difficult [08:52:37] anyway, have to deal with aftermath of wmf.19 deploy from yesterday, so I am not sure how much cycles I will get today to assist [08:52:45] but ask as needed, will reply as I can :] [08:52:55] (03CR) 10Zfilipin: "The commit is deployed." [integration/config] - 10https://gerrit.wikimedia.org/r/310567 (https://phabricator.wikimedia.org/T145792) (owner: 10Paladox) [08:53:20] hashar: we definitely have jessie jobrunners in prod now, not videoscalers (Moritz is working on it) [08:54:37] elukey: yeah so it will probably be fine [08:54:48] the jobrunner service is in the repo mediawiki/services/jobrunner [08:54:53] that is deployed using trebuchet apparently [08:55:05] so would probably need to figure out how to add the new jobrunner instance in trebuchet config [08:55:13] hopefully that is done entirely from hiera [08:55:45] then fight with Trebuchet. I tried a deploy of jobrunner service this week, and Trebuchet was not reporting the deploy as completed for some reason. I eventually gave up and did the update manually [08:56:05] mmm ok so let's not do it on a Friday [08:56:17] don't want to cause major headaches before the weekend, this work is not that urgent [08:56:20] will restart on Monday [08:56:41] maybe I can ping the Labs folks this evening to get the quota reviewed [09:12:30] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [09:16:42] PROBLEM - Puppet run on deployment-mathoid is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [09:25:35] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 13Patch-For-Review, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643069 (10hashar) @addshore wrote https... [09:29:10] hashar: oooh, nice little post there! [09:30:57] addshore: that is probably non sense [09:31:22] it sounds believable :P but you were just talking about growing tomatoes... [09:31:45] hehehe [09:31:49] I have asked ops [09:33:32] [= [09:34:36] REPRODUCED !!!!!!!!!!!!!!!!! [09:34:39] I am such a ahcker [09:34:47] /usr/bin/timeout: the monitored command dumped core [09:35:45] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 13Patch-For-Review, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643086 (10hashar) Reproduction on terbiu... 
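An aside on the Trebuchet deploy hashar describes at 08:54–08:55: only the failure mode is mentioned, so here is a hedged sketch of the usual sequence on the deploy host. The checkout path is a guess and only the git-deploy steps are the documented Trebuchet workflow; this is not a record of what was actually run.

    cd /srv/deployment/jobrunner/jobrunner   # hypothetical Trebuchet checkout for mediawiki/services/jobrunner
    git deploy start                         # lock the repo and open a deployment
    git pull                                 # bring in the change to ship (illustrative)
    git deploy sync                          # push to the minions; this is the step hashar saw never report as completed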
[09:37:12] and I cant access the core files obviously :( [09:37:29] tis write only [09:37:38] I am happy anyway [09:39:47] addshore: heading school brb [09:39:56] *waves* [09:50:23] 10Continuous-Integration-Infrastructure, 13Patch-For-Review, 07Zuul: Zuul should not run jenkins-bot on changes for refs/meta/* - https://phabricator.wikimedia.org/T52389#2643107 (10Paladox) [09:51:43] RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0] [09:52:00] 10Continuous-Integration-Infrastructure, 13Patch-For-Review, 07Zuul: Zuul should not run jenkins-bot on changes for refs/meta/* - https://phabricator.wikimedia.org/T52389#546902 (10Paladox) @hashar hi, would this https://gerrit.wikimedia.org/r/#/c/311032/ look ok? Since I doint thin you can filter refs on pa... [09:52:30] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0] [09:58:39] back [10:08:14] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 13Patch-For-Review, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2641971 (10zeljkofilipin) >>! In T145819#... [10:14:32] Project beta-code-update-eqiad build #121704: 04FAILURE in 1 min 31 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121704/ [10:15:26] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 13Patch-For-Review, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643170 (10hashar) And strace is again my... [10:16:42] RECOVERY - Puppet run on mira02 is OK: OK: Less than 1.00% above the threshold [0.0] [10:16:47] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 13Patch-For-Review, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643173 (10hashar) TLDR: when running mws... 
[10:17:59] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07HHVM, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643176 (10hashar) [10:22:45] PROBLEM - Puppet run on mira02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [10:24:33] Project beta-code-update-eqiad build #121705: 04STILL FAILING in 1 min 32 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121705/ [10:25:35] 10:23:02 INFO:mwextupdate:running: git submodule update --init --recursive [10:25:35] 10:23:44 No submodule mapping found in .gitmodules for path 'vendor/coderkungfu/php-queue' [10:25:35] 10:24:30 Failed to recurse into submodule path 'FundraisingEmailUnsubscribe' [10:25:54] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-MultimediaViewer, 10MobileFrontend, 06Reading-Web-Backlog, 15User-zeljkofilipin: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643211 (10zeljkofilipin) [10:27:09] 10Browser-Tests-Infrastructure, 10MobileFrontend, 10QuickSurveys, 06Reading-Web-Backlog, 15User-zeljkofilipin: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2641459 (10zeljkofilipin) [10:29:40] 10Browser-Tests-Infrastructure, 10MobileFrontend, 10QuickSurveys, 06Reading-Web-Backlog, 15User-zeljkofilipin: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643221 (10zeljkofilipin) [10:30:45] zeljkof: for that task, maybe look in logstash ? [10:31:11] hashar: stumbled upon that while in the middle of something else :| [10:31:17] https://phabricator.wikimedia.org/T145799 [10:31:34] so no time at the moment for another investigation :) [10:33:44] PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [10:34:31] Project beta-code-update-eqiad build #121706: 04STILL FAILING in 1 min 30 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121706/ [10:42:57] https://gerrit.wikimedia.org/r/311109 [10:43:03] That should be the fix for broken code update [10:43:08] It just feels seriously icky [10:43:15] git submodule... for something brought in via composer? [10:43:18] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07HHVM, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643247 (10hashar) [10:43:31] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 2 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2641971 (10hashar) [10:44:31] Project beta-code-update-eqiad build #121707: 04STILL FAILING in 1 min 30 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121707/ [10:49:23] hashar: Think we should just merge https://gerrit.wikimedia.org/r/311109 ? [10:52:34] Reedy: I have no idea. 
Ask fundraising I guess :] [10:52:42] Reedy: note we have job that do install from composer [10:52:55] though in this case they seem to include all the deps they need via a composer.json / composer.lock [10:53:01] so most probably, we dont want a submodule [10:53:06] but just composer update or whatever [10:53:28] Reedy: and their composer.json has https://gerrit.wikimedia.org/r/p/wikimedia/fundraising/php-queue.git [10:53:32] so they use a fork [10:53:48] at commit 14198ba1f7d4868933649a85621a3955965e83cd [10:54:41] Project beta-code-update-eqiad build #121708: 04STILL FAILING in 1 min 41 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121708/ [10:56:13] hashar: indeed [10:56:24] As to why they're just committing the folder like htat... [10:56:26] Is another issue [10:56:30] Multiple versions of monolog etc [10:56:52] hashar: composer update doesn't do anything [10:58:12] https://github.com/wikimedia/mediawiki-extensions-FundraisingEmailUnsubscribe/tree/master/vendor/coderkungfu [10:58:25] That shows it being committed as a git submodule [10:59:28] https://github.com/wikimedia/mediawiki-extensions-FundraisingEmailUnsubscribe/commit/3d14421a8300cfb7185752e974750f12baad96c3#diff-18108d068d013f322534c3d4945868dcR1 [10:59:33] Subproject commit 14198ba1f7d4868933649a85621a3955965e83cd [11:00:27] I presume, it's due to the way it was setup... Someone cloned it first, and then just committed the files in place [11:02:15] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 2 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643283 (10hashar) `SiteConfigu... [11:04:12] Reedy: I think I remember a bug about it [11:04:25] with composer update not being able to figure out a new version is available when a sha1 is used [11:04:34] Project beta-code-update-eqiad build #121709: 04STILL FAILING in 1 min 34 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121709/ [11:04:40] If you recreate the commit, ontop of HEAD~1 [11:04:41] it works fine [11:07:37] * Reedy reverts the commit and remakes it [11:13:43] RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0] [11:14:33] Project beta-code-update-eqiad build #121710: 04STILL FAILING in 1 min 33 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121710/ [11:24:35] Project beta-code-update-eqiad build #121711: 04STILL FAILING in 1 min 34 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121711/ [11:34:32] Project beta-code-update-eqiad build #121712: 04STILL FAILING in 1 min 32 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121712/ [11:36:29] !log apt-get upgrade on deployment-tin , bring in a new hhvm version and others [11:36:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [11:36:53] Reedy: revert your code please [11:37:02] Reedy: beta scap breaks with: [11:37:03] 00:00:03.105 INFO:mwextupdate:running: git submodule update --init --recursive [11:37:03] 00:00:45.192 No submodule mapping found in .gitmodules for path 'vendor/coderkungfu/php-queue' [11:37:03] 00:01:28.019 Submodule path 'Wikidata': checked out 'f99c17a841aac26b56465413696c6ea991c1a222' [11:37:03] 00:01:30.460 Failed to recurse into submodule path 'FundraisingEmailUnsubscribe' [11:37:11] hashar: Sorry? 
[11:37:15] I didn't merge anything [11:37:19] ohh [11:37:19] It was already broken with that [11:37:20] funny [11:37:21] I was trying to fix it [11:37:33] ah [11:37:37] the first build failling is https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121704/console [11:37:41] I proposed 2 different fixes ;) [11:37:53] 00:00:02.973 From https://gerrit.wikimedia.org/r/p/mediawiki/extensions/FundraisingEmailUnsubscribe [11:37:53] 00:00:02.973 d1062e8..3d14421 master -> origin/master [11:41:54] hashar: yeah, so hence one commit adding the .gitmodule which feels icky [11:42:04] and a second commit, reverting that one, and redoing it using only composer [11:42:51] !log beta: apt-get upgrade on deployment-jobrunner01 [11:42:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [11:44:32] Project beta-code-update-eqiad build #121713: 04STILL FAILING in 1 min 31 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121713/ [11:45:43] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 2 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643378 (10hashar) On the #beta... [11:50:44] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 2 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643380 (10hashar) I am trying... [11:54:34] Project beta-code-update-eqiad build #121714: 04STILL FAILING in 1 min 33 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121714/ [12:04:34] Project beta-code-update-eqiad build #121715: 04STILL FAILING in 1 min 34 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121715/ [12:06:19] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643404 (10zeljkofilipin) [12:07:58] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 2 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643405 (10hashar) Running: su... [12:13:03] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 2 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643423 (10hashar) On terbium c... 
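For reference on the wikimedia/fundraising/php-queue fork Reedy quoted at 10:53: Composer can pin a package to a VCS fork at a specific commit, which also fits hashar's 11:04 point that composer update won't pick up anything newer while a sha1 is pinned. A hedged, untested sketch of the equivalent CLI steps, reusing the URL, package name, and commit hash from the chat:

    composer config repositories.php-queue vcs https://gerrit.wikimedia.org/r/p/wikimedia/fundraising/php-queue.git
    composer require coderkungfu/php-queue:dev-master#14198ba1f7d4868933649a85621a3955965e83cd
    # dev-master#<sha> keeps the install parked on that one commit of the fork.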
[12:14:30] Project beta-code-update-eqiad build #121716: 04STILL FAILING in 1 min 29 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121716/ [12:24:32] Project beta-code-update-eqiad build #121717: 04STILL FAILING in 1 min 31 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121717/ [12:24:49] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643458 (10zeljkofilipin) [12:26:59] PROBLEM - Puppet run on deployment-eventlogging03 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [12:30:44] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 3 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643463 (10hashar) Got a `Faile... [12:34:33] Project beta-code-update-eqiad build #121718: 04STILL FAILING in 1 min 32 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121718/ [12:44:31] Project beta-code-update-eqiad build #121719: 04STILL FAILING in 1 min 30 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121719/ [12:54:32] Project beta-code-update-eqiad build #121720: 04STILL FAILING in 1 min 31 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121720/ [13:01:58] RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0] [13:03:49] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643512 (10zeljkofilipin) [13:04:05] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 3 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2641971 (10hashar) From T145839... [13:04:32] Project beta-code-update-eqiad build #121721: 04STILL FAILING in 1 min 32 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121721/ [13:07:04] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643518 (10zeljkofilipin) [13:09:47] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643524 (10zeljkofilipin) [13:14:29] Project beta-code-update-eqiad build #121722: 04STILL FAILING in 1 min 29 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121722/ [13:21:13] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643533 (10zeljkofilipin) Not reproducible on local media... 
[13:21:35] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 4 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643535 (10hashar) [13:24:34] Project beta-code-update-eqiad build #121723: 04STILL FAILING in 1 min 33 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121723/ [13:25:40] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643545 (10zeljkofilipin) [13:26:25] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2641459 (10zeljkofilipin) Also reproducible targeting pro... [13:26:49] 06Release-Engineering-Team (Deployment-Blockers), 13Patch-For-Review, 05Release: MW-1.28.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T143328#2643551 (10hashar) Rollbacked due to account creation being broken T145839 Investigation is on T145819. [13:29:03] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643555 (10zeljkofilipin) Not reproducible in integration... [13:34:31] Project beta-code-update-eqiad build #121724: 04STILL FAILING in 1 min 30 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121724/ [13:44:32] Project beta-code-update-eqiad build #121725: 04STILL FAILING in 1 min 31 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121725/ [13:48:54] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643609 (10zeljkofilipin) API seems to work fine on beta... [13:52:27] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643622 (10zeljkofilipin) Verbose stack trace: MobileFro... 
[13:54:29] Project beta-code-update-eqiad build #121726: 04STILL FAILING in 1 min 28 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121726/ [14:02:44] RECOVERY - Puppet run on mira02 is OK: OK: Less than 1.00% above the threshold [0.0] [14:04:35] Project beta-code-update-eqiad build #121727: 04STILL FAILING in 1 min 34 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121727/ [14:14:35] Project beta-code-update-eqiad build #121728: 04STILL FAILING in 1 min 34 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121728/ [14:24:34] Project beta-code-update-eqiad build #121729: 04STILL FAILING in 1 min 33 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121729/ [14:29:39] ^ should be fixed [14:30:47] this is from gallium [14:30:48] PHP Warning: PHP Startup: Unable to load dynamic library '/usr/lib/php5/20090626/mysql.so' - /usr/lib/php5/20090626/mysql.so: cannot open shared object file: No such file or directory in Unknown on line 0 [14:30:55] (root@ cron emails) [14:31:13] anything known? [14:31:18] it appeared today afaics [14:31:22] eh, not to me... [14:31:30] ^ hashar have you seen? [14:31:36] 06Release-Engineering-Team, 06Operations, 07HHVM, 13Patch-For-Review: Migrate deployment servers (tin/mira) to jessie - https://phabricator.wikimedia.org/T144578#2643749 (10MoritzMuehlenhoff) A few jessie-related changes have been sorted out, mira02.deployment-prep.eqiad.wmflabs should be ready for testing. [14:34:32] Yippee, build fixed! [14:34:32] Project beta-code-update-eqiad build #121730: 09FIXED in 1 min 31 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121730/ [14:35:03] hrm, I don't have permission to view the crontab for root on gallium. [14:39:56] we have php5-mysql which is what supplies mysql.so afaik. Not sure why it would disappear. [14:40:15] I'm actually going afk to finish up morning things. back soon. [14:42:41] yeah, should be where that comes from: http://packages.ubuntu.com/precise/amd64/php5-mysql/filelist and we have pdo_mysql.so but not mysql or mysqli.so... [14:45:20] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 4 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643760 (10hashar) I am enlargi... [14:49:38] 06Release-Engineering-Team (Deployment-Blockers), 13Patch-For-Review, 05Release: MW-1.28.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T143328#2643768 (10hashar) [14:50:06] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 4 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643770 (10hashar) [14:53:48] PROBLEM - Puppet run on deployment-aqs01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [14:55:23] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 3 others: Jobs invoking SiteConfiguration::getConfig cause HHVM to fail updating the bytecode cache due to being filesi... - https://phabricator.wikimedia.org/T145819#2643777 [14:56:35] Yippee, build fixed! 
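A hedged sketch of how to confirm what thcipriani was chasing at 14:30–14:42 on gallium: which package is supposed to ship the mysql.so that PHP is warning about, and what is actually present. Illustrative commands only, using the Precise-era package names from the chat:

    dpkg -S mysql.so                      # which installed package claims to ship a mysql.so
    dpkg -L php5-mysql | grep '\.so$'     # the shared objects php5-mysql actually installed
    php -m | grep -i mysql                # which mysql extensions the CLI can load right now
    grep -Rl mysql.so /etc/php5/          # ini snippets still pointing at a missing module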
[14:56:36] Project selenium-QuickSurveys » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #156: 09FIXED in 3 min 51 sec: https://integration.wikimedia.org/ci/job/selenium-QuickSurveys/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/156/ [14:58:03] elukey: thcipriani|afk oh my god [14:58:07] I have removed mysql from gallium [14:58:23] 13:35 gallium: removing MySQL which is no more defined in puppet and running puppet. Did: apt-get remove mysql-common mysql-server mysql-server-core-5.5 [14:59:01] PROBLEM - Puppet run on deployment-puppetmaster is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [15:00:38] did dpkg --purge php5-mysql [15:01:25] fixed [15:01:32] thcipriani|afk: we are back to wmf.18 [15:01:46] the wikidata job issue has its root cause deep in mw/hhvm etc [15:01:53] and also prevented account creation entirely [15:01:56] so I have just rollbacked [15:01:59] incident report at https://wikitech.wikimedia.org/wiki/Incident_documentation/20160915-MediaWiki [15:02:11] main task is https://phabricator.wikimedia.org/T145819 [15:02:19] with all the crazy details from yesterday and this morning debugging [15:02:37] TL;DR: have to purge the HHVM bytecode cache everywhere [15:02:42] then I guess we will be able to push wmf.19 [15:02:53] more longterm: fix https://phabricator.wikimedia.org/T111441 :] [15:03:03] and garbage collect hhvm bytecode [15:03:16] anyway was just brain dumping [15:03:43] wowza [15:05:39] damn [15:05:44] that's some solid debugging [15:09:25] yeah [15:09:35] moritzm: I am off. Thank you for mira02 :] [15:09:54] have a nice weekend! [15:12:16] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 4 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643803 (10jhobs) Thanks for the fast turnaround @zeljkof... [15:15:53] thcipriani: yeah [15:16:01] thcipriani: well we can talk about it next monday :] [15:16:09] hashar: kk, sounds good. [15:16:15] thcipriani: I am out myself. Will probably not show up this evening [15:16:19] I feel pretty tired :] [15:16:31] have a good weekend :) [15:16:34] Yippee, build fixed! [15:16:34] Project selenium-MobileFrontend » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #159: 09FIXED in 17 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/159/ [15:16:41] danke [15:20:43] hasharAway hi, following from the gerrit commit that you abandon sorry that you thought that i was guessing but im not. [15:20:46] It actually works [15:20:55] been doing that on gerrit-test and seems to work [15:21:13] Since it only excluded refs/meta/config refs [15:23:09] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 3 others: Jobs invoking SiteConfiguration::getConfig cause HHVM to fail updating the bytecode cache due to being filesi... 
- https://phabricator.wikimedia.org/T145819#2643810 [15:23:18] 10Continuous-Integration-Infrastructure, 06Operations, 10puppet-compiler, 13Patch-For-Review: OSError: [Errno 28] No space left on device on compiler02.puppet3-diffs.eqiad.wmflabs - https://phabricator.wikimedia.org/T143671#2643813 (10hashar) a:03fgiunchedi Looks like @fgiunchedi solved it :) [15:24:10] paladox: feel free to reopen but write a good description in the commit summary :] [15:24:26] Ok [15:24:27] paladox: also switch to the group that contains the jenkins bot since we can rename that user later on [15:24:27] yeh [15:24:28] :] [15:24:32] I am off, good week-end! [15:24:43] Ok, i just didnt want it to block you hasharAway [15:24:49] since your included in that group [15:27:15] PROBLEM - Puppet run on deployment-ores-redis is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [15:30:04] moritzm Hi, yay looks like grrrit-wm is lasting longer again with my under the hood changes, I did a work around to use npm 2 and it seems to work. [15:33:45] RECOVERY - Puppet run on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [15:34:02] RECOVERY - Puppet run on deployment-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0] [15:37:50] RECOVERY - Puppet run on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0] [15:43:22] Project selenium-MobileFrontend » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #160: 04FAILURE in 18 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/160/ [15:51:17] Yippee, build fixed! [15:51:17] Project selenium-MobileFrontend » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #160: 09FIXED in 26 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/160/ [16:02:14] RECOVERY - Puppet run on deployment-ores-redis is OK: OK: Less than 1.00% above the threshold [0.0] [16:06:25] re: deployment-prep quota increase, anything we can do to speed things up? mostly T145611 and T145636 [16:07:18] (I want stashbot back giving me links) [16:07:31] stashbot: :( [16:07:31] * greg-g is in a meeting will catch up in about 23 minutes [16:07:43] indeed, I was expecting the links too [16:11:09] +1 for the request, it is blocking a bit the deployment-prep debian migration [16:13:05] short quip: talk to chase ;) [16:13:14] but, I'll review the state soon [16:16:46] makes sense, I'll talk to chase, thanks greg-g ! [16:28:03] godog: do it in a public (or at least _security) channel, plz, so I can follow :) [16:29:11] greg-g: *nod* I don't see chase online, will do tho [16:30:25] word [17:21:13] 10Gerrit: Gerrit is not emoji friendly - https://phabricator.wikimedia.org/T145885#2644059 (10Ladsgroup) [17:25:08] 10Gerrit: Gerrit is not emoji friendly - https://phabricator.wikimedia.org/T145885#2644074 (10Paladox) Testing with gerrit 2.12.4 seems to stop the 500 error but it replaces it with ?? 
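On hashar's 15:02 debrief above ("have to purge the HHVM bytecode cache everywhere"): a heavily hedged sketch of what that typically means on a single HHVM app server. The cache file location is whatever hhvm.repo.central.path is set to in the server's configuration; the path below is an assumption, not something stated in the chat, and doing this fleet-wide would go through salt rather than one-off shells.

    sudo service hhvm stop
    sudo rm -f /var/cache/hhvm/*.hhbc.sq3   # assumed location of the sqlite bytecode repo(s)
    sudo service hhvm start                 # HHVM rebuilds the bytecode cache as requests come in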
[17:27:40] 10Gerrit: Gerrit is not emoji friendly - https://phabricator.wikimedia.org/T145885#2644084 (10Paladox) Filled upstream at https://bugs.chromium.org/p/gerrit/issues/detail?id=4570 [17:27:49] 10Gerrit, 07Upstream: Gerrit is not emoji friendly - https://phabricator.wikimedia.org/T145885#2644085 (10Paladox) [17:32:36] 10Gerrit, 07Upstream: Gerrit is not emoji friendly - https://phabricator.wikimedia.org/T145885#2644115 (10Paladox) It looks like it might be because we set utf-8 on MySQL. Since testing here https://gerrit-review.googlesource.com/#/c/86192/ shows it works. Also testing on a local gerrit 2.13 results in the s... [17:38:56] 10Gerrit, 07Upstream: Gerrit is not emoji friendly - https://phabricator.wikimedia.org/T145885#2644142 (10hashar) [17:48:33] 10Deployment-Systems, 03Scap3, 07WorkType-NewFunctionality: Scap3 submodule space issues - https://phabricator.wikimedia.org/T137124#2644157 (10thcipriani) p:05Triage>03Normal I have the same concern for git-fat repos as well. [17:48:46] 06Release-Engineering-Team (Long-Lived-Branches), 03Scap3 (Scap3-MediaWiki-MVP), 13Patch-For-Review: Create `scap swat` command to automate patch merging & testing during a swat deployment - https://phabricator.wikimedia.org/T142880#2644160 (10mmodell) [17:49:11] 10Gerrit, 07Upstream: Gerrit is not emoji friendly - https://phabricator.wikimedia.org/T145885#2644059 (10hashar) Pasted the server side stacktrace. Looks like com.google.gwtorm.jdbc / com.google.gwtorm.schema.sql is not fully Unicode aware :] We have for `database.url` jdbc:mysql://<%= @db_host %>/<%= @db_n... [17:49:53] 10Deployment-Systems, 03Scap3: Scap subcommand bash completion - https://phabricator.wikimedia.org/T135317#2644167 (10thcipriani) 05Open>03Resolved a:03thcipriani There is currently bash completion for subcommands, flags, files, and directories live as of scap v.3.2.1-1. [17:50:59] 03Scap3, 10scap: scap to reload a service instead of restart - https://phabricator.wikimedia.org/T134001#2644175 (10thcipriani) p:05Triage>03Normal [17:53:42] 10Gerrit, 07Upstream: Gerrit is not emoji friendly - https://phabricator.wikimedia.org/T145885#2644186 (10Paladox) @hashar thanks. [17:56:44] (03CR) 10Florianschmidtwelzow: "needs rebase." [integration/config] - 10https://gerrit.wikimedia.org/r/310561 (owner: 10Paladox) [17:56:59] (03PS2) 10Paladox: [CookieWarning] Add Jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/310561 [17:57:30] (03CR) 10Paladox: "@Florianschmidtwelzow hi it doesn't, it's do to the setting we have for this repo in gerrit, we choice fast forward like it is for puppet." [integration/config] - 10https://gerrit.wikimedia.org/r/310561 (owner: 10Paladox) [17:57:48] 03Scap3: Make symlink-swapping optional in deploy promote - https://phabricator.wikimedia.org/T145889#2644204 (10mmodell) [17:58:38] 03Scap3: Scap3 config references to deployed directory - https://phabricator.wikimedia.org/T145437#2644218 (10mmodell) I've made a task for {icon arrow-up} that. {T145889} [18:08:26] 10Gerrit, 07Upstream: Gerrit is not emoji friendly - https://phabricator.wikimedia.org/T145885#2644229 (10Paladox) I also filled https://bugs.chromium.org/p/gerrit/issues/detail?id=4571 upstream due to it being a 500 error. [18:13:29] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [18:45:13] hiyaaa, if i want to send logs to logstash in deployment-prep, what host/port (s) do I use? 
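Side note on the Gerrit emoji task (T145885) above: MySQL's plain utf8 charset stores at most 3 bytes per character, so 4-byte emoji cannot be saved, and utf8mb4 is the usual remedy. Whether that is the right fix for Gerrit's schema is exactly what the upstream bug is about, so the statement below is illustrative only, with the database name assumed:

    mysql -e "ALTER DATABASE reviewdb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
    # Existing tables/columns would need converting too, and the JDBC settings
    # hashar quotes would have to agree with the new charset.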
[18:53:28] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0] [18:55:37] PROBLEM - Puppet run on deployment-kafka05 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [19:05:49] ottomata: heya, little late to reply, but FWIW, the host we use for scap to check logstash in beta is logstash2.deployment-prep.eqiad.wmflabs. The public url is https://logstash-beta.wmflabs.org [19:08:19] great, so if i configure a thing to use that port 12201, shoudl work? [19:09:17] ah ya, totally works! :) [19:10:31] cool. Was just checking on that, might need some holes punched from one server to another, but if it works, guess not :) [19:15:49] Yippee, build fixed! [19:15:50] Project selenium-RelatedArticles » chrome,beta-mobile,Linux,contintLabsSlave && UbuntuTrusty build #147: 09FIXED in 47 sec: https://integration.wikimedia.org/ci/job/selenium-RelatedArticles/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta-mobile,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/147/ [19:16:16] Yippee, build fixed! [19:16:16] Project selenium-RelatedArticles » chrome,beta-desktop,Linux,contintLabsSlave && UbuntuTrusty build #147: 09FIXED in 1 min 15 sec: https://integration.wikimedia.org/ci/job/selenium-RelatedArticles/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta-desktop,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/147/ [19:19:54] Hi releng! I'm trying to add tests to FundraisingEmailUnsubscribe, and am still not sure where to require the composer autoload to get it picked up by tests. [19:20:12] I added a hooks class with a static callback referenced in extension.json [19:20:44] and that totally works when I run the tests locally in a wiki with wfLoadExtension in its LocalSettings [19:21:14] but CI doesn't seem to load the extension the same way. Do I need to add FundraisingEmailUnsubscribe to some list? [19:25:12] ejegg: hrm. I think there may be different variant for extension tests that use composer. [19:25:17] I'm not 100%, honestly. [19:26:09] legoktm: would be able to answer this question better than I could I think. [19:26:12] thcipriani: ah, in that repo we're actually checking in the vendor dir, since there's not a lot more to the extension [19:26:23] so we don't even need to run composer [19:26:30] just need to hit that require_once line [19:26:56] thcipriani: do you know if the extensions test suite calls wfLoadExtension somewhere? Seems like it would need to [19:27:13] but it doesn't look like it's getting to the callback [19:29:58] gerrit on a go slow today? [19:30:11] ejegg: if it's extension.json... [19:30:15] You don't need to add any callback [19:30:17] Yippee, build fixed! 
[19:30:18] Project selenium-MobileFrontend » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #161: 09FIXED in 16 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/161/ [19:30:20] We have auto discovery now [19:30:32] just needs to be in tests/phpunit (if phpunit tests etc) [19:30:37] RECOVERY - Puppet run on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0] [19:31:43] ejegg: https://phabricator.wikimedia.org/T142120 and https://phabricator.wikimedia.org/T142121 [19:31:49] https://gerrit.wikimedia.org/r/#/c/302944/ [19:32:27] Reedy: yep, it's finding the tests just fine, but I'm getting an undefined class error for some stuff that we've checked in under a vendor dir [19:32:38] Something not in the autoloader? [19:32:46] oh [19:32:46] The callback is just to require_once the composer autoloader [19:32:54] No, you don't need that [19:33:01] https://gerrit.wikimedia.org/r/311174/ [19:33:19] ejegg: You didn't update https://github.com/wikimedia/mediawiki-extensions-FundraisingEmailUnsubscribe/blob/master/extension.json ! [19:33:37] ejegg: you need to add "load_composer_autoloader": true, [19:33:54] aha, thanks! [19:35:05] ejegg: Have you done anything to fix the git submodule errors yet? [19:35:42] Reedy: working locally now [19:35:48] :) [19:36:10] Reedy: submodule errors for FundrasingEmailUnsubscribe? [19:36:24] You were breaking beta code update [19:36:32] I fixed those this morning [19:36:36] ah, how? [19:36:41] ooh, I didn't realize! thanks thcipriani [19:37:09] I did: git rm --cached [the path it was looking for...kungfu-something] [19:37:10] thcipriani: I see no commit to the extension... which suggests it'll still be broken locally? [19:37:13] and then a git submodule update [19:37:52] reedy@ubuntu64-web-esxi:/var/www/wiki/mediawiki/extensions/FundraisingEmailUnsubscribe$ git pull [19:37:52] Already up-to-date. [19:37:52] reedy@ubuntu64-web-esxi:/var/www/wiki/mediawiki/extensions/FundraisingEmailUnsubscribe$ git s^C [19:37:52] reedy@ubuntu64-web-esxi:/var/www/wiki/mediawiki/extensions/FundraisingEmailUnsubscribe$ git submodule update --init --recursive [19:37:52] fatal: no submodule mapping found in .gitmodules for path 'vendor/coderkungfu/php-queue' [19:38:14] oh dang, I added dev-master dependency and forgot to delete .git before checking in the files [19:38:15] Project selenium-MobileFrontend » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #161: 04FAILURE in 24 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/161/ [19:38:18] ah, dang, I just figured there was something wrong with beta code update. [19:38:22] ok, I'll fix that [19:38:26] sorry folks! [19:38:35] ejegg: I've 2 seperate fixes available [19:38:40] https://gerrit.wikimedia.org/r/311109 add .gitmodule [19:38:51] https://gerrit.wikimedia.org/r/311111 revert and redo https://gerrit.wikimedia.org/r/311112 [19:38:54] nah, we don't actually want it as a submodule [19:39:07] heh [19:39:14] so the second two reinstate it via composer alone [19:39:58] tbh, those 2 should probably just be squashed [19:40:07] which should be worthwhile for a fix [19:40:31] hmm, that re-do doesn't check in the phpqueue classes. 
Let me try something [19:40:59] Yup [19:41:00] moment [19:41:44] ah, screw it [19:41:49] I'll abandon mine [19:42:32] ejegg: please merge https://gerrit.wikimedia.org/r/#/c/311110/ though :P [19:42:49] ah word, will do! [19:46:31] I'll let you fix the "submodule" issue though [19:46:46] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 4 others: Jobs invoking SiteConfiguration::getConfig cause HHVM to fail updating the bytecode cache due to being filesi... - https://phabricator.wikimedia.org/T145819#2644400 [19:50:13] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 4 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2644405 (10Jdlrobson) 05Open>03Resolved Thanks for ra... [19:50:43] whole puppet runs are broken on beta arent they ? [19:50:58] hm? [19:51:04] http://shinken.wmflabs.org/problems?search=deployment [19:51:26] wow [19:51:45] maybe it is just shinken being mad [19:52:17] Ah [19:52:26] It's our old friend the double-puppet.conf bug [19:53:41] !log beta created instance "deployment-parsoid05" Should be deleted later, that is merely to purge the hostname from Shinken ( http://shinken.wmflabs.org/host/deployment-parsoid05 ) [19:53:47] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [19:54:12] you sure that'll work hashar? [19:54:18] yeah [19:54:21] gotta wait [19:54:30] then eventually delete the instance and magically it get purged [19:54:41] maybe it hasn't been properly deleted in ldap or whatever source [19:54:48] so on beta [19:54:51] on a random instance [19:54:57] Error: Could not request certificate: The certificate retrieved from the master does not match the agent's private key. [19:54:57] Certificate fingerprint: 75:17:D2:C5:D1:E6:05:66:DD:72:45:84:BC:E8:AA:9A:D0:0A:88:A4:B3:7D:4D:A3:BC:C5:A0:2B:18:01:16:92 [19:54:57] To fix this, remove the certificate from both the master and the agent and then start a puppet run, which will automatically regenerate a certficat [19:55:05] there is exactly ZERO WAY I am going to fix all the instances [19:55:13] ejegg|meet: Yeah, I think it'll be easiest for you to fix it [19:55:26] I'm looking into it [19:59:29] krenair@deployment-puppetmaster:~$ sudo -i puppet cert list --all | grep ms-be02 [19:59:29] + "deployment-ms-be02.deployment-prep.eqiad.wmflabs" (SHA256) 3F:22:48:A3:37:37:D9:B6:48:FF:2B:67:EB:69:1F:F8:D1:2E:6F:59:02:CC:40:06:8C:15:0C:F2:3B:61:09:EB [19:59:37] root@deployment-ms-be02:~# openssl x509 -in /var/lib/puppet/client/ssl/certs/deployment-ms-be02.deployment-prep.eqiad.wmflabs.pem -noout -fingerprint -sha256 [19:59:37] SHA256 Fingerprint=3F:22:48:A3:37:37:D9:B6:48:FF:2B:67:EB:69:1F:F8:D1:2E:6F:59:02:CC:40:06:8C:15:0C:F2:3B:61:09:EB [20:00:03] Reedy: I think this should do it: https://gerrit.wikimedia.org/r/311176 [20:00:18] let me test :) [20:00:52] Tests are now passing! https://gerrit.wikimedia.org/r/311174 [20:00:57] yup [20:00:59] Krenair: ops is heavily refactor puppetmaster [20:00:59] WFM [20:01:08] +2'd [20:01:32] hashar, I'm not yet blaming anything on ops [20:01:36] for the submodule, that is [20:02:04] thanks Reedy ! 
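Putting ejegg's 19:38 diagnosis (the vendored dependency was committed with its .git directory still in place, so git recorded a gitlink with no .gitmodules entry) together with the cleanup thcipriani describes at 19:37: a hedged sketch of the repo-side fix. The path is the one from the error messages; the commit message is invented.

    cd extensions/FundraisingEmailUnsubscribe
    git rm --cached vendor/coderkungfu/php-queue    # drop the bogus mode-160000 gitlink from the index
    rm -rf vendor/coderkungfu/php-queue/.git        # the leftover .git is what made git treat it as a "submodule"
    git add vendor/coderkungfu/php-queue            # re-add the dependency as ordinary tracked files
    git commit -m "Vendor php-queue as plain files, not a broken gitlink"
    git submodule update --init --recursive         # should now succeed; this is what beta-code-update runs
    grep load_composer_autoloader extension.json    # Reedy's other fix: expect "load_composer_autoloader": true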
[20:02:19] That was totally the issue with the tests too, just missing those files [20:02:20] Krenair: last time it occured, I have just deleted everything :] [20:02:35] but, I'm happy to have learned about load_composer_autoloader! [20:02:51] hashar, cleaned all certs on the puppetmaster? [20:03:46] yeah [20:03:58] and deleted all /var/lib/puppet/client/ssl dirs on all instances via salt [20:04:03] maybe the ca has changed [20:05:36] Krenair: I am just going to do that (delete everything [20:07:11] !log beta: stopping puppetmaster, rm -f /var/lib/puppet/server/ssl/ca/signed/* [20:07:17] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:07:53] !log beta: salt -v '*' cmd.run 'rm -fR /var/lib/puppet/client/ssl/' [20:07:59] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:11:10] Krenair: solved :) [20:12:18] PROBLEM - Puppet run on deployment-salt02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:12:43] ah [20:12:52] so salt -v '*' cmd.run 'puppet agent -tv' [20:13:03] that is definitely a good way to kill the puppetmaster :D [20:13:50] !log beta: restarted puppetmaster [20:13:56] PROBLEM - Puppet run on deployment-pdf01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:13:56] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:14:02] PROBLEM - Puppet run on deployment-sentry01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:14:04] PROBLEM - Puppet run on deployment-conf03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:14:10] PROBLEM - Puppet run on deployment-db2 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:14:20] PROBLEM - Puppet run on deployment-eventlogging04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:14:34] PROBLEM - Puppet run on deployment-parsoid09 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:14:40] PROBLEM - Puppet run on deployment-poolcounter02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:14:46] PROBLEM - Puppet run on deployment-cache-upload04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:14:50] PROBLEM - Puppet run on deployment-aqs01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:14:51] PROBLEM - Puppet run on deployment-changeprop is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:14:59] PROBLEM - Puppet run on deployment-puppetmaster is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:15:05] PROBLEM - Puppet run on deployment-zotero01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:15:21] PROBLEM - Puppet run on deployment-ms-be02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:16:07] PROBLEM - Puppet run on deployment-db1 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:16:13] PROBLEM - Puppet run on deployment-elastic06 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:16:39] PROBLEM - Puppet run on deployment-kafka05 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:16:39] PROBLEM - Puppet run on deployment-elastic08 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:16:53] PROBLEM - Puppet run on deployment-redis01 is CRITICAL: 
CRITICAL: 50.00% of data above the critical threshold [0.0] [20:16:55] PROBLEM - Puppet run on deployment-memc05 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:17:01] PROBLEM - Puppet run on deployment-kafka01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:17:01] PROBLEM - Puppet run on deployment-elastic07 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:17:09] PROBLEM - Puppet run on deployment-urldownloader is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:17:11] PROBLEM - Puppet run on deployment-pdf02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:17:19] PROBLEM - Puppet run on deployment-memc04 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:17:23] PROBLEM - Puppet run on deployment-mediawiki06 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:17:28] PROBLEM - Puppet run on deployment-mediawiki05 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:17:30] PROBLEM - Puppet run on deployment-tmh01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:17:38] PROBLEM - Puppet run on deployment-redis02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:17:44] PROBLEM - Puppet run on deployment-mathoid is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:17:46] PROBLEM - Puppet run on deployment-sca03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:17:48] PROBLEM - Puppet run on deployment-fluorine02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:17:48] PROBLEM - Puppet run on deployment-kafka04 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:17:52] PROBLEM - Puppet run on deployment-elastic05 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:17:56] PROBLEM - Puppet run on deployment-pdfrender is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:18:00] PROBLEM - Puppet run on deployment-eventlogging03 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:18:08] PROBLEM - Puppet run on deployment-tin is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:18:12] PROBLEM - Puppet run on deployment-ircd is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:18:14] PROBLEM - Puppet run on deployment-sca01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:18:14] PROBLEM - Puppet run on deployment-ores-redis is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:18:18] PROBLEM - Puppet run on deployment-restbase02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:18:20] PROBLEM - Puppet run on deployment-stream is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:18:24] PROBLEM - Puppet run on deployment-conftool is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:18:26] oh hi [20:18:26] PROBLEM - Puppet run on deployment-kafka03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:18:31] PROBLEM - Puppet run on deployment-jobrunner01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:18:31] PROBLEM - Puppet run on deployment-sca02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:18:33] PROBLEM - Puppet 
run on deployment-restbase01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:18:35] PROBLEM - Puppet run on deployment-apertium01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:18:35] PROBLEM - Puppet run on deployment-db04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:18:45] PROBLEM - Puppet run on mira02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:18:45] PROBLEM - Puppet run on deployment-mx is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:18:48] greg-g: that is all me [20:18:49] PROBLEM - Puppet run on deployment-zookeeper01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:18:49] PROBLEM - Puppet run on deployment-imagescaler01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:18:51] PROBLEM - Puppet run on deployment-logstash2 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:18:59] it is actually runing just fine [20:19:08] heh [20:19:26] I'll take my car to the mechanic (again) and check in later, probably 30 minutes from now [20:19:29] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:19:29] PROBLEM - Puppet run on mira is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:19:36] greg-g: I will be gone :) [20:19:40] g'night! [20:19:41] greg-g: have a safe trip_ [20:19:45] PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:24:07] RECOVERY - Puppet run on deployment-db2 is OK: OK: Less than 1.00% above the threshold [0.0] [20:24:36] RECOVERY - Puppet run on deployment-parsoid09 is OK: OK: Less than 1.00% above the threshold [0.0] [20:24:40] RECOVERY - Puppet run on deployment-poolcounter02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:25:02] 10Continuous-Integration-Infrastructure, 13Patch-For-Review, 07Zuul: Zuul should not run jenkins-bot on changes for refs/meta/* - https://phabricator.wikimedia.org/T52389#2644460 (10Paladox) Upstream in openstack actually only allow project owners to read refs/meta/config (https://review.openstack.org/#/admi... [20:26:06] RECOVERY - Puppet run on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0] [20:26:12] RECOVERY - Puppet run on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0] [20:26:39] Krenair: yeah so I have eventually overloaded the puppetmaster via a salt command and had to kill the puppetmaster [20:26:47] :) [20:26:51] hence the spam of alarms there. 
But all hosts are recovered as far as I can tell [20:27:09] !log beta: force running puppet in batches of 4 instances: salt --batch 4 -v 'deployment-*' cmd.run 'puppet agent -tv' [20:27:10] RECOVERY - Puppet run on deployment-pdf02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:27:12] RECOVERY - Puppet run on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0] [20:27:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:27:16] RECOVERY - Puppet run on deployment-salt02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:27:21] it is not very smart [20:27:27] RECOVERY - Puppet run on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0] [20:27:31] pupetmaster apparently tries to compile as many catalogs it can [20:27:44] until the host goes out of memory [20:27:49] RECOVERY - Puppet run on deployment-sca03 is OK: OK: Less than 1.00% above the threshold [0.0] [20:27:51] RECOVERY - Puppet run on deployment-fluorine02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:27:57] RECOVERY - Puppet run on deployment-pdfrender is OK: OK: Less than 1.00% above the threshold [0.0] [20:28:25] RECOVERY - Puppet run on deployment-conftool is OK: OK: Less than 1.00% above the threshold [0.0] [20:28:33] RECOVERY - Puppet run on deployment-apertium01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:28:35] RECOVERY - Puppet run on deployment-db04 is OK: OK: Less than 1.00% above the threshold [0.0] [20:28:49] RECOVERY - Puppet run on deployment-zookeeper01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:28:57] RECOVERY - Puppet run on deployment-pdf01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:29:05] RECOVERY - Puppet run on deployment-conf03 is OK: OK: Less than 1.00% above the threshold [0.0] [20:29:14] http://prometheus.wmflabs.org/alerts is going to be wayyy better [20:29:19] RECOVERY - Puppet run on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0] [20:29:31] RECOVERY - Puppet run on mira is OK: OK: Less than 1.00% above the threshold [0.0] [20:29:43] RECOVERY - Puppet run on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0] [20:29:47] RECOVERY - Puppet run on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [0.0] [20:29:59] RECOVERY - Puppet run on deployment-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0] [20:30:04] RECOVERY - Puppet run on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:30:22] RECOVERY - Puppet run on deployment-ms-be02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:31:38] RECOVERY - Puppet run on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0] [20:31:38] RECOVERY - Puppet run on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0] [20:31:52] RECOVERY - Puppet run on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:32:02] RECOVERY - Puppet run on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0] [20:32:11] 10Beta-Cluster-Infrastructure, 06Operations, 05Prometheus-metrics-monitoring: deploy prometheus node_exporter and server to deployment-prep - https://phabricator.wikimedia.org/T144502#2601885 (10hashar) That works [[ https://wikitech.wikimedia.org/w/index.php?title=Hiera:Deployment-prep&diff=839239&oldid=839... 
[20:32:30] https://meta.wikimedia.org/wiki/Merchandise_giveaways/Nominations#Paladox :)
[20:32:34] RECOVERY - Puppet run on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:32:44] ahah
[20:32:46] I am going to approve it
[20:32:52] RECOVERY - Puppet run on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:32:58] RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:14] RECOVERY - Puppet run on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:14] RECOVERY - Puppet run on deployment-ores-redis is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:16] RECOVERY - Puppet run on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:20] RECOVERY - Puppet run on deployment-stream is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:30] RECOVERY - Puppet run on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:33] RECOVERY - Puppet run on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:45] RECOVERY - Puppet run on deployment-mx is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:45] RECOVERY - Puppet run on mira02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:51] RECOVERY - Puppet run on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:53] paladox: anything else you want from https://store.wikimedia.org/ ?
[20:33:57] please ask, you are my guest
[20:34:01] RECOVERY - Puppet run on deployment-sentry01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:34:07] LOL nope
[20:34:47] RECOVERY - Puppet run on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:37:34] Thomas Mulhall (paladox)
[20:37:34] 12810
[20:37:34] Independent
[20:37:49] sounds like openstack stackalytics, isn't it?
[20:38:03] hashar it's from http://korma.wmflabs.org/browser/scr-contributors.html
[20:38:09] RECOVERY - Puppet run on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
[20:38:13] RECOVERY - Puppet run on deployment-ircd is OK: OK: Less than 1.00% above the threshold [0.0]
[20:38:18] but i have no idea where it gets thomas mulhall from, it's partially right
[20:38:23] RECOVERY - Puppet run on deployment-kafka03 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:38:29] RECOVERY - Puppet run on deployment-jobrunner01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:42:02] RECOVERY - Puppet run on deployment-kafka01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:42:20] RECOVERY - Puppet run on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:42:32] RECOVERY - Puppet run on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:42:42] RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[20:42:48] RECOVERY - Puppet run on deployment-kafka04 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:43:01] I haven't set my name anywhere though, except for my personal email address
[20:43:27] gerrit?
[20:43:34] Nope
[20:43:53] yes
[20:43:53] Author
[20:43:54] Paladox
[20:43:55] My real name is hidden but it still manages to get leaked, but lol, it only got it partially right, the first name is wrong
[20:44:28] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:44:42] RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:44:55] ^^ But it seems i managed to also leak my middle name, but i have no idea how i managed that
[20:45:02] since i never write my middle name online
[20:45:16] must've somewhere at some point
[20:46:12] nope
[20:46:35] if it's being used, it's been posted somewhere
[20:46:57] RECOVERY - Puppet run on deployment-memc05 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:47:23] RECOVERY - Puppet run on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:47:56] Oh, not sure, wouldn't be facebook since i don't use my middle name, but i also have it set so google can't find me
[20:48:23] Krenair: all clear :] for now
[20:48:27] great
[20:50:55] actually, my inbox is less great :p
[20:51:11] PROBLEM - Free space - all mounts on mira02 is CRITICAL: CRITICAL: deployment-prep.mira02.diskspace._srv.byte_percentfree (<11.11%)
[20:52:05] !log fixed puppet on deployment-parsoid05. Temporary instance, will delete it later to clear out shinken.wmflabs.org
[20:52:11] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:54:17] PROBLEM - Puppet run on deployment-ores-redis is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[20:55:32] hashar: you're still here :)
[20:55:58] yeah
[20:56:01] for the good cause!
[20:56:30] I have voted to get Paladox a tshirt https://meta.wikimedia.org/wiki/Merchandise_giveaways/Nominations#Paladox :D
[20:56:40] :)
[20:56:45] thanks
[20:56:54] hashar, I don't think the -parsoid05 trick has worked
[20:57:04] ldap still contains the old broken data
[20:57:20] so shinkengen is still giving out the wrong ip and failing tests
[20:59:26] ah :(
[20:59:45] when you created that instance, sink_nova_ldap probably tried to create a new entry in ldap
[20:59:54] and failed due to a dupe?
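The exchange above and below points at a stale LDAP host record that sink_nova_ldap could not overwrite when the instance was re-created. As a hedged sketch only of how such a leftover entry could be checked for, the LDAP server, base DN and filter attribute below are assumptions about the labs directory layout, not details taken from the log:

# Hypothetical check for a leftover host record before re-creating an instance
# under the same name; server, base DN and attribute are assumptions.
ldapsearch -x -H ldap://ldap-labs.eqiad.wikimedia.org \
    -b 'ou=hosts,dc=wikimedia,dc=org' \
    '(associatedDomain=deployment-parsoid05.deployment-prep.eqiad.wmflabs)' dn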
[20:59:55] if I'm right, it would've failed
[20:59:57] yes
[21:00:05] deleting it :(
[21:00:10] ldap would see the existing one and deny it
[21:00:27] !log deleted deployment-parsoid05
[21:00:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[21:01:08] RECOVERY - Free space - all mounts on mira02 is OK: OK: All targets OK
[21:02:02] hashar, that may have done the trick though
[21:03:01] Krenair: maybe on deletion it does not validate much
[21:03:28] !log deployment-tin did a git gc on /srv/deployment/ores. That freed up disk space and cleared an alarm on co-master mira02
[21:03:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[21:03:36] we will have to recreate mira02 with a larger disk :(
[21:03:44] and get a new flavor
[21:03:53] that is the thing I don't get with openstack
[21:04:17] https://phabricator.wikimedia.org/diffusion/GSNL/browse/master/nova_ldap/base.py;9ba4f8b4993416787772f41676ad07c88fa20527$177
[21:04:17] why not let the end user pick any (X CPU, Y RAM, Z Disk)
[21:04:22] instead of the imposed flavors
[21:05:51] yep that got rid of the nonsense from shinken
[21:08:50] RECOVERY - Puppet run on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:17:11] Krenair: magic!!!!! :)
[21:17:19] Krenair: thank you for the support and confirmation
[21:17:46] Krenair: also I have dug in my log history / some task. Godog is going to add prometheus to beta as soon as the last two precise instances we have are gone
[21:18:56] as I understand that system, you load it with a s*** ton of random metrics collected from everything you have
[21:19:05] then have some kind of a query language to define the alarms
[21:19:17] (03PS2) 10Mattflaschen: Have PageTriage depend on WikiLove [integration/config] - 10https://gerrit.wikimedia.org/r/311024 (https://phabricator.wikimedia.org/T145798)
[21:19:25] LOL, since 2013 I've contributed more than most people on gerrit.
[21:19:55] paladox: you are already in the top 1000 contributors on GitHub
[21:20:04] Oh
[21:20:06] wow
[21:20:09] Didn't know that
[21:20:12] kidding :]
[21:20:17] oh lol
[21:20:18] fix moare bugs!
[21:20:22] yep
[21:21:17] (03CR) 10Reedy: [C: 031] Have PageTriage depend on WikiLove [integration/config] - 10https://gerrit.wikimedia.org/r/311024 (https://phabricator.wikimedia.org/T145798) (owner: 10Mattflaschen)
[21:21:27] http://githut.info/ is quite a fun site
[21:21:52] oh
[21:22:14] as is http://ghv.artzub.com/#user=wikimedia
[21:22:29] give folks a public API and they start building nice visualizations
[21:22:41] hashar 1,124 contributions in the last year
[21:22:45] that's me
[21:24:26] hashar look at this http://korma.wmflabs.org/browser/irc.html guess who is number 2.
[21:27:48] hashar, so, like with labmon.eqiad.wmnet?
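The !log at 21:03:28 records the cleanup that cleared the mira02 disk alert: a git gc of the ores checkout on the deployment server. A small sketch of the same steps, assuming a shell on deployment-tin; only the repository path comes from the log, the df check is an added sanity step:

# Garbage-collect the ores checkout to reclaim space under /srv, then confirm
# the mount has headroom again (df is an extra check, not from the log).
cd /srv/deployment/ores
git gc
df -h /srv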
[21:34:13] RECOVERY - Puppet run on deployment-ores-redis is OK: OK: Less than 1.00% above the threshold [0.0]
[21:36:23] Krenair: I think prometheus is able to fetch data from labmon yeah
[21:36:37] Krenair: and I guess from logstash/elasticsearch
[21:37:04] so maybe you can do some query that combines both and alarm based on that
[21:39:06] hashar I still need to continue with the project cleaning, adding npm and composer (1,000+)
[21:39:11] lol
[21:40:23] 06Release-Engineering-Team, 10Monitoring, 06Operations, 13Patch-For-Review: Monitoring and alerts for "business" metrics - https://phabricator.wikimedia.org/T140942#2644626 (10Tgr)
[23:03:59] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10Differential, 07Jenkins: Add support for a wmf-ci.yaml type file for wikimedia jenkins - https://phabricator.wikimedia.org/T145669#2637873 (10greg) See also, which you can use for expanding the project plan: https://www.mediawiki.o...
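The alerting model sketched in the conversation above ("a query language to define the alarms") boils down to rule files evaluated against the scraped metrics. As an illustration only, here is a Prometheus 1.x-style rule on the built-in up metric; the file path, the FOR duration and the promtool invocation are assumptions, not the actual deployment-prep configuration:

# Illustrative alert: fire when a scrape target has been down for 10 minutes.
# Prometheus 1.x rule syntax; path and FOR duration are assumptions.
cat > /etc/prometheus/instance_down.rules <<'EOF'
ALERT InstanceDown
  IF up == 0
  FOR 10m
  ANNOTATIONS { summary = "{{ $labels.instance }} has not been scraped for 10 minutes" }
EOF
promtool check-rules /etc/prometheus/instance_down.rules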