[01:23:14] https://integration.wikimedia.org/ci/job/mediawiki-extensions-hhvm/76076/console [01:23:14] stderr: 'fatal: unable to access 'https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Thanks/': gnutls_handshake() failed: A TLS packet with unexpected length was received.' [01:35:29] Krinkle: hmm, weird [01:39:37] Krinkle: rebuilding the job didn't have that problem [01:39:42] I guess it was a fluke [02:27:30] PROBLEM - Puppet run on deployment-sca02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [03:07:29] RECOVERY - Puppet run on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0] [04:19:31] Project selenium-MultimediaViewer » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #143: 04FAILURE in 23 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/143/ [05:23:39] 10Browser-Tests-Infrastructure, 10MobileFrontend, 06Reading-Web-Backlog, 15User-zeljkofilipin: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2642754 (10zeljkofilipin) a:03zeljkofilipin [05:24:28] 10Browser-Tests-Infrastructure, 10MobileFrontend, 06Reading-Web-Backlog, 15User-zeljkofilipin: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2641459 (10zeljkofilipin) I vaguely remember seeing a similar error while testing... [05:26:16] PROBLEM - Puppet run on deployment-ores-redis is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [05:34:55] PROBLEM - Puppet staleness on deployment-db03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [06:01:13] RECOVERY - Puppet run on deployment-ores-redis is OK: OK: Less than 1.00% above the threshold [0.0] [07:49:11] 10Deployment-Systems, 10MediaWiki-extensions-WikimediaMaintenance, 06Operations, 13Patch-For-Review: WikimediaMaintenance refreshMessageBlobs: wmf-config/wikitech.php requires non existing /etc/mediawiki/WikitechPrivateSettings.php - https://phabricator.wikimedia.org/T140889#2642949 (10elukey) [08:27:06] (03CR) 10Zfilipin: [C: 032] [ArticleFeedbackv5] Remove the rake test [integration/config] - 10https://gerrit.wikimedia.org/r/310567 (https://phabricator.wikimedia.org/T145792) (owner: 10Paladox) [08:28:07] (03Merged) 10jenkins-bot: [ArticleFeedbackv5] Remove the rake test [integration/config] - 10https://gerrit.wikimedia.org/r/310567 (https://phabricator.wikimedia.org/T145792) (owner: 10Paladox) [08:31:50] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 13Patch-For-Review, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2642970 (10hashar) So looks like the root... [08:50:05] hashar: o/ [08:50:49] should we wait to reimage the jobrunner since we'd need to terminate/re-create it ? 
[08:50:56] I checked and it is doing some stuff [08:51:09] but probably redis can keep the jobs for a while [08:51:22] it shouldn't take too much to replace deployment-jobrunner01.deployment-prep.eqiad.wmflabs [08:51:55] you can try :] [08:52:02] not sure whether we got any prod jobrunner moved to jessie [08:52:06] but that might just work [08:52:17] tmh01 / video scaling would be probably a bit more difficult [08:52:37] anyway, have to deal with aftermath of wmf.19 deploy from yesterday, so I am not sure how much cycles I will get today to assist [08:52:45] but ask as needed, will reply as I can :] [08:52:55] (03CR) 10Zfilipin: "The commit is deployed." [integration/config] - 10https://gerrit.wikimedia.org/r/310567 (https://phabricator.wikimedia.org/T145792) (owner: 10Paladox) [08:53:20] hashar: we definitely have jessie jobrunners in prod now, not videoscalers (Moritz is working on it) [08:54:37] elukey: yeah so it will probably be fine [08:54:48] the jobrunner service is in the repo mediawiki/services/jobrunner [08:54:53] that is deployed using trebuchet apparently [08:55:05] so would probably need to figure out how to add the new jobrunner instance in trebuchet config [08:55:13] hopefully that is done entirely from hiera [08:55:45] then fight with Trebuchet. I tried a deploy of jobrunner service this week, and Trebuchet was not reporting the deploy as completed for some reason. I eventually gave up and did the update manually [08:56:05] mmm ok so let's not do it on a Friday [08:56:17] don't want to cause major headaches before the weekend, this work is not that urgent [08:56:20] will restart on Monday [08:56:41] maybe I can ping the Labs folks this evening to get the quota reviewed [09:12:30] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [09:16:42] PROBLEM - Puppet run on deployment-mathoid is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [09:25:35] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 13Patch-For-Review, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643069 (10hashar) @addshore wrote https... [09:29:10] hashar: oooh, nice little post there! [09:30:57] addshore: that is probably non sense [09:31:22] it sounds believable :P but you were just talking about growing tomatoes... [09:31:45] hehehe [09:31:49] I have asked ops [09:33:32] [= [09:34:36] REPRODUCED !!!!!!!!!!!!!!!!! [09:34:39] I am such a ahcker [09:34:47] /usr/bin/timeout: the monitored command dumped core [09:35:45] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 13Patch-For-Review, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643086 (10hashar) Reproduction on terbiu... 
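An aside on the Trebuchet deploy hashar describes at 08:54–08:55: only the failure mode is mentioned, so here is a hedged sketch of the usual sequence on the deploy host. The checkout path is a guess and only the git-deploy steps are the documented Trebuchet workflow; this is not a record of what was actually run.

    cd /srv/deployment/jobrunner/jobrunner   # hypothetical Trebuchet checkout for mediawiki/services/jobrunner
    git deploy start                         # lock the repo and open a deployment
    git pull                                 # bring in the change to ship (illustrative)
    git deploy sync                          # push to the minions; this is the step hashar saw never report as completed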
[09:37:12] and I cant access the core files obviously :( [09:37:29] tis write only [09:37:38] I am happy anyway [09:39:47] addshore: heading school brb [09:39:56] *waves* [09:50:23] 10Continuous-Integration-Infrastructure, 13Patch-For-Review, 07Zuul: Zuul should not run jenkins-bot on changes for refs/meta/* - https://phabricator.wikimedia.org/T52389#2643107 (10Paladox) [09:51:43] RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0] [09:52:00] 10Continuous-Integration-Infrastructure, 13Patch-For-Review, 07Zuul: Zuul should not run jenkins-bot on changes for refs/meta/* - https://phabricator.wikimedia.org/T52389#546902 (10Paladox) @hashar hi, would this https://gerrit.wikimedia.org/r/#/c/311032/ look ok? Since I doint thin you can filter refs on pa... [09:52:30] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0] [09:58:39] back [10:08:14] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 13Patch-For-Review, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2641971 (10zeljkofilipin) >>! In T145819#... [10:14:32] Project beta-code-update-eqiad build #121704: 04FAILURE in 1 min 31 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121704/ [10:15:26] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 13Patch-For-Review, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643170 (10hashar) And strace is again my... [10:16:42] RECOVERY - Puppet run on mira02 is OK: OK: Less than 1.00% above the threshold [0.0] [10:16:47] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 13Patch-For-Review, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643173 (10hashar) TLDR: when running mws... 
[10:17:59] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07HHVM, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643176 (10hashar) [10:22:45] PROBLEM - Puppet run on mira02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [10:24:33] Project beta-code-update-eqiad build #121705: 04STILL FAILING in 1 min 32 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121705/ [10:25:35] 10:23:02 INFO:mwextupdate:running: git submodule update --init --recursive [10:25:35] 10:23:44 No submodule mapping found in .gitmodules for path 'vendor/coderkungfu/php-queue' [10:25:35] 10:24:30 Failed to recurse into submodule path 'FundraisingEmailUnsubscribe' [10:25:54] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-MultimediaViewer, 10MobileFrontend, 06Reading-Web-Backlog, 15User-zeljkofilipin: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643211 (10zeljkofilipin) [10:27:09] 10Browser-Tests-Infrastructure, 10MobileFrontend, 10QuickSurveys, 06Reading-Web-Backlog, 15User-zeljkofilipin: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2641459 (10zeljkofilipin) [10:29:40] 10Browser-Tests-Infrastructure, 10MobileFrontend, 10QuickSurveys, 06Reading-Web-Backlog, 15User-zeljkofilipin: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643221 (10zeljkofilipin) [10:30:45] zeljkof: for that task, maybe look in logstash ? [10:31:11] hashar: stumbled upon that while in the middle of something else :| [10:31:17] https://phabricator.wikimedia.org/T145799 [10:31:34] so no time at the moment for another investigation :) [10:33:44] PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [10:34:31] Project beta-code-update-eqiad build #121706: 04STILL FAILING in 1 min 30 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121706/ [10:42:57] https://gerrit.wikimedia.org/r/311109 [10:43:03] That should be the fix for broken code update [10:43:08] It just feels seriously icky [10:43:15] git submodule... for something brought in via composer? [10:43:18] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07HHVM, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643247 (10hashar) [10:43:31] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 2 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2641971 (10hashar) [10:44:31] Project beta-code-update-eqiad build #121707: 04STILL FAILING in 1 min 30 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121707/ [10:49:23] hashar: Think we should just merge https://gerrit.wikimedia.org/r/311109 ? [10:52:34] Reedy: I have no idea. 
Ask fundraising I guess :] [10:52:42] Reedy: note we have job that do install from composer [10:52:55] though in this case they seem to include all the deps they need via a composer.json / composer.lock [10:53:01] so most probably, we dont want a submodule [10:53:06] but just composer update or whatever [10:53:28] Reedy: and their composer.json has https://gerrit.wikimedia.org/r/p/wikimedia/fundraising/php-queue.git [10:53:32] so they use a fork [10:53:48] at commit 14198ba1f7d4868933649a85621a3955965e83cd [10:54:41] Project beta-code-update-eqiad build #121708: 04STILL FAILING in 1 min 41 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121708/ [10:56:13] hashar: indeed [10:56:24] As to why they're just committing the folder like htat... [10:56:26] Is another issue [10:56:30] Multiple versions of monolog etc [10:56:52] hashar: composer update doesn't do anything [10:58:12] https://github.com/wikimedia/mediawiki-extensions-FundraisingEmailUnsubscribe/tree/master/vendor/coderkungfu [10:58:25] That shows it being committed as a git submodule [10:59:28] https://github.com/wikimedia/mediawiki-extensions-FundraisingEmailUnsubscribe/commit/3d14421a8300cfb7185752e974750f12baad96c3#diff-18108d068d013f322534c3d4945868dcR1 [10:59:33] Subproject commit 14198ba1f7d4868933649a85621a3955965e83cd [11:00:27] I presume, it's due to the way it was setup... Someone cloned it first, and then just committed the files in place [11:02:15] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 2 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643283 (10hashar) `SiteConfigu... [11:04:12] Reedy: I think I remember a bug about it [11:04:25] with composer update not being able to figure out a new version is available when a sha1 is used [11:04:34] Project beta-code-update-eqiad build #121709: 04STILL FAILING in 1 min 34 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121709/ [11:04:40] If you recreate the commit, ontop of HEAD~1 [11:04:41] it works fine [11:07:37] * Reedy reverts the commit and remakes it [11:13:43] RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0] [11:14:33] Project beta-code-update-eqiad build #121710: 04STILL FAILING in 1 min 33 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121710/ [11:24:35] Project beta-code-update-eqiad build #121711: 04STILL FAILING in 1 min 34 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121711/ [11:34:32] Project beta-code-update-eqiad build #121712: 04STILL FAILING in 1 min 32 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121712/ [11:36:29] !log apt-get upgrade on deployment-tin , bring in a new hhvm version and others [11:36:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [11:36:53] Reedy: revert your code please [11:37:02] Reedy: beta scap breaks with: [11:37:03] 00:00:03.105 INFO:mwextupdate:running: git submodule update --init --recursive [11:37:03] 00:00:45.192 No submodule mapping found in .gitmodules for path 'vendor/coderkungfu/php-queue' [11:37:03] 00:01:28.019 Submodule path 'Wikidata': checked out 'f99c17a841aac26b56465413696c6ea991c1a222' [11:37:03] 00:01:30.460 Failed to recurse into submodule path 'FundraisingEmailUnsubscribe' [11:37:11] hashar: Sorry? 
[11:37:15] I didn't merge anything [11:37:19] ohh [11:37:19] It was already broken with that [11:37:20] funny [11:37:21] I was trying to fix it [11:37:33] ah [11:37:37] the first build failling is https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121704/console [11:37:41] I proposed 2 different fixes ;) [11:37:53] 00:00:02.973 From https://gerrit.wikimedia.org/r/p/mediawiki/extensions/FundraisingEmailUnsubscribe [11:37:53] 00:00:02.973 d1062e8..3d14421 master -> origin/master [11:41:54] hashar: yeah, so hence one commit adding the .gitmodule which feels icky [11:42:04] and a second commit, reverting that one, and redoing it using only composer [11:42:51] !log beta: apt-get upgrade on deployment-jobrunner01 [11:42:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [11:44:32] Project beta-code-update-eqiad build #121713: 04STILL FAILING in 1 min 31 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121713/ [11:45:43] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 2 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643378 (10hashar) On the #beta... [11:50:44] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 2 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643380 (10hashar) I am trying... [11:54:34] Project beta-code-update-eqiad build #121714: 04STILL FAILING in 1 min 33 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121714/ [12:04:34] Project beta-code-update-eqiad build #121715: 04STILL FAILING in 1 min 34 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121715/ [12:06:19] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643404 (10zeljkofilipin) [12:07:58] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 2 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643405 (10hashar) Running: su... [12:13:03] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 2 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643423 (10hashar) On terbium c... 
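For reference on the wikimedia/fundraising/php-queue fork Reedy quoted at 10:53: Composer can pin a package to a VCS fork at a specific commit, which also fits hashar's 11:04 point that composer update won't pick up anything newer while a sha1 is pinned. A hedged, untested sketch of the equivalent CLI steps, reusing the URL, package name, and commit hash from the chat:

    composer config repositories.php-queue vcs https://gerrit.wikimedia.org/r/p/wikimedia/fundraising/php-queue.git
    composer require coderkungfu/php-queue:dev-master#14198ba1f7d4868933649a85621a3955965e83cd
    # dev-master#<sha> keeps the install parked on that one commit of the fork.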
[12:14:30] Project beta-code-update-eqiad build #121716: 04STILL FAILING in 1 min 29 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121716/ [12:24:32] Project beta-code-update-eqiad build #121717: 04STILL FAILING in 1 min 31 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121717/ [12:24:49] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643458 (10zeljkofilipin) [12:26:59] PROBLEM - Puppet run on deployment-eventlogging03 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [12:30:44] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 3 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643463 (10hashar) Got a `Faile... [12:34:33] Project beta-code-update-eqiad build #121718: 04STILL FAILING in 1 min 32 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121718/ [12:44:31] Project beta-code-update-eqiad build #121719: 04STILL FAILING in 1 min 30 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121719/ [12:54:32] Project beta-code-update-eqiad build #121720: 04STILL FAILING in 1 min 31 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121720/ [13:01:58] RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0] [13:03:49] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643512 (10zeljkofilipin) [13:04:05] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 3 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2641971 (10hashar) From T145839... [13:04:32] Project beta-code-update-eqiad build #121721: 04STILL FAILING in 1 min 32 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121721/ [13:07:04] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643518 (10zeljkofilipin) [13:09:47] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643524 (10zeljkofilipin) [13:14:29] Project beta-code-update-eqiad build #121722: 04STILL FAILING in 1 min 29 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121722/ [13:21:13] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643533 (10zeljkofilipin) Not reproducible on local media... 
[13:21:35] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 4 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643535 (10hashar) [13:24:34] Project beta-code-update-eqiad build #121723: 04STILL FAILING in 1 min 33 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121723/ [13:25:40] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643545 (10zeljkofilipin) [13:26:25] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2641459 (10zeljkofilipin) Also reproducible targeting pro... [13:26:49] 06Release-Engineering-Team (Deployment-Blockers), 13Patch-For-Review, 05Release: MW-1.28.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T143328#2643551 (10hashar) Rollbacked due to account creation being broken T145839 Investigation is on T145819. [13:29:03] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643555 (10zeljkofilipin) Not reproducible in integration... [13:34:31] Project beta-code-update-eqiad build #121724: 04STILL FAILING in 1 min 30 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121724/ [13:44:32] Project beta-code-update-eqiad build #121725: 04STILL FAILING in 1 min 31 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121725/ [13:48:54] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643609 (10zeljkofilipin) API seems to work fine on beta... [13:52:27] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 2 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643622 (10zeljkofilipin) Verbose stack trace: MobileFro... 
[13:54:29] Project beta-code-update-eqiad build #121726: 04STILL FAILING in 1 min 28 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121726/ [14:02:44] RECOVERY - Puppet run on mira02 is OK: OK: Less than 1.00% above the threshold [0.0] [14:04:35] Project beta-code-update-eqiad build #121727: 04STILL FAILING in 1 min 34 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121727/ [14:14:35] Project beta-code-update-eqiad build #121728: 04STILL FAILING in 1 min 34 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121728/ [14:24:34] Project beta-code-update-eqiad build #121729: 04STILL FAILING in 1 min 33 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121729/ [14:29:39] ^ should be fixed [14:30:47] this is from gallium [14:30:48] PHP Warning: PHP Startup: Unable to load dynamic library '/usr/lib/php5/20090626/mysql.so' - /usr/lib/php5/20090626/mysql.so: cannot open shared object file: No such file or directory in Unknown on line 0 [14:30:55] (root@ cron emails) [14:31:13] anything known? [14:31:18] it appeared today afaics [14:31:22] eh, not to me... [14:31:30] ^ hashar have you seen? [14:31:36] 06Release-Engineering-Team, 06Operations, 07HHVM, 13Patch-For-Review: Migrate deployment servers (tin/mira) to jessie - https://phabricator.wikimedia.org/T144578#2643749 (10MoritzMuehlenhoff) A few jessie-related changes have been sorted out, mira02.deployment-prep.eqiad.wmflabs should be ready for testing. [14:34:32] Yippee, build fixed! [14:34:32] Project beta-code-update-eqiad build #121730: 09FIXED in 1 min 31 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121730/ [14:35:03] hrm, I don't have permission to view the crontab for root on gallium. [14:39:56] we have php5-mysql which is what supplies mysql.so afaik. Not sure why it would disappear. [14:40:15] I'm actually going afk to finish up morning things. back soon. [14:42:41] yeah, should be where that comes from: http://packages.ubuntu.com/precise/amd64/php5-mysql/filelist and we have pdo_mysql.so but not mysql or mysqli.so... [14:45:20] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 4 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643760 (10hashar) I am enlargi... [14:49:38] 06Release-Engineering-Team (Deployment-Blockers), 13Patch-For-Review, 05Release: MW-1.28.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T143328#2643768 (10hashar) [14:50:06] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 4 others: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2643770 (10hashar) [14:53:48] PROBLEM - Puppet run on deployment-aqs01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [14:55:23] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 3 others: Jobs invoking SiteConfiguration::getConfig cause HHVM to fail updating the bytecode cache due to being filesi... - https://phabricator.wikimedia.org/T145819#2643777 [14:56:35] Yippee, build fixed! 
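A hedged sketch of how to confirm what thcipriani was chasing at 14:30–14:42 on gallium: which package is supposed to ship the mysql.so that PHP is warning about, and what is actually present. Illustrative commands only, using the Precise-era package names from the chat:

    dpkg -S mysql.so                      # which installed package claims to ship a mysql.so
    dpkg -L php5-mysql | grep '\.so$'     # the shared objects php5-mysql actually installed
    php -m | grep -i mysql                # which mysql extensions the CLI can load right now
    grep -Rl mysql.so /etc/php5/          # ini snippets still pointing at a missing module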
[14:56:36] Project selenium-QuickSurveys » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #156: 09FIXED in 3 min 51 sec: https://integration.wikimedia.org/ci/job/selenium-QuickSurveys/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/156/ [14:58:03] elukey: thcipriani|afk oh my god [14:58:07] I have removed mysql from gallium [14:58:23] 13:35 gallium: removing MySQL which is no more defined in puppet and running puppet. Did: apt-get remove mysql-common mysql-server mysql-server-core-5.5 [14:59:01] PROBLEM - Puppet run on deployment-puppetmaster is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [15:00:38] did dpkg --purge php5-mysql [15:01:25] fixed [15:01:32] thcipriani|afk: we are back to wmf.18 [15:01:46] the wikidata job issue has its root cause deep in mw/hhvm etc [15:01:53] and also prevented account creation entirely [15:01:56] so I have just rollbacked [15:01:59] incident report at https://wikitech.wikimedia.org/wiki/Incident_documentation/20160915-MediaWiki [15:02:11] main task is https://phabricator.wikimedia.org/T145819 [15:02:19] with all the crazy details from yesterday and this morning debugging [15:02:37] TL;DR: have to purge the HHVM bytecode cache everywhere [15:02:42] then I guess we will be able to push wmf.19 [15:02:53] more longterm: fix https://phabricator.wikimedia.org/T111441 :] [15:03:03] and garbage collect hhvm bytecode [15:03:16] anyway was just brain dumping [15:03:43] wowza [15:05:39] damn [15:05:44] that's some solid debugging [15:09:25] yeah [15:09:35] moritzm: I am off. Thank you for mira02 :] [15:09:54] have a nice weekend! [15:12:16] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 4 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2643803 (10jhobs) Thanks for the fast turnaround @zeljkof... [15:15:53] thcipriani: yeah [15:16:01] thcipriani: well we can talk about it next monday :] [15:16:09] hashar: kk, sounds good. [15:16:15] thcipriani: I am out myself. Will probably not show up this evening [15:16:19] I feel pretty tired :] [15:16:31] have a good weekend :) [15:16:34] Yippee, build fixed! [15:16:34] Project selenium-MobileFrontend » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #159: 09FIXED in 17 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/159/ [15:16:41] danke [15:20:43] hasharAway hi, following from the gerrit commit that you abandon sorry that you thought that i was guessing but im not. [15:20:46] It actually works [15:20:55] been doing that on gerrit-test and seems to work [15:21:13] Since it only excluded refs/meta/config refs [15:23:09] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 3 others: Jobs invoking SiteConfiguration::getConfig cause HHVM to fail updating the bytecode cache due to being filesi... 
- https://phabricator.wikimedia.org/T145819#2643810 [15:23:18] 10Continuous-Integration-Infrastructure, 06Operations, 10puppet-compiler, 13Patch-For-Review: OSError: [Errno 28] No space left on device on compiler02.puppet3-diffs.eqiad.wmflabs - https://phabricator.wikimedia.org/T143671#2643813 (10hashar) a:03fgiunchedi Looks like @fgiunchedi solved it :) [15:24:10] paladox: feel free to reopen but write a good description in the commit summary :] [15:24:26] Ok [15:24:27] paladox: also switch to the group that contains the jenkins bot since we can rename that user later on [15:24:27] yeh [15:24:28] :] [15:24:32] I am off, good week-end! [15:24:43] Ok, i just didnt want it to block you hasharAway [15:24:49] since your included in that group [15:27:15] PROBLEM - Puppet run on deployment-ores-redis is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [15:30:04] moritzm Hi, yay looks like grrrit-wm is lasting longer again with my under the hood changes, I did a work around to use npm 2 and it seems to work. [15:33:45] RECOVERY - Puppet run on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [15:34:02] RECOVERY - Puppet run on deployment-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0] [15:37:50] RECOVERY - Puppet run on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0] [15:43:22] Project selenium-MobileFrontend » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #160: 04FAILURE in 18 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/160/ [15:51:17] Yippee, build fixed! [15:51:17] Project selenium-MobileFrontend » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #160: 09FIXED in 26 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/160/ [16:02:14] RECOVERY - Puppet run on deployment-ores-redis is OK: OK: Less than 1.00% above the threshold [0.0] [16:06:25] re: deployment-prep quota increase, anything we can do to speed things up? mostly T145611 and T145636 [16:07:18] (I want stashbot back giving me links) [16:07:31] stashbot: :( [16:07:31] * greg-g is in a meeting will catch up in about 23 minutes [16:07:43] indeed, I was expecting the links too [16:11:09] +1 for the request, it is blocking a bit the deployment-prep debian migration [16:13:05] short quip: talk to chase ;) [16:13:14] but, I'll review the state soon [16:16:46] makes sense, I'll talk to chase, thanks greg-g ! [16:28:03] godog: do it in a public (or at least _security) channel, plz, so I can follow :) [16:29:11] greg-g: *nod* I don't see chase online, will do tho [16:30:25] word [17:21:13] 10Gerrit: Gerrit is not emoji friendly - https://phabricator.wikimedia.org/T145885#2644059 (10Ladsgroup) [17:25:08] 10Gerrit: Gerrit is not emoji friendly - https://phabricator.wikimedia.org/T145885#2644074 (10Paladox) Testing with gerrit 2.12.4 seems to stop the 500 error but it replaces it with ?? 
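On hashar's 15:02 debrief above ("have to purge the HHVM bytecode cache everywhere"): a heavily hedged sketch of what that typically means on a single HHVM app server. The cache file location is whatever hhvm.repo.central.path is set to in the server's configuration; the path below is an assumption, not something stated in the chat, and doing this fleet-wide would go through salt rather than one-off shells.

    sudo service hhvm stop
    sudo rm -f /var/cache/hhvm/*.hhbc.sq3   # assumed location of the sqlite bytecode repo(s)
    sudo service hhvm start                 # HHVM rebuilds the bytecode cache as requests come in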
[17:27:40] 10Gerrit: Gerrit is not emoji friendly - https://phabricator.wikimedia.org/T145885#2644084 (10Paladox) Filled upstream at https://bugs.chromium.org/p/gerrit/issues/detail?id=4570 [17:27:49] 10Gerrit, 07Upstream: Gerrit is not emoji friendly - https://phabricator.wikimedia.org/T145885#2644085 (10Paladox) [17:32:36] 10Gerrit, 07Upstream: Gerrit is not emoji friendly - https://phabricator.wikimedia.org/T145885#2644115 (10Paladox) It looks like it might be because we set utf-8 on MySQL. Since testing here https://gerrit-review.googlesource.com/#/c/86192/ shows it works. Also testing on a local gerrit 2.13 results in the s... [17:38:56] 10Gerrit, 07Upstream: Gerrit is not emoji friendly - https://phabricator.wikimedia.org/T145885#2644142 (10hashar) [17:48:33] 10Deployment-Systems, 03Scap3, 07WorkType-NewFunctionality: Scap3 submodule space issues - https://phabricator.wikimedia.org/T137124#2644157 (10thcipriani) p:05Triage>03Normal I have the same concern for git-fat repos as well. [17:48:46] 06Release-Engineering-Team (Long-Lived-Branches), 03Scap3 (Scap3-MediaWiki-MVP), 13Patch-For-Review: Create `scap swat` command to automate patch merging & testing during a swat deployment - https://phabricator.wikimedia.org/T142880#2644160 (10mmodell) [17:49:11] 10Gerrit, 07Upstream: Gerrit is not emoji friendly - https://phabricator.wikimedia.org/T145885#2644059 (10hashar) Pasted the server side stacktrace. Looks like com.google.gwtorm.jdbc / com.google.gwtorm.schema.sql is not fully Unicode aware :] We have for `database.url` jdbc:mysql://<%= @db_host %>/<%= @db_n... [17:49:53] 10Deployment-Systems, 03Scap3: Scap subcommand bash completion - https://phabricator.wikimedia.org/T135317#2644167 (10thcipriani) 05Open>03Resolved a:03thcipriani There is currently bash completion for subcommands, flags, files, and directories live as of scap v.3.2.1-1. [17:50:59] 03Scap3, 10scap: scap to reload a service instead of restart - https://phabricator.wikimedia.org/T134001#2644175 (10thcipriani) p:05Triage>03Normal [17:53:42] 10Gerrit, 07Upstream: Gerrit is not emoji friendly - https://phabricator.wikimedia.org/T145885#2644186 (10Paladox) @hashar thanks. [17:56:44] (03CR) 10Florianschmidtwelzow: "needs rebase." [integration/config] - 10https://gerrit.wikimedia.org/r/310561 (owner: 10Paladox) [17:56:59] (03PS2) 10Paladox: [CookieWarning] Add Jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/310561 [17:57:30] (03CR) 10Paladox: "@Florianschmidtwelzow hi it doesn't, it's do to the setting we have for this repo in gerrit, we choice fast forward like it is for puppet." [integration/config] - 10https://gerrit.wikimedia.org/r/310561 (owner: 10Paladox) [17:57:48] 03Scap3: Make symlink-swapping optional in deploy promote - https://phabricator.wikimedia.org/T145889#2644204 (10mmodell) [17:58:38] 03Scap3: Scap3 config references to deployed directory - https://phabricator.wikimedia.org/T145437#2644218 (10mmodell) I've made a task for {icon arrow-up} that. {T145889} [18:08:26] 10Gerrit, 07Upstream: Gerrit is not emoji friendly - https://phabricator.wikimedia.org/T145885#2644229 (10Paladox) I also filled https://bugs.chromium.org/p/gerrit/issues/detail?id=4571 upstream due to it being a 500 error. [18:13:29] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [18:45:13] hiyaaa, if i want to send logs to logstash in deployment-prep, what host/port (s) do I use? 
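Side note on the Gerrit emoji task (T145885) above: MySQL's plain utf8 charset stores at most 3 bytes per character, so 4-byte emoji cannot be saved, and utf8mb4 is the usual remedy. Whether that is the right fix for Gerrit's schema is exactly what the upstream bug is about, so the statement below is illustrative only, with the database name assumed:

    mysql -e "ALTER DATABASE reviewdb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
    # Existing tables/columns would need converting too, and the JDBC settings
    # hashar quotes would have to agree with the new charset.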
[18:53:28] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0] [18:55:37] PROBLEM - Puppet run on deployment-kafka05 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [19:05:49] ottomata: heya, little late to reply, but FWIW, the host we use for scap to check logstash in beta is logstash2.deployment-prep.eqiad.wmflabs. The public url is https://logstash-beta.wmflabs.org [19:08:19] great, so if i configure a thing to use that port 12201, shoudl work? [19:09:17] ah ya, totally works! :) [19:10:31] cool. Was just checking on that, might need some holes punched from one server to another, but if it works, guess not :) [19:15:49] Yippee, build fixed! [19:15:50] Project selenium-RelatedArticles » chrome,beta-mobile,Linux,contintLabsSlave && UbuntuTrusty build #147: 09FIXED in 47 sec: https://integration.wikimedia.org/ci/job/selenium-RelatedArticles/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta-mobile,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/147/ [19:16:16] Yippee, build fixed! [19:16:16] Project selenium-RelatedArticles » chrome,beta-desktop,Linux,contintLabsSlave && UbuntuTrusty build #147: 09FIXED in 1 min 15 sec: https://integration.wikimedia.org/ci/job/selenium-RelatedArticles/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta-desktop,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/147/ [19:19:54] Hi releng! I'm trying to add tests to FundraisingEmailUnsubscribe, and am still not sure where to require the composer autoload to get it picked up by tests. [19:20:12] I added a hooks class with a static callback referenced in extension.json [19:20:44] and that totally works when I run the tests locally in a wiki with wfLoadExtension in its LocalSettings [19:21:14] but CI doesn't seem to load the extension the same way. Do I need to add FundraisingEmailUnsubscribe to some list? [19:25:12] ejegg: hrm. I think there may be different variant for extension tests that use composer. [19:25:17] I'm not 100%, honestly. [19:26:09] legoktm: would be able to answer this question better than I could I think. [19:26:12] thcipriani: ah, in that repo we're actually checking in the vendor dir, since there's not a lot more to the extension [19:26:23] so we don't even need to run composer [19:26:30] just need to hit that require_once line [19:26:56] thcipriani: do you know if the extensions test suite calls wfLoadExtension somewhere? Seems like it would need to [19:27:13] but it doesn't look like it's getting to the callback [19:29:58] gerrit on a go slow today? [19:30:11] ejegg: if it's extension.json... [19:30:15] You don't need to add any callback [19:30:17] Yippee, build fixed! 
[19:30:18] Project selenium-MobileFrontend » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #161: 09FIXED in 16 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/161/ [19:30:20] We have auto discovery now [19:30:32] just needs to be in tests/phpunit (if phpunit tests etc) [19:30:37] RECOVERY - Puppet run on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0] [19:31:43] ejegg: https://phabricator.wikimedia.org/T142120 and https://phabricator.wikimedia.org/T142121 [19:31:49] https://gerrit.wikimedia.org/r/#/c/302944/ [19:32:27] Reedy: yep, it's finding the tests just fine, but I'm getting an undefined class error for some stuff that we've checked in under a vendor dir [19:32:38] Something not in the autoloader? [19:32:46] oh [19:32:46] The callback is just to require_once the composer autoloader [19:32:54] No, you don't need that [19:33:01] https://gerrit.wikimedia.org/r/311174/ [19:33:19] ejegg: You didn't update https://github.com/wikimedia/mediawiki-extensions-FundraisingEmailUnsubscribe/blob/master/extension.json ! [19:33:37] ejegg: you need to add "load_composer_autoloader": true, [19:33:54] aha, thanks! [19:35:05] ejegg: Have you done anything to fix the git submodule errors yet? [19:35:42] Reedy: working locally now [19:35:48] :) [19:36:10] Reedy: submodule errors for FundrasingEmailUnsubscribe? [19:36:24] You were breaking beta code update [19:36:32] I fixed those this morning [19:36:36] ah, how? [19:36:41] ooh, I didn't realize! thanks thcipriani [19:37:09] I did: git rm --cached [the path it was looking for...kungfu-something] [19:37:10] thcipriani: I see no commit to the extension... which suggests it'll still be broken locally? [19:37:13] and then a git submodule update [19:37:52] reedy@ubuntu64-web-esxi:/var/www/wiki/mediawiki/extensions/FundraisingEmailUnsubscribe$ git pull [19:37:52] Already up-to-date. [19:37:52] reedy@ubuntu64-web-esxi:/var/www/wiki/mediawiki/extensions/FundraisingEmailUnsubscribe$ git s^C [19:37:52] reedy@ubuntu64-web-esxi:/var/www/wiki/mediawiki/extensions/FundraisingEmailUnsubscribe$ git submodule update --init --recursive [19:37:52] fatal: no submodule mapping found in .gitmodules for path 'vendor/coderkungfu/php-queue' [19:38:14] oh dang, I added dev-master dependency and forgot to delete .git before checking in the files [19:38:15] Project selenium-MobileFrontend » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #161: 04FAILURE in 24 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/161/ [19:38:18] ah, dang, I just figured there was something wrong with beta code update. [19:38:22] ok, I'll fix that [19:38:26] sorry folks! [19:38:35] ejegg: I've 2 seperate fixes available [19:38:40] https://gerrit.wikimedia.org/r/311109 add .gitmodule [19:38:51] https://gerrit.wikimedia.org/r/311111 revert and redo https://gerrit.wikimedia.org/r/311112 [19:38:54] nah, we don't actually want it as a submodule [19:39:07] heh [19:39:14] so the second two reinstate it via composer alone [19:39:58] tbh, those 2 should probably just be squashed [19:40:07] which should be worthwhile for a fix [19:40:31] hmm, that re-do doesn't check in the phpqueue classes. 
Let me try something [19:40:59] Yup [19:41:00] moment [19:41:44] ah, screw it [19:41:49] I'll abandon mine [19:42:32] ejegg: please merge https://gerrit.wikimedia.org/r/#/c/311110/ though :P [19:42:49] ah word, will do! [19:46:31] I'll let you fix the "submodule" issue though [19:46:46] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 4 others: Jobs invoking SiteConfiguration::getConfig cause HHVM to fail updating the bytecode cache due to being filesi... - https://phabricator.wikimedia.org/T145819#2644400 [19:50:13] 10Browser-Tests-Infrastructure, 10MediaWiki-extensions-RelatedArticles, 10MobileFrontend, 10QuickSurveys, and 4 others: Mediawiki::Client#login receiving 403 Forbidden responses from the API in build tests - https://phabricator.wikimedia.org/T145799#2644405 (10Jdlrobson) 05Open>03Resolved Thanks for ra... [19:50:43] whole puppet runs are broken on beta arent they ? [19:50:58] hm? [19:51:04] http://shinken.wmflabs.org/problems?search=deployment [19:51:26] wow [19:51:45] maybe it is just shinken being mad [19:52:17] Ah [19:52:26] It's our old friend the double-puppet.conf bug [19:53:41] !log beta created instance "deployment-parsoid05" Should be deleted later, that is merely to purge the hostname from Shinken ( http://shinken.wmflabs.org/host/deployment-parsoid05 ) [19:53:47] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [19:54:12] you sure that'll work hashar? [19:54:18] yeah [19:54:21] gotta wait [19:54:30] then eventually delete the instance and magically it get purged [19:54:41] maybe it hasn't been properly deleted in ldap or whatever source [19:54:48] so on beta [19:54:51] on a random instance [19:54:57] Error: Could not request certificate: The certificate retrieved from the master does not match the agent's private key. [19:54:57] Certificate fingerprint: 75:17:D2:C5:D1:E6:05:66:DD:72:45:84:BC:E8:AA:9A:D0:0A:88:A4:B3:7D:4D:A3:BC:C5:A0:2B:18:01:16:92 [19:54:57] To fix this, remove the certificate from both the master and the agent and then start a puppet run, which will automatically regenerate a certficat [19:55:05] there is exactly ZERO WAY I am going to fix all the instances [19:55:13] ejegg|meet: Yeah, I think it'll be easiest for you to fix it [19:55:26] I'm looking into it [19:59:29] krenair@deployment-puppetmaster:~$ sudo -i puppet cert list --all | grep ms-be02 [19:59:29] + "deployment-ms-be02.deployment-prep.eqiad.wmflabs" (SHA256) 3F:22:48:A3:37:37:D9:B6:48:FF:2B:67:EB:69:1F:F8:D1:2E:6F:59:02:CC:40:06:8C:15:0C:F2:3B:61:09:EB [19:59:37] root@deployment-ms-be02:~# openssl x509 -in /var/lib/puppet/client/ssl/certs/deployment-ms-be02.deployment-prep.eqiad.wmflabs.pem -noout -fingerprint -sha256 [19:59:37] SHA256 Fingerprint=3F:22:48:A3:37:37:D9:B6:48:FF:2B:67:EB:69:1F:F8:D1:2E:6F:59:02:CC:40:06:8C:15:0C:F2:3B:61:09:EB [20:00:03] Reedy: I think this should do it: https://gerrit.wikimedia.org/r/311176 [20:00:18] let me test :) [20:00:52] Tests are now passing! https://gerrit.wikimedia.org/r/311174 [20:00:57] yup [20:00:59] Krenair: ops is heavily refactor puppetmaster [20:00:59] WFM [20:01:08] +2'd [20:01:32] hashar, I'm not yet blaming anything on ops [20:01:36] for the submodule, that is [20:02:04] thanks Reedy ! 
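Putting ejegg's 19:38 diagnosis (the vendored dependency was committed with its .git directory still in place, so git recorded a gitlink with no .gitmodules entry) together with the cleanup thcipriani describes at 19:37: a hedged sketch of the repo-side fix. The path is the one from the error messages; the commit message is invented.

    cd extensions/FundraisingEmailUnsubscribe
    git rm --cached vendor/coderkungfu/php-queue    # drop the bogus mode-160000 gitlink from the index
    rm -rf vendor/coderkungfu/php-queue/.git        # the leftover .git is what made git treat it as a "submodule"
    git add vendor/coderkungfu/php-queue            # re-add the dependency as ordinary tracked files
    git commit -m "Vendor php-queue as plain files, not a broken gitlink"
    git submodule update --init --recursive         # should now succeed; this is what beta-code-update runs
    grep load_composer_autoloader extension.json    # Reedy's other fix: expect "load_composer_autoloader": true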
[20:02:19] That was totally the issue with the tests too, just missing those files [20:02:20] Krenair: last time it occured, I have just deleted everything :] [20:02:35] but, I'm happy to have learned about load_composer_autoloader! [20:02:51] hashar, cleaned all certs on the puppetmaster? [20:03:46] yeah [20:03:58] and deleted all /var/lib/puppet/client/ssl dirs on all instances via salt [20:04:03] maybe the ca has changed [20:05:36] Krenair: I am just going to do that (delete everything [20:07:11] !log beta: stopping puppetmaster, rm -f /var/lib/puppet/server/ssl/ca/signed/* [20:07:17] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:07:53] !log beta: salt -v '*' cmd.run 'rm -fR /var/lib/puppet/client/ssl/' [20:07:59] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:11:10] Krenair: solved :) [20:12:18] PROBLEM - Puppet run on deployment-salt02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:12:43] ah [20:12:52] so salt -v '*' cmd.run 'puppet agent -tv' [20:13:03] that is definitely a good way to kill the puppetmaster :D [20:13:50] !log beta: restarted puppetmaster [20:13:56] PROBLEM - Puppet run on deployment-pdf01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:13:56] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:14:02] PROBLEM - Puppet run on deployment-sentry01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:14:04] PROBLEM - Puppet run on deployment-conf03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:14:10] PROBLEM - Puppet run on deployment-db2 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:14:20] PROBLEM - Puppet run on deployment-eventlogging04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:14:34] PROBLEM - Puppet run on deployment-parsoid09 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:14:40] PROBLEM - Puppet run on deployment-poolcounter02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:14:46] PROBLEM - Puppet run on deployment-cache-upload04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:14:50] PROBLEM - Puppet run on deployment-aqs01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:14:51] PROBLEM - Puppet run on deployment-changeprop is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:14:59] PROBLEM - Puppet run on deployment-puppetmaster is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:15:05] PROBLEM - Puppet run on deployment-zotero01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:15:21] PROBLEM - Puppet run on deployment-ms-be02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:16:07] PROBLEM - Puppet run on deployment-db1 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:16:13] PROBLEM - Puppet run on deployment-elastic06 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:16:39] PROBLEM - Puppet run on deployment-kafka05 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:16:39] PROBLEM - Puppet run on deployment-elastic08 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:16:53] PROBLEM - Puppet run on deployment-redis01 is CRITICAL: 
CRITICAL: 50.00% of data above the critical threshold [0.0] [20:16:55] PROBLEM - Puppet run on deployment-memc05 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:17:01] PROBLEM - Puppet run on deployment-kafka01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:17:01] PROBLEM - Puppet run on deployment-elastic07 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:17:09] PROBLEM - Puppet run on deployment-urldownloader is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:17:11] PROBLEM - Puppet run on deployment-pdf02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:17:19] PROBLEM - Puppet run on deployment-memc04 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:17:23] PROBLEM - Puppet run on deployment-mediawiki06 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:17:28] PROBLEM - Puppet run on deployment-mediawiki05 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:17:30] PROBLEM - Puppet run on deployment-tmh01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:17:38] PROBLEM - Puppet run on deployment-redis02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:17:44] PROBLEM - Puppet run on deployment-mathoid is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:17:46] PROBLEM - Puppet run on deployment-sca03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:17:48] PROBLEM - Puppet run on deployment-fluorine02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:17:48] PROBLEM - Puppet run on deployment-kafka04 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:17:52] PROBLEM - Puppet run on deployment-elastic05 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:17:56] PROBLEM - Puppet run on deployment-pdfrender is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:18:00] PROBLEM - Puppet run on deployment-eventlogging03 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:18:08] PROBLEM - Puppet run on deployment-tin is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:18:12] PROBLEM - Puppet run on deployment-ircd is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:18:14] PROBLEM - Puppet run on deployment-sca01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:18:14] PROBLEM - Puppet run on deployment-ores-redis is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:18:18] PROBLEM - Puppet run on deployment-restbase02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:18:20] PROBLEM - Puppet run on deployment-stream is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:18:24] PROBLEM - Puppet run on deployment-conftool is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:18:26] oh hi [20:18:26] PROBLEM - Puppet run on deployment-kafka03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:18:31] PROBLEM - Puppet run on deployment-jobrunner01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:18:31] PROBLEM - Puppet run on deployment-sca02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:18:33] PROBLEM - Puppet 
run on deployment-restbase01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:18:35] PROBLEM - Puppet run on deployment-apertium01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:18:35] PROBLEM - Puppet run on deployment-db04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:18:45] PROBLEM - Puppet run on mira02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:18:45] PROBLEM - Puppet run on deployment-mx is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:18:48] greg-g: that is all me [20:18:49] PROBLEM - Puppet run on deployment-zookeeper01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:18:49] PROBLEM - Puppet run on deployment-imagescaler01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:18:51] PROBLEM - Puppet run on deployment-logstash2 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:18:59] it is actually runing just fine [20:19:08] heh [20:19:26] I'll take my car to the mechanic (again) and check in later, probably 30 minutes from now [20:19:29] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:19:29] PROBLEM - Puppet run on mira is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:19:36] greg-g: I will be gone :) [20:19:40] g'night! [20:19:41] greg-g: have a safe trip_ [20:19:45] PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:24:07] RECOVERY - Puppet run on deployment-db2 is OK: OK: Less than 1.00% above the threshold [0.0] [20:24:36] RECOVERY - Puppet run on deployment-parsoid09 is OK: OK: Less than 1.00% above the threshold [0.0] [20:24:40] RECOVERY - Puppet run on deployment-poolcounter02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:25:02] 10Continuous-Integration-Infrastructure, 13Patch-For-Review, 07Zuul: Zuul should not run jenkins-bot on changes for refs/meta/* - https://phabricator.wikimedia.org/T52389#2644460 (10Paladox) Upstream in openstack actually only allow project owners to read refs/meta/config (https://review.openstack.org/#/admi... [20:26:06] RECOVERY - Puppet run on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0] [20:26:12] RECOVERY - Puppet run on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0] [20:26:39] Krenair: yeah so I have eventually overloaded the puppetmaster via a salt command and had to kill the puppetmaster [20:26:47] :) [20:26:51] hence the spam of alarms there. 
But all hosts are recovered as far as I can tell [20:27:09] !log beta: force running puppet in batches of 4 instances: salt --batch 4 -v 'deployment-*' cmd.run 'puppet agent -tv' [20:27:10] RECOVERY - Puppet run on deployment-pdf02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:27:12] RECOVERY - Puppet run on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0] [20:27:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:27:16] RECOVERY - Puppet run on deployment-salt02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:27:21] it is not very smart [20:27:27] RECOVERY - Puppet run on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0] [20:27:31] pupetmaster apparently tries to compile as many catalogs it can [20:27:44] until the host goes out of memory [20:27:49] RECOVERY - Puppet run on deployment-sca03 is OK: OK: Less than 1.00% above the threshold [0.0] [20:27:51] RECOVERY - Puppet run on deployment-fluorine02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:27:57] RECOVERY - Puppet run on deployment-pdfrender is OK: OK: Less than 1.00% above the threshold [0.0] [20:28:25] RECOVERY - Puppet run on deployment-conftool is OK: OK: Less than 1.00% above the threshold [0.0] [20:28:33] RECOVERY - Puppet run on deployment-apertium01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:28:35] RECOVERY - Puppet run on deployment-db04 is OK: OK: Less than 1.00% above the threshold [0.0] [20:28:49] RECOVERY - Puppet run on deployment-zookeeper01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:28:57] RECOVERY - Puppet run on deployment-pdf01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:29:05] RECOVERY - Puppet run on deployment-conf03 is OK: OK: Less than 1.00% above the threshold [0.0] [20:29:14] http://prometheus.wmflabs.org/alerts is going to be wayyy better [20:29:19] RECOVERY - Puppet run on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0] [20:29:31] RECOVERY - Puppet run on mira is OK: OK: Less than 1.00% above the threshold [0.0] [20:29:43] RECOVERY - Puppet run on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0] [20:29:47] RECOVERY - Puppet run on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [0.0] [20:29:59] RECOVERY - Puppet run on deployment-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0] [20:30:04] RECOVERY - Puppet run on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:30:22] RECOVERY - Puppet run on deployment-ms-be02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:31:38] RECOVERY - Puppet run on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0] [20:31:38] RECOVERY - Puppet run on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0] [20:31:52] RECOVERY - Puppet run on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:32:02] RECOVERY - Puppet run on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0] [20:32:11] 10Beta-Cluster-Infrastructure, 06Operations, 05Prometheus-metrics-monitoring: deploy prometheus node_exporter and server to deployment-prep - https://phabricator.wikimedia.org/T144502#2601885 (10hashar) That works [[ https://wikitech.wikimedia.org/w/index.php?title=Hiera:Deployment-prep&diff=839239&oldid=839... 
[20:32:30] https://meta.wikimedia.org/wiki/Merchandise_giveaways/Nominations#Paladox :)
[20:32:34] RECOVERY - Puppet run on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:32:44] ahah
[20:32:46] I am going to approve it
[20:32:52] RECOVERY - Puppet run on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:32:58] RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:14] RECOVERY - Puppet run on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:14] RECOVERY - Puppet run on deployment-ores-redis is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:16] RECOVERY - Puppet run on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:20] RECOVERY - Puppet run on deployment-stream is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:30] RECOVERY - Puppet run on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:33] RECOVERY - Puppet run on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:45] RECOVERY - Puppet run on deployment-mx is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:45] RECOVERY - Puppet run on mira02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:51] RECOVERY - Puppet run on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:53] paladox: anything else you want from https://store.wikimedia.org/ ?
[20:33:57] please ask, you are my guest
[20:34:01] RECOVERY - Puppet run on deployment-sentry01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:34:07] LOL nope
[20:34:47] RECOVERY - Puppet run on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:37:34] Thomas Mulhall (paladox)
[20:37:34] 12810
[20:37:34] Independent
[20:37:49] sounds like openstack stackalytics, isn't it?
[20:38:03] hashar it's from http://korma.wmflabs.org/browser/scr-contributors.html
[20:38:09] RECOVERY - Puppet run on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
[20:38:13] RECOVERY - Puppet run on deployment-ircd is OK: OK: Less than 1.00% above the threshold [0.0]
[20:38:18] but i have no idea where it gets thomas mulhall from, it's partially right
[20:38:23] RECOVERY - Puppet run on deployment-kafka03 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:38:29] RECOVERY - Puppet run on deployment-jobrunner01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:42:02] RECOVERY - Puppet run on deployment-kafka01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:42:20] RECOVERY - Puppet run on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:42:32] RECOVERY - Puppet run on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:42:42] RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[20:42:48] RECOVERY - Puppet run on deployment-kafka04 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:43:01] I haven't set my name anywhere though, except for my personal email address
[20:43:27] gerrit?
[20:43:34] Nope
[20:43:53] yes
[20:43:53] Author
[20:43:54] Paladox
[20:43:55] My real name is hidden but it still manages to get leaked, but lol, it only got it partially right, the first name is wrong
[20:44:28] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:44:42] RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:44:55] ^^ But it seems i managed to also leak my middle name, but i have no idea how i managed that
[20:45:02] since i never write my middle name online
[20:45:16] must've somewhere at some point
[20:46:12] nope
[20:46:35] if it's being used, it's been posted somewhere
[20:46:57] RECOVERY - Puppet run on deployment-memc05 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:47:23] RECOVERY - Puppet run on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:47:56] Oh, not sure, wouldn't be facebook since i don't use my middle name, but i also have it set so google can't find me
[20:48:23] Krenair: all clear :] for now
[20:48:27] great
[20:50:55] actually, my inbox is less great :p
[20:51:11] PROBLEM - Free space - all mounts on mira02 is CRITICAL: CRITICAL: deployment-prep.mira02.diskspace._srv.byte_percentfree (<11.11%)
[20:52:05] !log fixed puppet on deployment-parsoid05. Temporary instance, will delete it later to clear out shinken.wmflabs.org
[20:52:11] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:54:17] PROBLEM - Puppet run on deployment-ores-redis is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[20:55:32] hashar: you're still here :)
[20:55:58] yeah
[20:56:01] for the good cause!
[20:56:30] I have voted to get Paladox a tshirt https://meta.wikimedia.org/wiki/Merchandise_giveaways/Nominations#Paladox :D
[20:56:40] :)
[20:56:45] thanks
[20:56:54] hashar, I don't think the -parsoid05 trick has worked
[20:57:04] ldap still contains the old broken data
[20:57:20] so shinkengen is still giving out the wrong ip and failing tests
[20:59:26] ah :(
[20:59:45] when you created that instance, sink_nova_ldap probably tried to create a new entry in ldap
[20:59:54] and failed due to a dupe?
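The exchange above and below points at a stale LDAP host record that sink_nova_ldap could not overwrite when the instance was re-created. As a hedged sketch only of how such a leftover entry could be checked for, the LDAP server, base DN and filter attribute below are assumptions about the labs directory layout, not details taken from the log:

# Hypothetical check for a leftover host record before re-creating an instance
# under the same name; server, base DN and attribute are assumptions.
ldapsearch -x -H ldap://ldap-labs.eqiad.wikimedia.org \
    -b 'ou=hosts,dc=wikimedia,dc=org' \
    '(associatedDomain=deployment-parsoid05.deployment-prep.eqiad.wmflabs)' dn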
[20:59:55] if I'm right, it would've failed
[20:59:57] yes
[21:00:05] deleting it :(
[21:00:10] ldap would see the existing one and deny it
[21:00:27] !log deleted deployment-parsoid05
[21:00:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[21:01:08] RECOVERY - Free space - all mounts on mira02 is OK: OK: All targets OK
[21:02:02] hashar, that may have done the trick though
[21:03:01] Krenair: maybe on deletion it does not validate much
[21:03:28] !log deployment-tin did a git gc on /srv/deployment/ores. That freed up disk space and cleared an alarm on co-master mira02
[21:03:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[21:03:36] we will have to recreate mira02 with a larger disk :(
[21:03:44] and get a new flavor
[21:03:53] that is the thing I don't get with openstack
[21:04:17] https://phabricator.wikimedia.org/diffusion/GSNL/browse/master/nova_ldap/base.py;9ba4f8b4993416787772f41676ad07c88fa20527$177
[21:04:17] why not let the end user pick any (X CPU, Y RAM, Z Disk)
[21:04:22] instead of the imposed flavors
[21:05:51] yep that got rid of the nonsense from shinken
[21:08:50] RECOVERY - Puppet run on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:17:11] Krenair: magic!!!!! :)
[21:17:19] Krenair: thank you for the support and confirmation
[21:17:46] Krenair: also I have dug in my log history / some task. Godog is going to add prometheus to beta as soon as the last two precise instances we have are gone
[21:18:56] as I understand that system, you load it with a s*** ton of random metrics collected from everything you have
[21:19:05] then have some kind of a query language to define the alarms
[21:19:17] (03PS2) 10Mattflaschen: Have PageTriage depend on WikiLove [integration/config] - 10https://gerrit.wikimedia.org/r/311024 (https://phabricator.wikimedia.org/T145798)
[21:19:25] LOL, since 2013 I've contributed more than most people on gerrit.
[21:19:55] paladox: you are already in the top 1000 contributors on GitHub
[21:20:04] Oh
[21:20:06] wow
[21:20:09] Didn't know that
[21:20:12] kidding :]
[21:20:17] oh lol
[21:20:18] fix moare bugs!
[21:20:22] yep
[21:21:17] (03CR) 10Reedy: [C: 031] Have PageTriage depend on WikiLove [integration/config] - 10https://gerrit.wikimedia.org/r/311024 (https://phabricator.wikimedia.org/T145798) (owner: 10Mattflaschen)
[21:21:27] http://githut.info/ is quite a fun site
[21:21:52] oh
[21:22:14] as is http://ghv.artzub.com/#user=wikimedia
[21:22:29] give folks a public API and they start building nice visualizations
[21:22:41] hashar 1,124 contributions in the last year
[21:22:45] that's me
[21:24:26] hashar look at this http://korma.wmflabs.org/browser/irc.html guess who is number 2.
[21:27:48] hashar, so, like with labmon.eqiad.wmnet?
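The !log at 21:03:28 records the cleanup that cleared the mira02 disk alert: a git gc of the ores checkout on the deployment server. A small sketch of the same steps, assuming a shell on deployment-tin; only the repository path comes from the log, the df check is an added sanity step:

# Garbage-collect the ores checkout to reclaim space under /srv, then confirm
# the mount has headroom again (df is an extra check, not from the log).
cd /srv/deployment/ores
git gc
df -h /srv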
[21:34:13] RECOVERY - Puppet run on deployment-ores-redis is OK: OK: Less than 1.00% above the threshold [0.0]
[21:36:23] Krenair: I think prometheus is able to fetch data from labmon yeah
[21:36:37] Krenair: and I guess from logstash/elasticsearch
[21:37:04] so maybe you can do some query that combines both and alarm based on that
[21:39:06] hashar I still need to continue with the project cleaning, adding npm and composer (1,000+)
[21:39:11] lol
[21:40:23] 06Release-Engineering-Team, 10Monitoring, 06Operations, 13Patch-For-Review: Monitoring and alerts for "business" metrics - https://phabricator.wikimedia.org/T140942#2644626 (10Tgr)
[23:03:59] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10Differential, 07Jenkins: Add support for a wmf-ci.yaml type file for wikimedia jenkins - https://phabricator.wikimedia.org/T145669#2637873 (10greg) See also, which you can use for expanding the project plan: https://www.mediawiki.o...
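The alerting model sketched in the conversation above ("a query language to define the alarms") boils down to rule files evaluated against the scraped metrics. As an illustration only, here is a Prometheus 1.x-style rule on the built-in up metric; the file path, the FOR duration and the promtool invocation are assumptions, not the actual deployment-prep configuration:

# Illustrative alert: fire when a scrape target has been down for 10 minutes.
# Prometheus 1.x rule syntax; path and FOR duration are assumptions.
cat > /etc/prometheus/instance_down.rules <<'EOF'
ALERT InstanceDown
  IF up == 0
  FOR 10m
  ANNOTATIONS { summary = "{{ $labels.instance }} has not been scraped for 10 minutes" }
EOF
promtool check-rules /etc/prometheus/instance_down.rules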