[02:57:50] PROBLEM - Puppet run on integration-slave-jessie-1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[03:19:51] PROBLEM - Puppet run on integration-slave-jessie-1002 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[03:37:50] RECOVERY - Puppet run on integration-slave-jessie-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:54:51] RECOVERY - Puppet run on integration-slave-jessie-1002 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:10:35] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<55.56%)
[06:35:53] Project selenium-Wikibase » chrome,beta,Linux,BrowserTests build #293: FAILURE in 1 hr 55 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/293/
[06:54:16] PROBLEM - Puppet run on deployment-zotero01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[06:54:46] PROBLEM - Puppet run on deployment-mx is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[07:00:37] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:03:09] PROBLEM - Puppet run on deployment-imagescaler01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[07:29:13] RECOVERY - Puppet run on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:34:49] RECOVERY - Puppet run on deployment-mx is OK: OK: Less than 1.00% above the threshold [0.0]
[08:50:20] Gerrit: Gerrit code review view jumps/scrolls up and down when commenting - https://phabricator.wikimedia.org/T159919#3083312 (Nikerabbit)
[09:19:38] Gerrit: Gerrit code review view jumps/scrolls up and down when commenting - https://phabricator.wikimedia.org/T159919#3083312 (Nemo_bis) I can reproduce consistently on Firefox 51.0.1 if I do the
following: click or open one line, for instance https://gerrit.wikimedia.org/r/#/c/340961/6/modules/mw.cx.TargetA...
[09:25:43] hi jgirault you are spamming
[09:37:40] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slow-downs - https://phabricator.wikimedia.org/T148478#3083451 (hashar) https://gerrit.wikimedia.org/r/#/c/341701/ lowered the heap from 28GB to 20GB Graph of memory usage can be seen above T148478#3024661 For the [[ https:/...
[10:14:25] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slow-downs - https://phabricator.wikimedia.org/T148478#3083515 (Gehel) The JVM makes no promises about releasing the memory to the OS. So it is possible that the memory has been freed from the application, but it still retained...
[10:36:40] Project beta-scap-eqiad build #145507: FAILURE in 1 min 37 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145507/
[10:38:52] (CR) Phedenskog: "@Krinkle yep we cannot fit in the time. This morning I turned 11 runs on test instance and now also 21 (testing three URLs), then we check" [integration/config] - https://gerrit.wikimedia.org/r/341461 (owner: Krinkle)
[10:46:40] Project beta-scap-eqiad build #145508: STILL FAILING in 1 min 41 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145508/
[10:57:02] Project beta-scap-eqiad build #145509: STILL FAILING in 2 min 1 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145509/
[11:07:07] Yippee, build fixed!
[11:07:08] Project beta-scap-eqiad build #145510: FIXED in 2 min 11 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145510/
[11:24:43] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slow-downs - https://phabricator.wikimedia.org/T148478#3083762 (jcrespo) Qualitatively, gerrit has been working ok to me lately, although it logged me out several times (I assume the service was restarted).
[11:26:45] Project beta-scap-eqiad build #145512: FAILURE in 1 min 43 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145512/
[11:36:32] Project beta-scap-eqiad build #145513: STILL FAILING in 1 min 36 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145513/
[11:39:24] hashar you may have found a bug as the patchset-creation should not be able to hook into refupdated
[11:39:34] im looking at the docs
[11:39:34] https://github.com/GerritCodeReview/plugins_its-base/blob/master/src/main/resources/Documentation/config-rulebase-common.md
[11:39:44] https://phabricator.wikimedia.org/diffusion/OPUP/browse/HEAD/modules/gerrit/files/etc/its/actions.config;c0662d947135f1f738501dc29346c410514555f1$16
[11:46:05] I guess when someone links to a bug after doing a second commit thats when it fires RefUpdated. But it shoulden't fire the PatchSetCreated event
[11:47:15] Yippee, build fixed!
[11:47:15] Project beta-scap-eqiad build #145514: FIXED in 2 min 12 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145514/
[12:36:38] Project beta-scap-eqiad build #145519: FAILURE in 1 min 37 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145519/
[12:46:59] Yippee, build fixed!
[12:46:59] Project beta-scap-eqiad build #145520: FIXED in 2 min 6 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145520/
[12:56:19] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 51127 bytes in 0.554 second response time
[13:01:22] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 46476 bytes in 2.217 second response time
[13:49:09] PROBLEM - Puppet run on buildlog is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[14:06:37] Project beta-scap-eqiad build #145528: FAILURE in 1 min 39 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145528/
[14:07:20] hashar: that dang beta scap eqiad build is failing again...
[14:14:20] Yippee, build fixed!
[14:14:20] Project beta-scap-eqiad build #145529: FIXED in 2 min 7 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145529/
[14:23:15] Project beta-scap-eqiad build #145531: FAILURE in 1 min 38 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145531/
[14:26:56] Project beta-scap-eqiad build #145532: STILL FAILING in 1 min 37 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145532/
[14:33:43] Project selenium-WikiLove » firefox,beta,Linux,BrowserTests build #325: FAILURE in 1 min 42 sec: https://integration.wikimedia.org/ci/job/selenium-WikiLove/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/325/
[14:50:45] zeljkof: and I found out chromium need an extension to be installed apparently
[14:50:54] Chromium browser automation
[14:50:56] does that ring a bell?
[14:51:01] no
[14:51:09] extension for what?
[14:51:12] selenium?
[14:51:18] taking screenshots apparently
[14:51:22] oh
[14:51:26] strange
[14:51:33] I ran the user.js specs with invalid user/password
[14:51:42] and eventually get an error about:
[14:51:48] cannot get automation extension
[14:51:56] from unknown error: page could not be found: chrome-extension://aapnijgdinlhnhlmodcfapnahmbfebeb/_generated_background_page.html
[14:57:36] Yippee, build fixed!
[14:57:37] Project beta-scap-eqiad build #145533: FIXED in 22 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145533/
[15:02:07] Project beta-scap-eqiad build #145534: FAILURE in 2 min 34 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145534/
[15:02:54] Scap (Scap3-Adoption-Phase1), scap2, Analytics, Analytics-EventLogging, Patch-For-Review: Use scap3 to deploy eventlogging/eventlogging - https://phabricator.wikimedia.org/T118772#3084302 (Ottomata) First create an eventlogging/scap/webperf repository in gerrit, and fill it with scap config...
[15:06:47] Project beta-scap-eqiad build #145535: STILL FAILING in 1 min 46 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145535/
[15:10:45] PROBLEM - Puppet run on deployment-restbase01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[15:13:02] PROBLEM - Puppet run on integration-publishing is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[15:23:47] oaehahz
[15:24:02] zeljkof: so maybe I am hitting a new feature / bug from recent chromium versions bah
[15:24:15] hm
[15:24:18] hashar: could be
[15:24:32] might have to pass --enable-automation
[15:24:36] I usually use chrome, not chromium, so I might miss it
[15:24:48] I don't know really. Was reading some random issue at https://bugs.chromium.org/p/chromedriver/issues/detail?id=1625 :)
[15:25:59] zeljkof: is it chromedriver that spawn the browser or is it webdriver.io ?
[15:26:32] chromedriver does the actual work, as far as I understand things
[15:27:09] webdriverio (and other selenium bindings) just offer a standard api to talk to browsers, they talk to chromedriver/geckodriver and they drive the browsers
[15:27:32] so webdriverio > chromedriver > chrome
[15:28:02] RECOVERY - Puppet run on integration-publishing is OK: OK: Less than 1.00% above the threshold [0.0]
[15:37:59] * hashar digs in https://sites.google.com/a/chromium.org/chromedriver/capabilities
[15:39:30] zeljkof: eek have you changed your patch or just rebased it?
[15:39:55] hashar: I have fixed a couple of minor eslint problems in my and your patch
[15:40:00] everything works fine now
[15:40:03] no major changes
[15:40:54] ah ok
[15:41:01] got a local patch ready that further enhance it
[15:44:25] SOLVED
[15:48:47] zeljkof: https://phabricator.wikimedia.org/P5027
[15:50:47] RECOVERY - Puppet run on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:52:08] hashar: I have just noticed a few eslint failures and fixed them
[15:52:24] zeljkof: yeah it is ok
[15:52:30] hashar: oh, cool command line flag
[15:52:51] another thing I find annoying
[15:52:59] is the test results are buffered
[15:53:08] they are only shown once the spec file is complete
[15:53:20] Project selenium-MobileFrontend » firefox,beta,Linux,BrowserTests build #351: FAILURE in 31 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/351/
[15:53:28] and the window grabs the focus
[15:55:48] zeljkof: one last question.
you have added "wdio-dot-reporter" as a dependency
[15:55:54] but it is already a dependency of webdriverio itself
[15:56:08] so I guess we don't need to explicitly set it
[15:56:20] hashar: it was probably added by webdriver itself to the package.json
[15:56:36] wdio has an init script that does some magic
[15:56:44] ok
[15:56:44] if it works without it, feel free to remove it
[15:56:45] not a big deal
[16:00:21] I have removed it
[16:00:24] else it get installed twice
[16:00:39] the one form /package.json in /node_modules
[16:00:55] and then the dependency of webdriverio somewhere like /node_modules/webdriverio/node_modules
[16:07:47] RainbowSprinkles hi, why do emails get sent late with gerrit? It wasen't a problem with gerrit 2.12 but is with 2.13.
[16:08:01] I notice that someone who did a c+2 at 12am my time, i got an email at 8am my time saying they did a c+2
[16:09:28] Browser-Tests-Infrastructure, Reading-Web-Backlog, Jenkins, Ruby, User-zeljkofilipin: MEDIAWIKI_URL may be set to incorrect value in mwext-mw-selenium job - https://phabricator.wikimedia.org/T144912#3084494 (phuedx)
[16:15:36] zeljkof: so far so good. Finally have the tests to pass :}
[16:15:55] \o/
[16:20:54] Release-Engineering-Team, Wikidata, Story: [Story] Use composer-merge-plugin to include Wikidata components in mediawiki-vendor - https://phabricator.wikimedia.org/T95663#3084510 (hoo) Any updates here @demon? Is there anything we can/ should do to get this going?
[16:23:35] I am still not happy how we kill the chromedriver
[16:26:54] zeljkof: I am crafting a nice commit message summary
[16:27:09] hashar: great
[16:27:17] ready for +2? ;)
[16:33:09] no
[16:52:43] stupid eslint
[17:01:54] zeljkof: ok so PS 42 is my last iteration and I added some cover message explaining what I did in my last iteration.
[17:02:03] so I guess we want some other to review / try it and that can land
[17:02:15] the changes I did are https://gerrit.wikimedia.org/r/#/c/328191/40..41
[17:02:24] I am off, gotta head back home
[17:03:24] will eventually rebase my script at some point
[17:05:11] Browser-Tests-Infrastructure, MediaWiki-General-or-Unknown, JavaScript, Patch-For-Review, and 2 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3084606 (hashar)
[17:26:49] Gerrit, Upstream: Gerrit emails are showing up as being sent late - https://phabricator.wikimedia.org/T159960#3084643 (Paladox)
[17:27:23] Gerrit, Release-Engineering-Team, Operations, Upstream: Gerrit emails are showing up as being sent late - https://phabricator.wikimedia.org/T159960#3084656 (Paladox)
[17:29:17] Gerrit, Release-Engineering-Team, Operations, Upstream: Gerrit emails are showing up as being sent late - https://phabricator.wikimedia.org/T159960#3084643 (Paladox) For example this change https://gerrit.wikimedia.org/r/341630 it was merged at 12:05am my time but my email is showing as it being...
[17:33:14] actually
[17:47:30] Gerrit, Release-Engineering-Team, Operations, Upstream: Gerrit emails are showing up as being sent late - https://phabricator.wikimedia.org/T159960#3084643 (Reedy) Might want to post the full email headers of the offending email(s)
[17:50:29] Gerrit, Release-Engineering-Team, Operations, Upstream: Gerrit emails are showing up as being sent late - https://phabricator.wikimedia.org/T159960#3084721 (Paladox) @Reedy hi, how can i do that please?
[17:58:11] Gerrit, Release-Engineering-Team, Operations, Upstream: Gerrit emails are showing up as being sent late - https://phabricator.wikimedia.org/T159960#3084782 (Paladox) @Reedy https://phabricator.wikimedia.org/P5028
[18:04:03] huh, looks like beta code update is stuck
[18:04:09] Gerrit, Release-Engineering-Team, Operations, Upstream: Gerrit emails are showing up as being sent late - https://phabricator.wikimedia.org/T159960#3084794 (Paladox) https://gerrit-review.googlesource.com/ is working in sending the email to my email at the correct times.
[18:11:44] Yippee, build fixed!
[18:11:45] Project beta-scap-eqiad build #145536: FIXED in 3 min 33 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145536/
[18:47:14] Gerrit, Release-Engineering-Team, Operations, Upstream: Gerrit emails are showing up as being sent late - https://phabricator.wikimedia.org/T159960#3084931 (Paladox) gerrit test comment for emails (testing if it affects phabricator too)
[18:47:24] Project beta-scap-eqiad build #145540: FAILURE in 1 min 50 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145540/
[18:47:44] Gerrit, Release-Engineering-Team, Operations, Upstream: Gerrit emails are showing up as being sent late - https://phabricator.wikimedia.org/T159960#3084933 (Paladox) Yep affects phabricator too.
[18:49:40] Gerrit, Release-Engineering-Team, Operations, Phabricator, Upstream: Gerrit emails are showing up as being sent late also affecting other service like phabricator - https://phabricator.wikimedia.org/T159960#3084940 (Paladox)
[18:51:39] Gerrit, Release-Engineering-Team, Mail, Operations, and 2 others: Gerrit emails are showing up as being sent late also affecting other service like phabricator - https://phabricator.wikimedia.org/T159960#3084643 (Paladox)
[18:56:12] Gerrit, Release-Engineering-Team, Mail, Operations, and 2 others: Gerrit emails are showing up as being sent late also affecting other service like phabricator - https://phabricator.wikimedia.org/T159960#3084982 (Paladox) I found this in the log on phabricator test instantance in labs /var/log/e...
[18:56:42] Project beta-scap-eqiad build #145541: STILL FAILING in 1 min 41 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145541/
[19:06:49] hey yall, if anybody gets a change, could you look into this? https://phabricator.wikimedia.org/T159960
[19:07:25] Yippee, build fixed!
[19:07:26] Project beta-scap-eqiad build #145542: FIXED in 2 min 26 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145542/
[19:09:08] Gerrit, Release-Engineering-Team, Mail, Operations, and 2 others: Gerrit emails are showing up as being sent late also affecting other service like phabricator - https://phabricator.wikimedia.org/T159960#3085033 (Ottomata) p:Triage>Low Hard to tell if this is a phab/gerrit problem, or jus...
[19:28:11] Gerrit, Release-Engineering-Team, Mail, Operations, and 2 others: Gerrit emails are showing up as being sent late also affecting other service like phabricator - https://phabricator.wikimedia.org/T159960#3085148 (demon) >>! In T159960#3084982, @Paladox wrote: > I found this in the log on phabrica...
[19:28:59] Gerrit, Release-Engineering-Team, Mail, Operations, and 2 others: Gerrit emails are showing up as being sent late also affecting other service like phabricator - https://phabricator.wikimedia.org/T159960#3085150 (Paladox) Oh that's an ipv6 address the ipv4 one works.
[19:30:35] Gerrit, Release-Engineering-Team, Mail, Operations, and 2 others: Gerrit emails are showing up as being sent late also affecting other service like phabricator - https://phabricator.wikimedia.org/T159960#3085158 (demon) Either way, we shouldn't be sending labs e-mails via the prod mailserver. I'm...
[19:31:24] Gerrit, Release-Engineering-Team, Mail, Operations, and 2 others: Gerrit emails are showing up as being sent late also affecting other service like phabricator - https://phabricator.wikimedia.org/T159960#3085161 (demon) Also: I'm not seeing any delay on getting Phabricator e-mails. For example: r...
[19:32:12] Gerrit, Release-Engineering-Team, Mail, Operations, and 2 others: Gerrit emails are showing up as being sent late also affecting other service like phabricator - https://phabricator.wikimedia.org/T159960#3085163 (demon) Same with gerrit, getting all my e-mails. I see no production problem on our end
[19:34:02] Gerrit, Release-Engineering-Team, Mail, Operations, and 2 others: Gerrit emails are showing up as being sent late also affecting other service like phabricator - https://phabricator.wikimedia.org/T159960#3085192 (demon) Open>Invalid Last point: there's no back up of pending jobs on either...
[19:34:15] Gerrit, Release-Engineering-Team, Mail, Operations, and 2 others: Gerrit emails are showing up as being sent late also affecting other service like phabricator - https://phabricator.wikimedia.org/T159960#3085194 (Paladox) Oh, what about merging changes in gerrit?
[19:35:41] Gerrit, Release-Engineering-Team, Mail, Operations, and 2 others: Gerrit emails are showing up as being sent late also affecting other service like phabricator - https://phabricator.wikimedia.org/T159960#3085199 (Paladox) But how comes email's are showing up as late for me? gerri-review is workin...
[19:36:42] Gerrit, Release-Engineering-Team, Mail, Operations, and 2 others: Gerrit emails are showing up as being sent late also affecting other service like phabricator - https://phabricator.wikimedia.org/T159960#3085206 (demon) I'm pretty sure this is a local issue for you / your ISP, nobody else is seei...
[19:36:59] Gerrit, Release-Engineering-Team, Mail, Operations, and 2 others: Gerrit emails are showing up as being sent late also affecting other service like phabricator - https://phabricator.wikimedia.org/T159960#3085207 (Paladox) I did a comment on https://gerrit.wikimedia.org/r/#/c/340900/6 and am still...
[19:37:30] Gerrit, Release-Engineering-Team, Mail, Operations, and 2 others: Gerrit emails are showing up as being sent late also affecting other service like phabricator - https://phabricator.wikimedia.org/T159960#3085209 (Paladox) I use yahoo mail's web ui.
[19:45:05] Gerrit, Release-Engineering-Team, Mail, Operations, and 2 others: Gerrit emails are showing up as being sent late also affecting other service like phabricator - https://phabricator.wikimedia.org/T159960#3085275 (Paladox) I have now tested with my outlook account and same problem email taking too...
[19:47:02] Gerrit, Release-Engineering-Team, Mail, Operations, and 2 others: Gerrit emails are showing up as being sent late also affecting other service like phabricator - https://phabricator.wikimedia.org/T159960#3085279 (demon) Cannot replicate: I just confirmed a new e-mail within about 5 seconds.
[19:48:40] Gerrit, Release-Engineering-Team, Mail, Operations, and 2 others: Gerrit emails are showing up as being sent late also affecting other service like phabricator - https://phabricator.wikimedia.org/T159960#3085288 (Paladox) Yes that worked too, but after confirming the email and writing a comment h...
[19:57:17] (PS1) Mholloway: Add install-android-sdk step to apps-android-wikipedia-lint job [integration/config] - https://gerrit.wikimedia.org/r/341843 (https://phabricator.wikimedia.org/T147099)
[19:58:37] Gerrit, Release-Engineering-Team, Mail, Operations, and 2 others: Gerrit emails are showing up as being sent late - https://phabricator.wikimedia.org/T159960#3085349 (Paladox)
[19:58:48] Gerrit, Release-Engineering-Team, Mail, Operations: Gerrit emails are showing up as being sent late - https://phabricator.wikimedia.org/T159960#3084643 (Paladox)
[20:10:15] strange running the exim4 command to send an email from phabricator
[20:10:19] works on outlook
[20:10:22] but not on yahoo
[20:11:38] -> #wikimedia-devtools
[20:11:53] Release-Engineering-Team (Deployment-Blockers), Patch-For-Review, Release: MW-1.29.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T158996#3085415 (mmodell)
[20:14:22] Continuous-Integration-Config, ContentTranslation, Language-2017 Sprint 2, Language-2017 Sprint 3, and 5 others: mwext-qunit-jessie test fails on unrelated change - https://phabricator.wikimedia.org/T153038#3085418 (Nikerabbit) Thanks @hashar for resolving this! I wouldn't have been able to figur...
[20:15:44] (CR) Niedzielski: [V: 2 C: 2] Add install-android-sdk step to apps-android-wikipedia-lint job [integration/config] - https://gerrit.wikimedia.org/r/341843 (https://phabricator.wikimedia.org/T147099) (owner: Mholloway)
[20:16:08] andre__ i was meaning the phabricator instance
[20:16:17] not using the bin/mail programe to do it.
[20:17:21] (Merged) jenkins-bot: Add install-android-sdk step to apps-android-wikipedia-lint job [integration/config] - https://gerrit.wikimedia.org/r/341843 (https://phabricator.wikimedia.org/T147099) (owner: Mholloway)
[20:18:58] I reported it at https://forums.yahoo.net/t5/Sending-and-receiving/Receiving-emails-from-some-of-wikipedia-s-domains-is-taking/td-p/203788
[20:22:20] Project selenium-MobileFrontend » chrome,beta,Linux,BrowserTests build #352: FAILURE in 26 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/352/
[20:29:25] PROBLEM - App Server Main HTTP Response on deployment-mediawiki04 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:34:18] RECOVERY - App Server Main HTTP Response on deployment-mediawiki04 is OK: HTTP OK: HTTP/1.1 200 OK - 45865 bytes in 0.992 second response time
[20:34:46] Hi, the debian glue test is not starting. Not sure if it is just it being slow to start up.
See https://integration.wikimedia.org/zuul/ please
[20:36:22] works now
[20:41:24] Scap (Scap3-MediaWiki-MVP), releng-201516-q3, releng-201617-q4, scap2, and 2 others: [EPIC] Migrate the MW weekly train deploy to scap3 - https://phabricator.wikimedia.org/T114313#3085501 (demon)
[20:41:27] Scap (Scap3-MediaWiki-MVP): Setup test environment for MediaWiki deployment - https://phabricator.wikimedia.org/T147940#3085498 (demon) Open>Resolved a:demon We already did this, the scap-vagrant provisions MediaWiki (unconditionally now too)
[20:43:07] Project selenium-Echo » firefox,beta,Linux,BrowserTests build #327: FAILURE in 2 min 5 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/327/
[20:56:34] Project beta-scap-eqiad build #145553: FAILURE in 1 min 36 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145553/
[20:57:56] "20:56:34 Check 'Logstash Error rate for deployment-mediawiki04.deployment-prep.eqiad.wmflabs' failed: ERROR: 68% OVER_THRESHOLD (Avg. Error rate: Before: 0.38, After: 12.00, Threshold: 3.82)"
[21:09:18] Yippee, build fixed!
[21:09:18] Project beta-scap-eqiad build #145554: FIXED in 4 min 1 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145554/
[21:23:09] Scap, scap2: scap to reload a service instead of restart - https://phabricator.wikimedia.org/T134001#3085597 (demon)
[21:33:12] Release-Engineering-Team (Deployment-Blockers), Patch-For-Review, Release: MW-1.29.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T158996#3085629 (mmodell)
[21:49:18] Deployment-Systems: [l10n] "Refreshing ResourceLoader caches" is really really slow - https://phabricator.wikimedia.org/T64705#3085676 (Krinkle) Open>Resolved a:Krinkle Only takes 3-6 minutes instead of 20-60minutes as of last year. Related changes: * {be2656aa77b108ab53a260e4f9b79bbec8625862} *...
[22:00:18] Project selenium-Core » firefox,beta,Linux,BrowserTests build #343: FAILURE in 8 min 17 sec: https://integration.wikimedia.org/ci/job/selenium-Core/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/343/
[22:01:23] Project beta-scap-eqiad build #145558: FAILURE in 2 min 8 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145558/
[22:04:27] Scap, Parsoid, Patch-For-Review: Check 'depool' exceeded 30.0s timeout - https://phabricator.wikimedia.org/T159387#3085737 (Arlolra) Open>Resolved a:Arlolra Deploying went much smoother today. I assume someone will look at D588 eventually.
[22:06:35] Project beta-scap-eqiad build #145559: STILL FAILING in 1 min 36 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145559/
[22:06:53] 22:06:34 22:06:34 Check 'Logstash Error rate for deployment-mediawiki04.deployment-prep.eqiad.wmflabs' failed: ERROR: 69% OVER_THRESHOLD (Avg. Error rate: Before: 0.36, After: 12.00, Threshold: 3.62)
[22:06:56] Wel, isn't that fun
[22:09:06] hah
[22:09:27] thought: should we have scap use --force when doing beta updates?
[22:09:36] And break it completely?
[22:09:37] :P
[22:09:58] * greg-g shrugs
[22:10:00] whatevs
[22:10:04] :)
[22:10:08] let's see if it's broken enough for eval.php
[22:10:29] blerg. I think that this may be something with the new HHVM update, but I may be wrong about that. Haven't looked into it yet. It's been flapping.
[22:12:14] I think it should dump some of the errors to the log though
[22:14:59] Project beta-scap-eqiad build #145560: STILL FAILING in 0.36 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145560/
[22:15:04] Scap: If aborting a scap due to test canary error rate, output some errors for reference - https://phabricator.wikimedia.org/T159991#3085762 (Reedy)
[22:16:43] What's the magic scap all the things command now?
[22:16:50] FWIW this is what it's looking at: https://logstash-beta.wmflabs.org/goto/d8c5b0c89c745ddc75c95a63c105fc28
[22:16:54] scap sync
[22:17:12] scap sync worked fine
[22:17:18] 22:15:27 Finished scap: (no justification provided) (duration: 02m 55s)
[22:17:52] RainbowSprinkles mutante i will need to change file permissions for the gerrit etc files. Since they were changed in https://github.com/wikimedia/puppet/commit/15bc9b80a7a91c54567d3fb63442ac13c955e603
[22:17:54] Core dumping and stack traces
[22:17:58] That's pretty broken
[22:18:05] thcipriani: Are ops aware? :)
[22:18:09] I never caught the problem till now as installing gerrit is not working
[22:18:15] as it keeps throwing permissions errors
[22:18:19] yeah, I don't think, in this instance, it has much to do with a particular deployment, it's just new hhvm things.
[22:18:22] but the file is owned by gerrit2
[22:18:28] Exception in thread "main" java.lang.RuntimeException: Cannot save secure.config
[22:18:28] at com.google.gerrit.server.securestore.DefaultSecureStore.save(DefaultSecureStore.java:117)
[22:18:35] That was by design.
[22:18:40] Reedy: doubtful. Need to reply to https://phabricator.wikimedia.org/T158176
[22:18:41] I don't want init writing to the config files
[22:18:59] Ok
[22:19:06] How does one set up gerrit?
[22:19:06] thcipriani: Though, Nikerabbit might have already seen it
[22:19:16] i run
[22:19:18] /usr/bin/java -jar gerrit.war init -d review_site
[22:19:23] RainbowSprinkles ^^
[22:19:32] paladox: init shouldn't need to write to the files, it was all configured by puppet already
[22:19:46] Oh puppet is failing
[22:19:55] Error: Could not start Service[gerrit]: Execution of '/usr/sbin/service gerrit start' returned 1: Job for gerrit.service failed. See 'systemctl status gerrit.service' and 'journalctl -xn' for details.
[22:19:55] Wrapped exception:
[22:19:58] with ^^
[22:20:10] `java -jar gerrit.war init -d review_site --batch --no-auto-start
[22:21:02] that fails
[22:21:04] with the same error
[22:21:32] Well, hack it locally and see what it's trying to change
[22:21:33] Notice: /Stage[main]/Gerrit::Crons/File[/var/www/reviewer-counts.json]: Dependency Service[gerrit] has failures: true
[22:21:46] ok
[22:22:03] reviewer-counts thing is just cascading from prior failure, obviously
[22:22:10] (plus I'm trying to kill that stupid cron)
[22:22:23] :)
[22:22:53] paladox: did you see the event in phab
[22:22:58] Yep
[22:26:13] thcipriani: https://github.com/wikimedia/puppet/blob/production/modules/hhvm/manifests/init.pp#L171
[22:26:30] Shall I create a patch in gerrit, and we can cherry pick onto beta puppetmaster for the time being?
[22:26:36] Project beta-scap-eqiad build #145561: STILL FAILING in 1 min 37 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145561/
[22:27:08] RainbowSprinkles after several puppet agent -tv
[22:27:13] and some java -jar
[22:27:17] puppet works now
[22:27:32] Reedy: yeah, seems like that's worth a shot.
[22:27:41] Ops will likely go lolno for merging, which is fine
[22:27:50] But they're not running it on production ;)
[22:28:28] paladox: What was init trying to do/add that it couldn't?
[22:28:28] oh boy. OK, lemme finish up what I'm doing, Then I'll update the phab ticket with what's happening.
[22:28:38] Not sure.
[22:28:48] it was trying to edit the secure.config
[22:28:49] file
[22:28:54] Running puppet should revert the change out and you'd see a diff....
[22:29:08] let me check
[22:30:12] I have no diff for that, the only diff i have is for gerrit.config
[22:30:22] [database]
[22:30:22] + driver = org.mariadb.jdbc.Driver
[22:30:22] type = mysql
[22:30:22] hostname = 10.68.23.211
[22:30:22] database = reviewdb
[22:30:23] username = gerrit
[22:30:24] - url = jdbc:mysql://10.68.23.211/reviewdb?characterSetResults=utf8&characterEncoding=utf8&connectionCollation=utf8_unicode_ci
[22:30:25] + url = jdbc:mariadb://10.68.23.211/reviewdb?sessionVariables=character_set_client=utf8mb4,character_set_results=utf8mb4,character_set_connection=utf8mb4,collation_connection=utf8mb4_unicode_ci,collation_database=utf8mb4_unicode_ci,collation_server=utf8mb4_unicode_ci
[22:30:45] that's puppet as i am running the mariadb patch i created.
[22:31:24] My *guess* is auth.registerEmailPrivateKey being auto-generated
[22:31:31] Setting that in puppet should help
[22:31:46] Yeh
[22:31:54] Want me to set it in puppet?
[22:32:10] Yeah, if you set it in the labs/private repo it'll auto-populate when puppet is run
[22:32:16] And init won't try to generate one
[22:32:27] (it's nbd if it's the same for labs installs really, they're just in testing)
[22:32:44] Why is deployment-puppetmaster02:/var/lib/git/operations/puppet seemingly a few weeks out of date?
[22:33:16] ok
[22:33:28] or are they all cherry picks/live hacks..
[22:33:51] `git pull --rebase` and see what happens?
[22:33:51] wat
[22:33:52] :)
[22:34:03] git should be pulling in new commits automatically
[22:34:03] and/or what does `git status` say?
[22:34:11] diverged or just local commits?
[22:34:13] here
[22:34:14] https://github.com/wikimedia/labs-private/blob/82202bd22acf2b45831bcec95bc434acb6215e36/modules/passwords/manifests/init.pp#L151
[22:34:15] and there are a lot of cherry-picks on top because beta
[22:34:18] On branch production
[22:34:18] Your branch and 'origin/production' have diverged,
[22:34:18] and have 14 and 163 different commits each, respectively.
[22:34:18] (use "git pull" to merge the remote branch into yours)
[22:34:19] nothing to commit, working directory clean
[22:34:37] Yeah I'd try a pull with a rebase so the local hacks end up back on top
[22:34:38] ugh must be some rebase conflict
[22:35:11] CONFLICT (content): Merge conflict in modules/mediawiki/templates/apache/sites/wwwportals.conf.erb
[22:35:34] * Reedy "blames" Krenair
[22:35:35] :P
[22:35:52] paladox: Yep, exactly there...but looks like you've already got one
[22:35:54] Hmmm
[22:36:07] Yep
[22:36:13] <<<<<<< HEAD
[22:36:13] # ErrorDocument defined in sites-available/01-main.conf should be served directly and not redirected
[22:36:14] =======
[22:36:14] >>>>>>> portals: do not rewrite 404 errors
[22:36:17] Dear Git
[22:36:23] WHY THE FUCK CAN'T YOU RESOLVE SIMPLE CONFLICTS
[22:36:29] Love,
[22:36:30] -Reedy
[22:36:32] Cuz it's not simple ;-)
[22:36:33] RainbowSprinkles should i make it configurable through hiera?
[22:36:39] As i see a new command
[22:36:43] https://gerrit.googlesource.com/gerrit/+/b10848ee832e081140351c991bc66025ade1878e%5E%21/#F0
[22:36:55] that will allow us to regenerate the password if we want
[22:37:02] it's from gerrit 2.13.6+
[22:37:02] I seriously wouldn't be surprised if deployment-puppetmaster02's repository hadn't been updated since I lost access.
[22:37:10] Well, it already is configurable via the private repo
[22:37:47] Yippee, build fixed!
[22:37:48] Project beta-scap-eqiad build #145562: FIXED in 2 min 53 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/145562/
[22:37:48] oh
[22:38:15] It must not have been that piece...here's what I'd do:
[22:38:22] 1) do a fresh provision, run puppet
[22:38:25] 2) Disable puppet
[22:38:31] 3) Enable writing for the config files
[22:38:36] 4) Run init, let it write stuff
[22:38:41] FYI there's a meeting going on in -office
[22:38:53] 5) See what changed between puppet & written copy (easiest by turning puppet back on and running, I guess)
[22:38:56] root@deployment-puppetmaster02:/var/lib/git/operations/puppet# git rebase --continue
[22:38:56] Applying: portals: do not rewrite 404 errors
[22:38:56] No changes - did you forget to use 'git add'?
[22:39:04] No, I already did git add
[22:39:16] portals 404 stuff already went in upstream?
[22:39:42] the patch it threw out...
[22:39:42] + RewriteCond %{REQUEST_URI} !=/w/404.php
[22:40:13] Which is at the same line as the one it added the comment
[22:41:19] RainbowSprinkles https://gerrit.wikimedia.org/r/#/c/341918/
[22:41:37] upstream has both
[22:41:41] Let's add + RewriteCond %{REQUEST_URI} !=/w/404.php in
[22:42:14] so does this...
[22:42:41] Your branch is ahead of 'origin/production' by 13 commits.
[22:42:44] That sounds better
[22:42:52] Muchhhh better
[22:43:08] RainbowSprinkles whoops just noticed that now.
[22:43:13] * Reedy makes that 14
[22:43:29] If puppet already does it, I'm a little confused now.
[22:43:37] Like I said, follow my 5-step plan :)
[22:43:47] It must be another variable that it's trying to write
[22:44:05] Of course, scap fixed itself
[22:44:09] But this may stop it flaking
[22:44:43] lol ok
[22:44:48] I will try that now
[22:45:14] !log https://gerrit.wikimedia.org/r/#/c/341916/ cherry picked onto deployment-puppetmaster02
[22:45:19] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:46:40] RainbowSprinkles i will make the secure.config 0644 and run the jar command
[22:46:41] now
[22:49:57] -hhvm.server.stat_cache = true
[22:49:57] +hhvm.server.stat_cache = false
[22:49:58] PROBLEM - Puppet run on deployment-phab01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[22:50:32] lol
[22:50:32] [its-phabricator]
[22:50:32] - certificate =
[22:50:32] + certificate =
[22:50:39] RainbowSprinkles ^^
[22:50:47] that's why
[22:52:07] PROBLEM - Puppet run on deployment-phab02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[22:52:30] Though how do we fix that one
[22:52:34] is it a space?
[22:53:26] Ah, ok I get it
[22:53:39] Gerrit's init tends to want to trim spaces from config, so that's it
[22:53:51] Easiest solution?
Put a bogus value in labs/private for the cert
[22:54:02] Some non-empty string will do it
[22:54:27] Oh
[22:54:29] yep
[22:54:30] ok
[22:54:39] Also: we can drop gerrit_bz_pass from labs/private, we don't use its-bugzilla anymore :)
[22:55:01] :)
[22:55:07] RainbowSprinkles something like $gerrit_phab_cert = ''
[22:55:08] Same with gerrit_rest_token
[22:55:13] two spaces in there
[22:55:26] Nah, put something like 'bogusvalue'
[22:55:32] So it's not just spaces that get trimmed
[22:55:46] Ok
[22:59:07] RainbowSprinkles done https://gerrit.wikimedia.org/r/#/c/341918/
[22:59:08] :)
[23:03:55] Paladox you might be interested https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-03-08-22.03.html
[23:04:52] Oh nope
[23:12:22] RainbowSprinkles works now :)
[23:27:08] RECOVERY - Puppet run on deployment-phab02 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:29:56] RECOVERY - Puppet run on deployment-phab01 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:30:34] hashar: what's on the agenda for tomorrow?
[23:32:39] Release-Engineering-Team, Elasticsearch, Patch-For-Review, Phabricator (Search): phab+elasticsearch: support multiple elasticsearch clusters / datacenters - https://phabricator.wikimedia.org/T157156#3086019 (mmodell)
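Editor's note on the 22:50 to 22:55 exchange: the diff on `[its-phabricator] certificate` looked identical on both sides because the only difference was whitespace after the `=`. Below is a minimal sketch of how such whitespace-only differences between a puppet-managed config and the copy rewritten by init could be spotted. It assumes a simplified git-config style parser, not Gerrit's actual config code; the function names and the two sample strings are hypothetical, since the exact file contents do not appear in the log.

```python
def parse_git_config(text):
    """Parse a simplified git-config style text into {(section, key): raw_value}.

    Assumption: plain "[section]" headers and "key = value" lines only;
    this is not Gerrit's real parser.
    """
    values = {}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", ";")):
            continue
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped[1:-1]
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        # Keep the raw text after '=' so trailing spaces stay visible.
        values[(section, key.strip())] = value
    return values


def whitespace_only_changes(before, after):
    """Return keys whose raw values differ but are equal once stripped."""
    old, new = parse_git_config(before), parse_git_config(after)
    return sorted(
        key
        for key in old.keys() & new.keys()
        if old[key] != new[key] and old[key].strip() == new[key].strip()
    )


# Hypothetical sample data: puppet writes "certificate = " with a trailing
# space, and the rewritten copy has it trimmed.
puppet_copy = "[its-phabricator]\n\tcertificate = \n[database]\n\ttype = mysql\n"
init_copy = "[its-phabricator]\n\tcertificate =\n[database]\n\ttype = mysql\n"

for section, key in whitespace_only_changes(puppet_copy, init_copy):
    print(f"whitespace-only change: [{section}] {key}")
```

A check like this only flags the symptom; the fix agreed in the channel was still to give the certificate a non-empty placeholder value in labs/private so there is nothing for init to trim.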