[00:23:21] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 1.02 ms
[00:24:26] PROBLEM - Puppet failure on wmfbranch is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]
[00:25:59] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 1.00 ms
[00:38:11] Deployment-Systems, Architecture, Wikimedia-Developer-Summit-2016-Organization, Availability: WikiDev 16 working area: Software engineering - https://phabricator.wikimedia.org/T119032#1886563 (RobLa-WMF) One quick topic: the name "Software Engineering (aka Code Quality)" seems biased toward reduci...
[00:53:33] Deployment-Systems, Architecture, Wikimedia-Developer-Summit-2016-Organization, Availability: WikiDev 16 working area: Software engineering - https://phabricator.wikimedia.org/T119032#1886611 (RobLa-WMF) I'm going to attempt to summarize the must-haves captured in @Daniel's session description: *...
[00:54:39] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[00:55:18] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[00:59:03] (Abandoned) Paladox: Add new template mediawiki-gate [integration/config] - https://gerrit.wikimedia.org/r/256238 (owner: Paladox)
[01:03:22] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.48 ms
[01:05:54] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.64 ms
[01:47:11] Project beta-scap-eqiad build #82580: FAILURE in 3 min 33 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/82580/
[01:56:54] Yippee, build fixed!
[01:56:55] Project beta-scap-eqiad build #82581: FIXED in 7 min 26 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/82581/
[04:56:03] PROBLEM - Puppet failure on integration-dev is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[04:56:04] PROBLEM - Puppet failure on integration-dev is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[05:05:15] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[05:05:50] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[05:33:21] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.92 ms
[05:35:51] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 1.31 ms
[05:36:01] RECOVERY - Puppet failure on integration-dev is OK: OK: Less than 1.00% above the threshold [0.0]
[05:36:09] RECOVERY - Puppet failure on integration-dev is OK: OK: Less than 1.00% above the threshold [0.0]
[05:43:07] Yippee, build fixed!
[05:43:08] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-chrome-sauce build #287: FIXED in 27 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-chrome-sauce/287/
[06:27:08] Yippee, build fixed!
[06:27:09] Project browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #839: FIXED in 8 min 7 sec: https://integration.wikimedia.org/ci/job/browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/839/
[06:56:45] PROBLEM - Host angry-caching-proxy is DOWN: CRITICAL - Host Unreachable (10.68.19.184)
[07:00:57] PROBLEM - Puppet failure on integration-slave-trusty-1017 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]
[07:03:50] Gerrit-Migration, Gitblit-Deprecate, Release-Engineering-Team, releng-201516-q3, and 5 others: [RfC]: Migrate code review / management to Phabricator from Gerrit - https://phabricator.wikimedia.org/T119908#1886903 (RobLa-WMF) There is a proposal to discuss this at WikiDev '16: {T114320}. Please be...
[07:11:02] PROBLEM - Puppet failure on integration-slave-trusty-1014 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[07:11:55] Release-Engineering-Team, releng-201516-q3, Wikimedia-Developer-Summit-2016: Code-review migration to Differential status/discussion - https://phabricator.wikimedia.org/T114320#1886906 (RobLa-WMF) This session currently isn't in consideration in the {T119032} area. @mmodell, @demon: does that seem cor...
[07:27:06] PROBLEM - Puppet failure on integration-dev is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[07:28:01] PROBLEM - Puppet failure on integration-dev is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[07:36:02] RECOVERY - Puppet failure on integration-slave-trusty-1017 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:41:07] RECOVERY - Puppet failure on integration-slave-trusty-1014 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:02:01] RECOVERY - Puppet failure on integration-dev is OK: OK: Less than 1.00% above the threshold [0.0]
[08:03:02] RECOVERY - Puppet failure on integration-dev is OK: OK: Less than 1.00% above the threshold [0.0]
[08:05:49] Release-Engineering-Team, releng-201516-q3, Wikimedia-Developer-Summit-2016: Code-review migration to Differential status/discussion - https://phabricator.wikimedia.org/T114320#1886986 (Qgil) For what is worth, is one of the only two must-haves defined at {T119030}.
[08:06:24] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[08:09:23] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[08:21:05] Yippee, build fixed!
[08:21:06] Project browsertests-CirrusSearch-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #799: FIXED in 1 min 3 sec: https://integration.wikimedia.org/ci/job/browsertests-CirrusSearch-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/799/
[08:32:57] Yippee, build fixed!
[08:32:58] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #816: FIXED in 22 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/816/
[08:47:25] Differential, Gerrit-Migration, Documentation: Document best practices to amend a change written by another contributor - https://phabricator.wikimedia.org/T121751#1887024 (Dereckson) NEW
[08:50:16] Release-Engineering-Team, releng-201516-q3, Wikimedia-Developer-Summit-2016: Code-review migration to Differential status/discussion - https://phabricator.wikimedia.org/T114320#1887030 (RobLa-WMF) @qgil: I suppose that's true (if I'm reading {T119030} correctly), but you're currently proposing ceding t...
[08:53:35] Differential, Gerrit-Migration: Define an equivalent to Gerrit's +-1 +-2 for code review evaluation - https://phabricator.wikimedia.org/T138#1887039 (Dereckson) >"Request Changes" is like "-2". Well it's more like a -1 as it's the usual way to send back the patch to the author so it sees it on the UI as n...
[09:24:58] Browser-Tests, MediaWiki-extensions-CentralAuth: Fix or delete failing browser tests Jenkins jobs for CentralAuth - https://phabricator.wikimedia.org/T121752#1887068 (zeljkofilipin) NEW a:zeljkofilipin
[09:25:55] Browser-Tests, MediaWiki-extensions-CentralAuth: Fix or delete failing browser tests Jenkins jobs for CentralAuth - https://phabricator.wikimedia.org/T121752#1887077 (zeljkofilipin)
[09:27:56] Browser-Tests, CirrusSearch, Discovery, Ruby: Fix easy problems reported by RuboCop in CirrusSearch - https://phabricator.wikimedia.org/T117983#1887080 (zeljkofilipin) a:zeljkofilipin
[09:35:26] zeljkof: so yeah lets just upgrade fix rubocop issues in a single change :-}
[09:35:32] they are straightforward to review
[09:35:43] and I will handle rebases / merge conflicts
[09:35:57] a good way for me to get a bit more familiar with the ruby code there
[09:45:22] hashar: have to finish something first, will send the patch then
[09:46:03] zeljkof: poke me here whenever ready :-}
[09:47:06] will do
[09:51:32] playing with Jenkins matrix job
[09:51:39] https://integration.wikimedia.org/ci/job/hashar-perf-matrix/ <-- neat
[10:34:36] hashar: https://gerrit.wikimedia.org/r/#/c/259650/
[10:34:55] you were correct, looks like cirrus search only had a few minor offenses :D
[10:38:33] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]
[10:40:26] phedenskog: you have noticed. I talk too much :-)
[10:40:51] zeljkof: yeah usually not worth pilling up a bunch of tiny changes. Going to review it in a few
[10:41:23] hashar: I am running the tests locally, to check if I broke anything
[10:50:22] hashar: phantomjs broken on mac el capitan :/ https://github.com/ariya/phantomjs/issues/12928
[10:50:48] great
[10:51:09] oh %r{}
[10:51:22] yeah that make sense
[10:51:37] Release-Engineering-Team, releng-201516-q3, Wikimedia-Developer-Summit-2016: Code-review migration to Differential status/discussion - https://phabricator.wikimedia.org/T114320#1887189 (Qgil) Both code review sessions are obviously related, but each topic is big enough to fill its own slot. Now that...
[10:53:39] zeljkof: what is your check result ?
[10:53:42] zeljkof: cause the change looks +2 to me
[10:53:59] just managed to install phantomsj :/
[10:54:06] started the run
[10:54:22] but it is failing a lot :(
[10:54:28] well
[10:54:32] the only job we have is Firefox https://integration.wikimedia.org/ci/job/browsertests-CirrusSearch-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/
[10:54:39] so might want to use firefox instead?
[10:54:55] do not merge yet, will let you know when I rerun the tests
[10:56:02] PROBLEM - Puppet failure on integration-dev is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[10:58:23] zeljkof: computer froze :(
[10:58:34] hashar: ouch
[10:59:55] looks like my patch did not break the tests, they fail a lot on master too
[11:00:02] investigating
[11:00:08] Scenario: Search with accent yields result page with accent # features/smoke.feature:30
[11:00:08] undefined method `request_uri' for # (NoMethodError)
[11:00:10] with firefox hehe
[11:02:03] oh
[11:04:22] all fixed
[11:04:33] used 1.6.3 env variables
[11:04:36] but HEAD still uses 0.3.0 something
[11:04:37] :-D
[11:06:21] Project beta-scap-eqiad build #82629: FAILURE in 2 min 49 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/82629/
[11:10:30] hashar: 79/577 failed for master :(
[11:10:48] not sure what is going on, maybe my vagrant vm is broken, recreating and rerunning
[11:11:09] the last time I tried (a few days ago) only 5-10 scenarios failed for master
[11:12:33] maybe vagrant is crazy
[11:12:48] beta-scap broken green
[11:13:10] !log beta-scap-eqiad broken ( rsync: rename failed for "/srv/mediawiki/php-master/cache/gitinfo/info-extensions-AJAXPoll.json" (from php-master/cache/gitinfo/.~tmp~/info-extensions-AJAXPoll.json): No such file or directory (2) )
[11:13:18] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[11:14:23] !log beta-scap chokes on Copying to deployment-bastion.deployment-prep.eqiad.wmflabs from deployment-bastion.eqiad.wmflabs | Started rsync common
[11:14:29] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[11:27:06] Testing-Initiative-2015, Browser-Tests-Infrastructure, JavaScript, Patch-For-Review: Experiment with browser testing in other software languages - https://phabricator.wikimedia.org/T108874#1887233 (zeljkofilipin)
[11:31:06] RECOVERY - Puppet failure on integration-dev is OK: OK: Less than 1.00% above the threshold [0.0]
[11:38:11] hashar: my vm is rebuilt, but it still fails a lot :(
[11:38:23] I mean, tests still fail a lot (on master)
[11:38:42] not sure what happened to cirrus recently, will check commit log
[11:39:52] dcausse: did you notice cirrussearch browser tests started to fail way more recently, in the last few days?
[11:52:35] hashar: not sure what is wrong with cirrus, the new vm behaves the same (fails 87/577 scenarios) :(
[11:52:37] on master
[11:52:46] running the tests for my patch now
[11:53:01] but the patch is so small, there is almost no chance that it broke something
[11:53:20] will let you know in 10 minutes or so when the tests finish
[11:53:21] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.69 ms
[11:55:56] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.76 ms
[11:59:25] Release-Engineering-Team, Browser-Tests-Infrastructure, Security, MW-1.27-release-notes, and 2 others: Update all repositories that use mediawiki_selenium Ruby gem to version 1.6.x - https://phabricator.wikimedia.org/T114241#1887303 (zeljkofilipin)
[12:03:47] Yippee, build fixed!
[12:03:47] Project beta-scap-eqiad build #82631: FIXED in 7 min 21 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/82631/
[12:04:33] !log beta-scap fixed all by itself
[12:04:39] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[12:05:28] Continuous-Integration-Config, Fundraising-Backlog: Continuous integration: wikimedia/fundraising/tools/DjangoBannerStats needs V+2 jobs - https://phabricator.wikimedia.org/T121723#1887317 (hashar) Looks like a django plugin, so we could bring in a specific Django version in the test environment and appa...
[12:05:45] Continuous-Integration-Config, Fundraising-Backlog: Continuous integration: wikimedia/fundraising/tools/DjangoBannerStats needs V+2 jobs - https://phabricator.wikimedia.org/T121723#1887319 (hashar) p:Triage>Normal
[12:06:34] hashar: maybe cirrus needs to warm up ;)
[12:07:49] !log doing cleanup maintenance on deployment-bastion git repo under /srv/mediawiki-staging : git remote update --prune ; git gc ; git pack-refs
[12:07:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[12:08:13] zeljkof: maybe. I have no idea how vagrant / cirrus works really
[12:16:48] !log beta: salt -v '*' cmd.run 'apt-get clean'
[12:16:54] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[12:20:35] PROBLEM - Content Translation Server on deployment-cxserver03 is CRITICAL: Connection refused
[12:21:53] PROBLEM - Content Translation Server on deployment-cxserver03 is CRITICAL: Connection refused
[12:25:46] Project beta-scap-eqiad build #82633: FAILURE in 7 min 20 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/82633/
[12:27:03] PROBLEM - Puppet failure on integration-dev is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[12:28:58] RECOVERY - Host angry-caching-proxy is UP: PING OK - Packet loss = 0%, RTA = 0.49 ms
[12:35:29] Yippee, build fixed!
[12:35:30] Project beta-scap-eqiad build #82634: FIXED in 7 min 43 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/82634/
[12:53:48] Project beta-scap-eqiad build #82636: FAILURE in 7 min 3 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/82636/
[12:55:35] Yippee, build fixed!
[12:55:36] Project browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #692: FIXED in 1 min 34 sec: https://integration.wikimedia.org/ci/job/browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/692/
[12:56:01] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[12:59:23] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[12:59:24] Deployment-Systems, Architecture, Wikimedia-Developer-Summit-2016-Organization, Availability: WikiDev 16 working area: Software engineering - https://phabricator.wikimedia.org/T119032#1887377 (daniel) >>! In T119032#1886563, @RobLa-WMF wrote: > One quick topic: the name "Software Engineering (aka...
[13:01:39] Continuous-Integration-Infrastructure, Upstream, Zuul: Zuul Status API cached too long by Varnish - https://phabricator.wikimedia.org/T94796#1887379 (hashar) stalled>Resolved The upstream patch https://review.openstack.org/#/c/170081/ is incorporated in the Zuul server we run. I did the update ro...
[13:02:04] RECOVERY - Puppet failure on integration-dev is OK: OK: Less than 1.00% above the threshold [0.0]
[13:03:20] Yippee, build fixed!
[13:03:20] Project beta-scap-eqiad build #82637: FIXED in 7 min 11 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/82637/
[13:09:26] PROBLEM - Host angry-caching-proxy is DOWN: CRITICAL - Host Unreachable (10.68.19.184)
[13:14:40] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.77 ms
[13:15:57] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 1.02 ms
[13:19:11] Release-Engineering-Team, Browser-Tests-Infrastructure, Security, MW-1.27-release-notes, and 2 others: Update all repositories that use mediawiki_selenium Ruby gem to version 1.6.x - https://phabricator.wikimedia.org/T114241#1887384 (zeljkofilipin)
[13:21:08] zeljkof: we have some tests that are failing randomly but I didn't notice something particular recently
[13:21:30] we've moarked most of them with @expect_failure and cindy ignores them
[13:21:42] dcausse: sorry, looks like there was something wrong with my vagrant vm
[13:21:50] ah ok :)
[13:21:57] number of failures went down to 7-8 after a few test runs
[13:22:02] not sure why
[13:22:22] but the first test run after a clean vm was created failed with 70-80 failed scenarios :/
[13:22:51] the first run is usually messy because indices need to be created and I guess some tests won't work properly with a fresh index
[13:23:20] that will be a problem when we start to run the tests in CI
[13:23:27] :/
[13:23:33] but we can think about it when we get there
[13:23:48] for now I just need to update the repo to mediawiki_selenium 1.6
[13:23:54] I think some tests uses documents that are created in another test "before hook"
[13:24:38] ok, we will have to review all these tests, but it's not a bad thing :)
[13:25:08] started working on it https://gerrit.wikimedia.org/r/#/c/257928/
[13:25:31] wow cindy is happy, congrats :)
[13:25:41] but I have to finish https://phabricator.wikimedia.org/T114241 first
[13:25:57] sure, let me know if you need to merge something
[13:26:19] dcausse: well, this one is ready https://gerrit.wikimedia.org/r/#/c/259650/
[13:26:39] ok will +2 then
[13:27:28] thanks
[13:27:41] it is a small commit, unlike upgrade to 1.6
[13:28:03] I have tested it locally, the same number of failures as for master
[13:29:21] usually you can use cindy to check, if cindy complains about frozen_index_api or update_weight, simply remove and re-add it from reviewers, it's like a "recheck"
[13:29:57] where can I see Cindy running?
[13:30:11] cirrus-browser-bot.search.eqiad.wmflabs
[13:30:42] let me see if I can add you to the lab project
[13:31:06] dcausse: hello. If we could migrate Cindy to the CI nodes that would be rather nice :-}
[13:31:22] exactly my thoughts :)
[13:31:36] like it was done for Barry
[13:31:47] Barry was the equivalent for mobile wasn't it?
[13:31:51] hashar: ah I don't know how to do but I can have a look
[13:32:08] perfect topic for a pairing session! :)
[13:32:16] or even a group sprint :D
[13:32:20] :)
[13:32:29] if you end up pairing, please do invite me too, I would like to see how it is done too
[13:32:51] an outcome would be ElasticSearch / CirrusSearch folks would get to know the CI infra and potentially add moaaaaaar jobs and tests
[13:33:19] yes :)
[13:34:04] we will have to consolidate our tests at least make the first run to pass, because it will be always a first run in the CI infra right?
[13:34:28] not sure how it is set up now, but in future yes
[13:34:39] I think we do not destroy the instance now
[13:35:02] well today we need sometimes to log in and cleanup some indices
[13:40:24] oh
[13:41:25] Release-Engineering-Team, Browser-Tests-Infrastructure, Security, MW-1.27-release-notes, and 2 others: Update all repositories that use mediawiki_selenium Ruby gem to version 1.6.x - https://phabricator.wikimedia.org/T114241#1887428 (zeljkofilipin)
[13:41:41] (PS1) Hashar: Actually add CirrusSearch to extension-gate [integration/config] - https://gerrit.wikimedia.org/r/259679
[13:42:10] !log Added CirrusSearch to extension-gate https://gerrit.wikimedia.org/r/259679
[13:42:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:42:51] guess who is going to add yet another test to ensure jjb/zuul are in sync? :D
[13:45:26] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[13:46:01] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[13:46:08] !log Build example pass ( https://integration.wikimedia.org/ci/job/mediawiki-extensions-hhvm/43270/ https://integration.wikimedia.org/ci/job/mediawiki-extensions-qunit/23552/ )
[13:46:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:46:52] (CR) Hashar: [C: 2] "Validated on https://gerrit.wikimedia.org/r/#/c/70205/ , PASS" [integration/config] - https://gerrit.wikimedia.org/r/259679 (owner: Hashar)
[13:48:47] (Merged) jenkins-bot: Actually add CirrusSearch to extension-gate [integration/config] - https://gerrit.wikimedia.org/r/259679 (owner: Hashar)
[13:53:46] Release-Engineering-Team, Browser-Tests-Infrastructure, Security, MW-1.27-release-notes, and 2 others: Update all repositories that use mediawiki_selenium Ruby gem to version 1.6.x - https://phabricator.wikimedia.org/T114241#1887454 (zeljkofilipin)
[13:58:21] hashar: that was perfect, thanks again :)
[13:58:42] phedenskog: glad you liked it :}
[13:59:40] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.65 ms
[14:00:07] Release-Engineering-Team, Browser-Tests-Infrastructure, Security, MW-1.27-release-notes, and 2 others: Update all repositories that use mediawiki_selenium Ruby gem to version 1.6.x - https://phabricator.wikimedia.org/T114241#1887459 (zeljkofilipin)
[14:00:51] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 2.55 ms
[14:08:01] Release-Engineering-Team, Ruby, Tracking: Fix easy problems reported by RuboCop - https://phabricator.wikimedia.org/T91485#1887478 (zeljkofilipin)
[14:08:02] Browser-Tests, CirrusSearch, Discovery, Patch-For-Review, and 2 others: Fix easy problems reported by RuboCop in CirrusSearch - https://phabricator.wikimedia.org/T117983#1887477 (zeljkofilipin) Open>Resolved
[14:10:35] Release-Engineering-Team, Browser-Tests-Infrastructure, Security, MW-1.27-release-notes, and 2 others: Update all repositories that use mediawiki_selenium Ruby gem to version 1.6.x - https://phabricator.wikimedia.org/T114241#1887480 (zeljkofilipin)
[14:15:09] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[14:15:26] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[14:15:52] Release-Engineering-Team, Browser-Tests-Infrastructure, Security, MW-1.27-release-notes, and 2 others: Update all repositories that use mediawiki_selenium Ruby gem to version 1.6.x - https://phabricator.wikimedia.org/T114241#1887501 (zeljkofilipin)
[14:22:54] Release-Engineering-Team, Browser-Tests-Infrastructure, Security, MW-1.27-release-notes, and 2 others: Update all repositories that use mediawiki_selenium Ruby gem to version 1.6.x - https://phabricator.wikimedia.org/T114241#1887512 (zeljkofilipin)
[14:23:51] Release-Engineering-Team, Browser-Tests-Infrastructure, Security, MW-1.27-release-notes, and 2 others: Update all repositories that use mediawiki_selenium Ruby gem to version 1.6.x - https://phabricator.wikimedia.org/T114241#1689201 (zeljkofilipin)
[14:24:08] Release-Engineering-Team, Browser-Tests-Infrastructure, Security, MW-1.27-release-notes, and 2 others: Update all repositories that use mediawiki_selenium Ruby gem to version 1.6.x - https://phabricator.wikimedia.org/T114241#1689201 (zeljkofilipin)
[14:26:34] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.75 ms
[14:27:36] Differential, Gerrit-Migration: Define an equivalent to Gerrit's +-1 +-2 for code review evaluation - https://phabricator.wikimedia.org/T138#1887533 (Luke081515) Maybe a problem in case of "request changes": Everyone who can edit the revision can remove the reviewer (who request changes).
[14:28:22] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.38 ms
[15:03:47] thcipriani: good morning
[15:03:55] r u running swat today?
[15:04:26] mobrovac: likely, what's up?
[15:05:09] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[15:05:51] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[15:09:20] thcipriani: i have some patches for ext/Math which are already on 1.27.0-wmf9, but would need to get them on all groups in prod
[15:09:24] would that be possible?
[15:09:40] (earlier branches don't have these)
[15:09:58] you just need to backport to 1.27.0-wmf.8?
[15:10:56] that's the only other group?
[15:11:04] aren't there 3 groups usually?
[15:11:21] but yes, basically i'd need to backport and deploy to all of the others
[15:11:56] (PS1) Hashar: Assert gate jobs are properly configured [integration/config] - https://gerrit.wikimedia.org/r/259711
[15:12:02] Release-Engineering-Team, releng-201516-q3, Wikimedia-Developer-Summit-2016: Code-review migration to Differential status/discussion - https://phabricator.wikimedia.org/T114320#1887625 (demon) I don't really care which track this ends up in. I figure it'll have a huge overlap with {T114419} anyway. Th...
[15:12:03] also, is it ok to just merge --squash the patches into one?
[15:12:30] (CR) Hashar: [C: -1] "WIP" [integration/config] - https://gerrit.wikimedia.org/r/259711 (owner: Hashar)
[15:12:41] yeah, currently group0 is on .9 other groups on are on .8. Sure, that should be no problem. Just cherry pick changes to the .8 branch of Math. Fine with me if they all go out at once if you're fine with it. Do these changes have l10n updates?
[15:12:54] (CR) jenkins-bot: [V: -1] Assert gate jobs are properly configured [integration/config] - https://gerrit.wikimedia.org/r/259711 (owner: Hashar)
[15:14:01] thcipriani: the commits i need themselves no, should they?
[15:14:12] if so, i can simply merge --squash master ?
[15:15:14] (PS1) Hashar: [Buggy] remove from extension-gate [integration/config] - https://gerrit.wikimedia.org/r/259713
[15:15:52] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.80 ms
[15:16:20] (PS1) Hashar: [PdfHandler] add to extension-gate [integration/config] - https://gerrit.wikimedia.org/r/259714
[15:16:30] mobrovac: I'm fine with merge --squash. I was curious about l10nupdates since that requires a full scap and takes a long time and I like to avoid it during SWAT when possible.
[15:16:34] (CR) jenkins-bot: [V: -1] [Buggy] remove from extension-gate [integration/config] - https://gerrit.wikimedia.org/r/259713 (owner: Hashar)
[15:16:45] yuouuuu 1h30 to write a test
[15:16:46] :(
[15:17:23] (CR) Hashar: [C: -2] "Need to triple check the jobs work with PdfHandler added as a dependency." [integration/config] - https://gerrit.wikimedia.org/r/259714 (owner: Hashar)
[15:18:04] thcipriani: i see, ok will jsut cherry-pick the commits i need and squash them into one patch for wmf8
[15:18:15] * mobrovac is a fan of the KISS principle :)
[15:18:21] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 1.00 ms
[15:18:23] mobrovac: sounds good. Thanks :)
[15:18:23] (CR) jenkins-bot: [V: -1] [PdfHandler] add to extension-gate [integration/config] - https://gerrit.wikimedia.org/r/259714 (owner: Hashar)
[15:20:40] (PS2) Hashar: Assert gate jobs are properly configured [integration/config] - https://gerrit.wikimedia.org/r/259711
[15:21:11] (CR) Hashar: "Hopefully fixed flake8" [integration/config] - https://gerrit.wikimedia.org/r/259711 (owner: Hashar)
[15:21:17] (PS2) Hashar: [Buggy] remove from extension-gate [integration/config] - https://gerrit.wikimedia.org/r/259713
[15:21:23] (PS2) Hashar: [PdfHandler] add to extension-gate [integration/config] - https://gerrit.wikimedia.org/r/259714
[15:22:05] (CR) jenkins-bot: [V: -1] Assert gate jobs are properly configured [integration/config] - https://gerrit.wikimedia.org/r/259711 (owner: Hashar)
[15:22:45] (CR) jenkins-bot: [V: -1] [Buggy] remove from extension-gate [integration/config] - https://gerrit.wikimedia.org/r/259713 (owner: Hashar)
[15:26:23] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[15:29:24] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[15:35:51] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.71 ms
[15:38:22] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.48 ms
[15:53:26] hey hey! just did a successful scap deploy of eventlogging in beta woo!
[15:53:49] ottomata: \o/ that's awesome!
[16:01:34] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[16:02:19] ottomata: thcipriani: congrats!
[16:03:28] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[16:06:58] (PS3) Hashar: [Buggy] remove from extension-gate [integration/config] - https://gerrit.wikimedia.org/r/259713
[16:10:14] (CR) Hashar: [C: 2] [Buggy] remove from extension-gate [integration/config] - https://gerrit.wikimedia.org/r/259713 (owner: Hashar)
[16:10:49] (CR) Hashar: "Buggy no more trigger the jobs. It is not WMF deployed and not defined in JJB https://gerrit.wikimedia.org/r/#/c/259713/" [integration/config] - https://gerrit.wikimedia.org/r/259711 (owner: Hashar)
[16:11:29] (Merged) jenkins-bot: [Buggy] remove from extension-gate [integration/config] - https://gerrit.wikimedia.org/r/259713 (owner: Hashar)
[16:14:09] thcipriani: i have another scap::target patch coming your way...i want to turn it into a define to make it easier for folks to do your Class mockbase::deploy_target stuff
[16:14:37] ottomata: awesome. Thanks so much for all your feedback on this—very helpful!
[16:14:41] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.46 ms
[16:14:51] (PS3) Hashar: [PdfHandler] add to extension-gate [integration/config] - https://gerrit.wikimedia.org/r/259714
[16:15:33] gonna be like
[16:15:42] scap::target { 'mockbase':
[16:15:42] deploy_user => 'deploy-mockbase',
[16:15:42] public_key_source => 'puppet://modules/mockbase/deploy-test_rsa.pub'
[16:15:42] }
[16:15:51] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 1.15 ms
[16:16:26] (PS1) Hashar: [PdfHandler] gate jobs in experimental for testing [integration/config] - https://gerrit.wikimedia.org/r/259730
[16:16:38] (PS4) Hashar: [PdfHandler] add to extension-gate [integration/config] - https://gerrit.wikimedia.org/r/259714
[16:19:02] (CR) Hashar: [C: 2] [PdfHandler] gate jobs in experimental for testing [integration/config] - https://gerrit.wikimedia.org/r/259730 (owner: Hashar)
[16:20:20] thcipriani: https://gerrit.wikimedia.org/r/#/c/259542/2/modules/scap/manifests/target.pp
[16:20:27] (Merged) jenkins-bot: [PdfHandler] gate jobs in experimental for testing [integration/config] - https://gerrit.wikimedia.org/r/259730 (owner: Hashar)
[16:25:10] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[16:25:57] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[16:28:06] thcipriani: does the deploy user really need to have sudo -u self permissions?
[16:28:16] could scap be smarter about that, and just not sudo if sudoing to self?
[16:29:06] ottomata: indeed. We recently did some work with that end-goal in mind.
[16:29:14] ok awesome
[16:29:21] will add a todo to remove that puppet stuff when it is no longer neede dthen
[16:29:31] cool, thanks.
[16:33:23] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 1.38 ms [16:35:50] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.80 ms [16:39:05] 7Browser-Tests, 10VisualEditor, 5Patch-For-Review: Delete or fix failed VisualEditor browsertests Jenkins jobs - https://phabricator.wikimedia.org/T94162#1887752 (10zeljkofilipin) a:5zeljkofilipin>3None [16:44:40] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67) [16:45:01] (03PS5) 10Hashar: [PdfHandler] add to extension-gate [integration/config] - 10https://gerrit.wikimedia.org/r/259714 [16:45:24] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67) [16:50:44] (03CR) 10Hashar: [C: 032] "job deployed" [integration/config] - 10https://gerrit.wikimedia.org/r/259714 (owner: 10Hashar) [16:51:03] !log Added PdfHandler to extension-gate https://gerrit.wikimedia.org/r/#/c/259714/ [16:51:08] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [16:52:42] (03Merged) 10jenkins-bot: [PdfHandler] add to extension-gate [integration/config] - 10https://gerrit.wikimedia.org/r/259714 (owner: 10Hashar) [16:53:27] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 1.10 ms [16:54:50] (03CR) 10Hashar: "PdfHandler is now a dependency in the JJB job ( https://gerrit.wikimedia.org/r/259714 )" [integration/config] - 10https://gerrit.wikimedia.org/r/259711 (owner: 10Hashar) [16:54:57] (03CR) 10Hashar: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/259711 (owner: 10Hashar) [16:55:52] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.58 ms [16:57:24] PROBLEM - Puppet failure on wmfbranch is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0] [16:58:03] hmm, thcipriani [16:58:10] CalledProcessError: Command 'sudo /usr/sbin/service eventlogging-service-eventbus restart' returned non-zero exit status 
1 [16:58:13] its trying to do a general sudo there [16:58:21] i guess that needs special sudo rules? [16:58:27] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67) [16:58:35] indeed that does for service restart. [16:58:45] hm [16:59:48] is that doable with sudo? [16:59:54] i can say it can run /usr/sbin/service [17:00:04] but i'm not sure i can say it can run only /usr/sbin/service eventlogging-service-eventbus [17:00:06] with an arg [17:00:25] Project browsertests-CentralAuth-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce-T121752 build #1: 04FAILURE in 42 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralAuth-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce-T121752/1/ [17:00:26] oh maybe it can [17:00:27] .. [17:00:27] hm [17:01:55] ottomata: yeah, that should be possible. so eventlogging user needs: 'ALL = (root) NOPASSWD: /usr/sbin/service eventlogging-service-eventbus *' [17:02:07] aye ok yeah, just reading, oh with the *? [17:02:07] ok [17:02:12] yeah [17:03:22] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.53 ms [17:08:44] 7Browser-Tests, 10MediaWiki-extensions-CentralAuth: Fix or delete failing browser tests Jenkins jobs for CentralAuth - https://phabricator.wikimedia.org/T121752#1887819 (10zeljkofilipin) [17:09:04] 7Browser-Tests, 10MediaWiki-extensions-CentralAuth: Fix or delete failing browser tests Jenkins jobs for CentralAuth - https://phabricator.wikimedia.org/T121752#1887821 (10zeljkofilipin) a:5zeljkofilipin>3None [17:11:08] I am off wave [17:25:10] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67) [17:26:22] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67) [17:42:21] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.81 ms [17:43:02] 7Browser-Tests, 10MediaWiki-extensions-CentralAuth: Fix or delete failing browser tests Jenkins jobs for CentralAuth - 
https://phabricator.wikimedia.org/T121752#1887939 (10Legoktm) p:5Normal>3High [17:43:21] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.66 ms [17:50:09] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67) [17:51:02] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67) [17:55:12] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 1.23 ms [17:55:54] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 1.04 ms [18:14:20] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67) [18:14:23] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67) [18:17:27] thcipriani: do you think it would be possible to review so I can merge this today? i'd like to keep moving with my eventbus stuff, and I want to use this [18:17:33] https://gerrit.wikimedia.org/r/#/c/259542/ [18:18:34] it shouldn't affect any mw stuff [18:18:40] ottomata: I can give a review today, I don't have ops +2 though. I can try to help move it along though. [18:18:49] i have +2 :) [18:19:20] ah, gotcha :) [18:30:59] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.67 ms [18:33:21] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.92 ms [18:48:51] PROBLEM - Puppet failure on deployment-mathoid is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [18:57:06] (03CR) 1020after4: [C: 032] Add flag to resume branching from named extension [tools/release] - 10https://gerrit.wikimedia.org/r/258203 (owner: 10Thcipriani) [18:59:49] (03CR) 1020after4: "phabricator/phabricator is upstream code I'm not sure we should be trying to enforce lint rules on their codebase." 
[integration/config] - 10https://gerrit.wikimedia.org/r/258534 (owner: 10Paladox) [19:01:40] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67) [19:03:54] (03Merged) 10jenkins-bot: Add flag to resume branching from named extension [tools/release] - 10https://gerrit.wikimedia.org/r/258203 (owner: 10Thcipriani) [19:04:40] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67) [19:25:22] thcipriani: thanks for comments! [19:25:30] the reason you can't do ${deploy_cache}-cache [19:25:33] sorry [19:25:36] deploy_path* [19:25:38] is because [19:25:56] in my case, $deploy_path is /srv/deployment/eventlogging/eventbus [19:26:17] what creates the /srv/deployment/eventlogging directory? [19:28:20] ottomata: ah, right, I ran into this with a patch I had for restbase. Recursive directories and puppet—never something I remember without trying it once. :) [19:29:21] kk, lemme amend my comment. I also wanted to run the puppet compiler on it real quickly. [19:29:50] k [19:31:19] i mean thciprianii don't like that $parent_dir thing either [19:31:30] i think yall should move the -cache dir :D [19:31:53] :) [19:37:31] I'll add a comment explaining why in that patch [19:38:39] ottomata: cool, puppetcompiler was a noop, so that's good :) [19:39:26] k cool [19:39:27] danke [19:43:33] (03PS1) 10Phedenskog: Reimpl for catching errors in WebPageTest runs [integration/config] - 10https://gerrit.wikimedia.org/r/259783 (https://phabricator.wikimedia.org/T120365) [19:44:11] twentyafterfour: Would this be the problem code https://github.com/phacility/phabricator/blob/24845c70b918789be5309f88ed3f6455f5f29748/src/applications/diffusion/engine/DiffusionCommitHookEngine.php#L407 that isen't letting us view open patches. [19:45:31] twentyafterfour: Also do you know when the branch script would be created to redirect any branches going to refs/heads/master and redirect to master or refs/meta/ to meta i think. 
Please see https://phabricator.wikimedia.org/T121374 [19:46:23] (03CR) 10Phedenskog: "Let me test deploying this one when I have my environment up and running :)" [integration/config] - 10https://gerrit.wikimedia.org/r/259783 (https://phabricator.wikimedia.org/T120365) (owner: 10Phedenskog) [19:46:28] paladox: I'm not sure [19:46:48] twentyafterfour: Ok. [19:54:42] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.95 ms [19:55:53] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 1.35 ms [20:09:18] twentyafterfour: I think it this file https://github.com/phacility/phabricator/blob/7b997359466f08dcee82f9942534b7a1eb31d18c/src/applications/repository/engine/PhabricatorRepositoryPullEngine.php#L316 since the name of the file suggests it is pulling. [20:11:34] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67) [20:11:34] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67) [20:13:20] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.39 ms [20:15:50] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.80 ms [20:24:49] PROBLEM - Puppet failure on deployment-eventlogging03 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0] [20:29:33] (03CR) 10Krinkle: "Seems to work fine?" 
[integration/config] - 10https://gerrit.wikimedia.org/r/259783 (https://phabricator.wikimedia.org/T120365) (owner: 10Phedenskog) [20:33:08] 10Continuous-Integration-Config, 6operations, 7Puppet: puppet-lint ignores --no-80chars-check option - https://phabricator.wikimedia.org/T121796#1888526 (10Dzahn) 3NEW [20:33:46] 10Continuous-Integration-Config, 6operations, 7Puppet: puppet-lint ignores --no-80chars-check option - https://phabricator.wikimedia.org/T121796#1888544 (10Dzahn) [20:34:35] 10Continuous-Integration-Config, 6operations, 7Puppet: puppet-lint ignores --no-80chars-check option - https://phabricator.wikimedia.org/T121796#1888526 (10Dzahn) [20:36:16] 10Continuous-Integration-Config, 6operations, 7Puppet: puppet-lint ignores --no-80chars-check option - https://phabricator.wikimedia.org/T121796#1888560 (10Dzahn) p:5Triage>3Low [21:06:39] RECOVERY - Puppet failure on mira is OK: OK: Less than 1.00% above the threshold [0.0] [21:09:47] 10Continuous-Integration-Config, 6operations, 7Puppet: puppet-lint ignores --no-80chars-check option - https://phabricator.wikimedia.org/T121796#1888624 (10hashar) + my favorites rubyist It can be either: - a weird bug in puppet-lint or a strange oddity on the CI slaves - or instances have a dirty workspac... [21:12:51] 10Continuous-Integration-Config, 6operations, 7Puppet: puppet-lint ignores --no-80chars-check option - https://phabricator.wikimedia.org/T121796#1888637 (10hashar) Looked at the slaves with `git status` and the workspaces are clean. Command on integration-saltmaster: `salt '*slave-trusty*' cmd.run 'git -C /... 
[21:18:52] (03CR) 10Phedenskog: "It works locally when I test but when I run it in Jenkins I get something like:" [integration/config] - 10https://gerrit.wikimedia.org/r/259783 (https://phabricator.wikimedia.org/T120365) (owner: 10Phedenskog) [21:26:16] PROBLEM - Puppet failure on deployment-kafka04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:31:37] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67) [21:32:24] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67) [21:34:20] PROBLEM - Puppet failure on deployment-sca02 is CRITICAL: CRITICAL: 83.33% of data above the critical threshold [0.0] [21:35:09] 10Continuous-Integration-Infrastructure, 5Patch-For-Review, 7Zuul: Zuul-cloner should use hard links when fetching from cache-dir - https://phabricator.wikimedia.org/T97106#1888684 (10Dzahn) [21:36:38] 10Continuous-Integration-Infrastructure, 5Patch-For-Review, 7Zuul: Zuul-cloner should use hard links when fetching from cache-dir - https://phabricator.wikimedia.org/T97106#1888686 (10hashar) T119714 got us zuul_2.1.0-60-g1cc37f7-wmf4 everywhere. So we should now be able to use hard-links. [21:37:31] hmm, hey thcipriani [21:37:35] i think [21:37:35] file { '/srv/deployment/mockbase': [21:37:35] ensure => directory, [21:37:35] owner => $user, [21:37:35] mode => '0755', [21:37:35] recurse => true, [21:37:35] } [21:37:35] is a problem [21:37:45] because scap makes that into a symlink [21:37:48] and then puppet runs [21:37:56] and deletes the symlink by ensuring it is a directory [21:38:17] blerg. [21:38:52] 10Continuous-Integration-Config, 6operations, 7Puppet: puppet-lint ignores --no-80chars-check option - https://phabricator.wikimedia.org/T121796#1888689 (10Dzahn) i did these things on my laptop. so CI slaves would be unrelated. it was just the Debian package when i run it locally on my clone of the ops/pup... 
[21:39:22] doing this [21:39:23] https://gerrit.wikimedia.org/r/#/c/259802/ [21:39:26] !log refreshing nodepool snapshot to get the latest zuul right now ( https://wikitech.wikimedia.org/wiki/Nodepool#Manually_generate_a_new_snapshot ) [21:39:28] but i don't know if it is the right thing to do [21:39:31] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [21:40:55] ottomata: yeah, I'm not sure it's the right thing either...seems better than broken, though. [21:40:57] 10Continuous-Integration-Config, 6operations, 7Puppet: puppet-lint ignores --no-80chars-check option - https://phabricator.wikimedia.org/T121796#1888690 (10hashar) Do you have a way to reliably reproduce the issue? Or at least some Jenkins builds showing it? That would help. [21:41:01] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.93 ms [21:43:23] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.98 ms [21:48:04] (03CR) 10Hashar: [C: 04-1] "/bin/sh is dash on Trusty and it doesn't seem to recognize RESULT+=$BAR" (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/259783 (https://phabricator.wikimedia.org/T120365) (owner: 10Phedenskog) [21:48:20] !log Image ci-jessie-wikimedia-1450388384 in wmflabs-eqiad is ready [21:48:25] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [21:48:58] ottomata: thcipriani: ensure present doesn't sound optimal [21:49:24] if puppet is run before the deployment just a regular empty file will be created [21:49:25] twentyafterfour: Could you review https://gerrit.wikimedia.org/r/#/c/259834/ i am not sure if i broke it or that way works. 
[21:49:33] !log milestone, Nodepool has spawned 14500 instances so far [21:49:38] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [21:50:25] mobrovac: either way, i think scap3 will replace the empty file with the symlink during deployment [21:50:39] mobrovac: this is true. In that instance scap should be able to run: ln -sf on that file. [21:51:49] one alternative here is that if we enforced the arbitrary two-directory repo rule. [21:51:51] 10Continuous-Integration-Config, 6operations, 7Puppet: puppet-lint ignores --no-80chars-check option - https://phabricator.wikimedia.org/T121796#1888734 (10Dzahn) hmm.. i guess that would be: install Debian strech apt-get install puppet-lint git clone https://gerrit.wikimedia.org/r/p/operations.git grep "80... [21:52:32] perhaps just an exec unless would be in order to change the ownership? [21:52:50] euh that would need two execs actually [21:52:51] hm [21:53:22] but wait, isn't it just important for scap that the parent dir exists and that scap can mangle files inside it? [21:54:35] really just the [repo]-cache dir has to exist for mangling files. The final location is created by scap, but it can only be created if the location is writable by the remote user. [21:58:48] it seems to me that the path of least resistance might be to continue to enforce a repo that consists of two directories. marxarelli and I talked through this quite a lot when we made the decision to try to not enforce that anymore. Not sure I ever came down squarely on either side of the fence. [22:00:40] the ensure => present solution also works, but it's less intuitive, a little strange. [22:00:49] yup [22:03:15] 10Continuous-Integration-Config, 6operations, 7Puppet: puppet-lint ignores --no-80chars-check option - https://phabricator.wikimedia.org/T121796#1888776 (10hashar) Well with puppet.git at f387ac8a857bd8a0e6ad0a3daffa7a1bd5ea3958 ``` $ cd manifests/role $ puppet-lint *.pp|wc -l 927 $ puppet-lint *.pp|g... 
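[editor's note on the 21:48-22:00 exchange] A minimal Python sketch of the end state `ln -sf` would produce when Puppet's `ensure => present` has left an empty regular file at the deploy path. This is not scap's code: the helper name `force_symlink` and every path below are hypothetical. It also swaps in a slightly different technique from `ln -sf`, which removes the destination and then links; here the new symlink is built under a temporary name and renamed over the old path.

```python
import os
import tempfile


def force_symlink(target: str, link_path: str) -> None:
    """Point link_path at target, replacing a regular file or an old symlink.

    Hypothetical helper, not part of scap. The symlink is created under a
    temporary name and renamed over link_path, so link_path never goes
    missing. os.replace() raises an error if link_path is a real directory,
    which is the state Puppet's `ensure => directory` would enforce.
    """
    parent = os.path.dirname(os.path.abspath(link_path))
    # A private directory next to link_path keeps the rename on one filesystem.
    tmp_dir = tempfile.mkdtemp(dir=parent)
    tmp_link = os.path.join(tmp_dir, "link")
    try:
        os.symlink(target, tmp_link)
        os.replace(tmp_link, link_path)
    finally:
        # Clean up the temporary name if the rename did not happen.
        if os.path.lexists(tmp_link):
            os.unlink(tmp_link)
        os.rmdir(tmp_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Made-up layout standing in for a [repo]-cache dir and a deploy path.
        rev_dir = os.path.join(root, "mockbase-cache", "revs", "abc123")
        os.makedirs(rev_dir)
        deploy_path = os.path.join(root, "mockbase")
        open(deploy_path, "w").close()  # stand-in for Puppet's empty file
        force_symlink(rev_dir, deploy_path)
        print(os.path.islink(deploy_path), os.readlink(deploy_path))
```

The sketch only covers the "empty file first, deploy second" ordering raised by mobrovac; it says nothing about ownership, which the chat treats as a separate concern.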
[22:03:26] thcipriani: maybe just make git_deploy_dir absolute [22:03:31] and don't concatenate it with git_repo [22:03:46] then each scap.cfg file can specifc the location where it expects to deploy [22:03:59] and puppet can be configured to set it up properly? [22:04:22] git_repo should just be used for cloning [22:04:23] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67) [22:04:24] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67) [22:04:27] !log Nodepool: openstack image delete ci-jessie-wikimedia_old_20151210 [22:04:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:04:34] the path at which the clone happens could be completely different [22:05:07] 10Continuous-Integration-Config, 6operations, 7Puppet: puppet-lint ignores --no-80chars-check option - https://phabricator.wikimedia.org/T121796#1888796 (10Dzahn) @hashar oh, but how come this doesn't seem to be the case for options other than the 80char check? [22:07:27] ottomata: that's probably a good idea. The git_deploy_dir and git_repo thing were a kind of trebuchet idea that ended up ported over. [22:09:28] 10Continuous-Integration-Config, 6operations, 7Puppet: puppet-lint ignores --no-80chars-check option - https://phabricator.wikimedia.org/T121796#1888836 (10hashar) if i cd manifests/role puppet-lint complains with a lot of different errors ( 927 ) and 246 of those 927 errors are 'than 80'. i.e. .puppet-... [22:12:12] ottomata: something just occurred to me about scap::target. would doing: scap::target{ 'mockbase': ... } mean that puppet tries to make /srv/deployment owned by $deploy_user ? 
[22:12:35] HmmMm yeah it would, but only if not defined...which is not very reliable in puppet [22:12:58] in this case it will probably be safe, but we shouldn't rely on it :/ [22:13:20] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 1.39 ms [22:15:08] 10Continuous-Integration-Config, 6operations, 7Puppet: puppet-lint ignores --no-80chars-check option - https://phabricator.wikimedia.org/T121796#1888867 (10Dzahn) if i do it like this, and avoid changing directory, it is working indeed: `for manifest in $(find . -name *.pp); do echo $manifest; puppet-lint $... [22:15:57] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.69 ms [22:18:51] 10Continuous-Integration-Config, 6operations, 7Puppet: puppet-lint ignores --no-80chars-check option - https://phabricator.wikimedia.org/T121796#1888904 (10Dzahn) 5Open>3Invalid a:3Dzahn [22:19:56] !log Nodepool: all nodes are on ci-jessie-wikimedia-1450388384 [22:20:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:24:58] hm ,thcipriani i a super close to deploying in prod now [22:25:05] submodules aren't working though [22:25:10] i didn't have any problems with them in labs [22:25:21] fatal: repository 'http://tin.eqiad.wmnet/eventlogging/eventbus/.git/modules/config/schemas/' not found [22:25:52] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67) [22:25:58] ottomata: hmm do you have submodules checked out on tin? [22:26:19] yes, hmm, but, the name of the repo is differen than the path at which it is checked out? [22:26:29] .git/modules/config/event-schemas exists there [22:26:29] hm [22:26:32] investigating... 
[22:26:44] huh [22:26:49] it's different on labs... [22:26:49] hmmm [22:27:32] ok this looks like it must be something weird with my repo [22:27:35] carry on, i'll figure it out [22:27:54] hmmm, i think maybe labs was out of date [22:27:55] hmmmm [22:28:07] yeah, thcipriani, this is normal [22:28:08] ok [22:28:13] i think this is a scap bug [22:28:23] [submodule "config/event-schemas"] [22:28:23] path = config/schemas [22:28:23] url = https://gerrit.wikimedia.org/r/mediawiki/event-schemas [22:28:58] but on the deploy target [22:29:00] [submodule "config/schemas"] [22:29:00] path = config/schemas [22:29:00] url = http://tin.eqiad.wmnet/eventlogging/eventbus/.git/modules/config/schemas [22:29:06] dunno where it got that from! :) [22:29:27] thcipriani: maybe whatever scap does for submodules on the target is doing something wrong? [22:29:40] ottomata: there is some rewriting of submodules on the target (or there can be) [22:29:49] right, to make the submodule point at tin [22:29:59] must be some bug in whatever is doing that. You can use upstream submodules [22:30:02] but, it looks like it does some wrong rewriting [22:30:12] should I file a ticket? [22:30:21] ottomata: please [22:30:28] oh, is there a scap config to use upstream? [22:30:57] yeah, git_upstream_submodules: True [22:30:59] in scap.cfg [22:44:49] Project beta-scap-eqiad build #82686: 04FAILURE in 3 min 13 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/82686/ [22:45:51] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.94 ms [22:47:38] ok cool [22:47:44] thcipriani: i have to run for the eve, am very very close! [22:47:47] tty tomorrow [22:48:03] ottomata: awesome, glad to hear it, tty later. [22:48:34] thcipriani: that beta-scap-eqiad https://integration.wikimedia.org/ci/job/beta-scap-eqiad/82686/console is interesting [22:48:55] thcipriani: somehow files disappear during some rsync step. 
It happened once earlier during my morning [22:48:58] hashar: it's done that in the past. Rsync --delay-update losing all the files. [22:49:15] yeah I haven't investigated [22:49:21] thought about a one time error [22:50:16] maybe the l10n cache is being rebuilt somehow while scap runs [22:50:46] it is but that should happen after the rsync [22:52:55] anyway I am gone. It is way too late [22:52:57] happy evening! [22:53:47] 7Browser-Tests, 5Patch-For-Review, 3Reading Web Sprint 62 - DJ-Jazzy-Jeff-and-the-Fresh-Sprints: Investigate QuickSurveys browser tests failures - https://phabricator.wikimedia.org/T113534#1889097 (10Jdlrobson) 5stalled>3Open [22:54:31] 7Browser-Tests, 5Patch-For-Review, 3Reading Web Sprint 62 - DJ-Jazzy-Jeff-and-the-Fresh-Sprints: Investigate QuickSurveys browser tests failures - https://phabricator.wikimedia.org/T113534#1668078 (10Jdlrobson) Came back to look at the patch https://gerrit.wikimedia.org/r/246801 No rest for the wicked. Woul... [22:55:09] Yippee, build fixed! 
[22:55:09] Project beta-scap-eqiad build #82687: 09FIXED in 7 min 59 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/82687/ [22:58:57] !log marked integration-slave-trusty-1014 as offline due to tmpfs issues [22:59:03] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:00:58] 22:59:02 rm: cannot remove ‘/mnt/home/jenkins-deploy/tmpfs/jenkins-1/lessphp_a8d2hbnctz40gswoo0048ks0okscwog.lesscache’: Permission denied [23:00:59] 22:59:02 rm: cannot remove ‘/mnt/home/jenkins-deploy/tmpfs/jenkins-1/lessphp_es26d6wr63kg404g08s408okko4ws4c.lesscache’: Permission denied [23:01:29] from integration-slave-trusty-1012 [23:01:38] https://integration.wikimedia.org/ci/job/npm/42593/ [23:01:50] legoktm, ^ [23:05:19] uhhh [23:05:46] !log marked integration-slave-trusty-1012 as offline due to tmpfs issues [23:05:52] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:05:52] Krenair: :/ [23:06:31] 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Dozens of jobs failing on integration-slave-trusty-1012 because chmod fails for /tmp/jenkins-2 - https://phabricator.wikimedia.org/T120824#1889137 (10Legoktm) I just marked integration-slave-trusty-1012 and integration-slave-trusty-1014 as offline for... [23:07:48] !log sudo salt --show-timeout '*slave*' cmd.run 'rm -fR /mnt/home/jenkins-deploy/tmpfs/jenkins-?/*' [23:07:54] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:11:24] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67) [23:20:53] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.86 ms [23:39:20] (03Abandoned) 10Paladox: Update two repo Jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/258534 (owner: 10Paladox)