[03:58:45] Project selenium-MultimediaViewer » firefox,mediawiki,Linux,BrowserTests build #340: FAILURE in 2 min 44 sec: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=mediawiki,PLATFORM=Linux,label=BrowserTests/340/
[04:18:15] Yippee, build fixed!
[04:18:15] Project selenium-MultimediaViewer » firefox,beta,Linux,BrowserTests build #340: FIXED in 22 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/340/
[05:36:09] Gerrit: Content of Openzim repository accidentially somehow deleted - https://phabricator.wikimedia.org/T161264#3127447 (Kelson) I suspect I have deleted the content by deleting these two branches (push origin --delete) remote/gerrit/zeno2 and remote/gerrit/master.
[06:31:50] Project selenium-Wikibase » chrome,test,Linux,BrowserTests build #309: FAILURE in 1 hr 51 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=test,PLATFORM=Linux,label=BrowserTests/309/
[07:09:23] (PS1) Jonas Kress (WMDE): Enable test browsertest and disable Qunit for WikibaseLexeme [integration/config] - https://gerrit.wikimedia.org/r/344578 (https://phabricator.wikimedia.org/T161201)
[08:07:38] PROBLEM - Puppet run on deployment-memc05 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[08:33:16] PROBLEM - Puppet run on deployment-memc04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[10:53:15] Gerrit: Content of Openzim repository accidentially somehow deleted - https://phabricator.wikimedia.org/T161264#3127911 (Reedy) >>! In T161264#3127447, @Kelson wrote: > I suspect I have deleted the content by deleting these two branches (push origin --delete) remote/gerrit/zeno2 and remote/gerrit/master. Ok...
[11:03:43] (PS4) Zfilipin: Do not run rake-jessie job for mediawiki/core [integration/config] - https://gerrit.wikimedia.org/r/343848
[11:08:05] PROBLEM - Puppet run on integration-c1 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:06:16] (CR) Aleksey Bekh-Ivanov (WMDE): [C: 1] Enable test browsertest and disable Qunit for WikibaseLexeme [integration/config] - https://gerrit.wikimedia.org/r/344578 (https://phabricator.wikimedia.org/T161201) (owner: Jonas Kress (WMDE))
[12:14:06] (CR) Thiemo Mättig (WMDE): "I would not disable the QUnit tests, but fix the bug instead. I can track it down to the file tests/qunit/data/mediawiki.jqueryMsg.data.js" [integration/config] - https://gerrit.wikimedia.org/r/344578 (https://phabricator.wikimedia.org/T161201) (owner: Jonas Kress (WMDE))
[12:25:08] (CR) Aleksey Bekh-Ivanov (WMDE): [C: 1] "Discussed with Thiemo. Decided to ditch QUnit tests for now." [integration/config] - https://gerrit.wikimedia.org/r/344578 (https://phabricator.wikimedia.org/T161201) (owner: Jonas Kress (WMDE))
[12:25:12] (CR) Thiemo Mättig (WMDE): [C: 1] "Talked to Aleksey. We know we must re-enable the QUnit tests the moment we are adding our first relevant QUnit tests to the extension. Thi" [integration/config] - https://gerrit.wikimedia.org/r/344578 (https://phabricator.wikimedia.org/T161201) (owner: Jonas Kress (WMDE))
[12:33:32] (CR) Hashar: [C: 2] "It is all your ! :-}" [integration/config] - https://gerrit.wikimedia.org/r/344578 (https://phabricator.wikimedia.org/T161201) (owner: Jonas Kress (WMDE))
[12:34:46] (Merged) jenkins-bot: Enable test browsertest and disable Qunit for WikibaseLexeme [integration/config] - https://gerrit.wikimedia.org/r/344578 (https://phabricator.wikimedia.org/T161201) (owner: Jonas Kress (WMDE))
[13:03:04] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure (Little Steps Sprint), media-storage, Patch-For-Review: deployment-ms-be01.deployment-prep and deployment-ms-be02.deployment-prep have high load / system CPU - https://phabricator.wikimedia.org/T160990#3128087 (hashar) From what I...
[13:03:30] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure (Little Steps Sprint), media-storage, Patch-For-Review: deployment-ms-be01.deployment-prep and deployment-ms-be02.deployment-prep have high load / system CPU - https://phabricator.wikimedia.org/T160990#3128088 (hashar)
[13:08:23] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure (Little Steps Sprint), media-storage, Patch-For-Review: deployment-ms-be01.deployment-prep and deployment-ms-be02.deployment-prep have high load / system CPU - https://phabricator.wikimedia.org/T160990#3128090 (hashar) Also found...
[13:15:26] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:34:57] Continuous-Integration-Infrastructure (Little Steps Sprint), Android-app-Bugs, Wikipedia-Android-App-Backlog: Merge apps/android/wikipedia Jenkins jobs lint and test - https://phabricator.wikimedia.org/T161305#3128129 (hashar)
[13:35:38] Continuous-Integration-Infrastructure (Little Steps Sprint), Android-app-Bugs, Wikipedia-Android-App-Backlog: Merge apps/android/wikipedia Jenkins jobs lint and test - https://phabricator.wikimedia.org/T161305#3128144 (hashar) p:Triage>Normal
[13:41:04] (CR) Hashar: "Can we do a subset of this change that just removes the 'rake-jessie' job. It is only used for rubocop so I guess we can stop triggering t" [integration/config] - https://gerrit.wikimedia.org/r/343848 (owner: Zfilipin)
[13:46:44] Continuous-Integration-Infrastructure (Little Steps Sprint), Wikidata: Revisit Jenkins jobs being triggered for Wikibase - https://phabricator.wikimedia.org/T160989#3128156 (hashar) Moving some jobs to postmerge would at least prevent them from running on every patchsets. The postmerge jobs reports on t...
[13:47:59] (CR) Hashar: [C: 2] test: invoke rspec directly [selenium] - https://gerrit.wikimedia.org/r/330856 (https://phabricator.wikimedia.org/T137112) (owner: Hashar)
[13:49:09] PROBLEM - Puppet run on buildlog is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:49:35] (Merged) jenkins-bot: test: invoke rspec directly [selenium] - https://gerrit.wikimedia.org/r/330856 (https://phabricator.wikimedia.org/T137112) (owner: Hashar)
[13:50:12] Browser-Tests-Infrastructure, Continuous-Integration-Scaling, Patch-For-Review, User-zeljkofilipin: migrate mwext-mw-selenium to Nodepool instances - https://phabricator.wikimedia.org/T137112#3128158 (hashar) I went ahead and just +2ed the patch. The last remaining issue is PhantomJS not being...
[13:50:14] (CR) jenkins-bot: test: invoke rspec directly [selenium] - https://gerrit.wikimedia.org/r/330856 (https://phabricator.wikimedia.org/T137112) (owner: Hashar)
[13:53:10] Browser-Tests-Infrastructure, Continuous-Integration-Scaling, Patch-For-Review, User-zeljkofilipin: migrate mwext-mw-selenium to Nodepool instances - https://phabricator.wikimedia.org/T137112#3128159 (hashar) Just found out that PhantomJS 2.1.1 is available in jessie-backports since March 8th!!!...
[14:35:26] RECOVERY - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is OK: OK: Less than 100.00% above the threshold [0.0]
[14:52:41] RECOVERY - Puppet run on deployment-memc05 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:08:14] RECOVERY - Puppet run on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:49:01] (PS1) Umherirrender: [Pickle] Add Scribunto as dependency [integration/config] - https://gerrit.wikimedia.org/r/344638
[15:51:34] PROBLEM - Check systemd state on contint1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[15:54:29] hashar ^^
[15:54:46] hasharAway ^^
[15:55:02] paladox: puppet error apparently. Solved by Alexandros
[15:55:09] oh ok
[15:55:10] thanks
[15:59:54] PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Exec[create-/etc/bacula-keypair],Service[bacula-fd]
[16:05:06] (PS1) Hashar: wmf branches: drop mediawiki-extensions-php55-trusty [integration/config] - https://gerrit.wikimedia.org/r/344642 (https://phabricator.wikimedia.org/T94149)
[16:05:34] PROBLEM - Check systemd state on contint2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[16:05:44] PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[create-/etc/bacula-keypair]
[16:06:52] (PS1) Hashar: wmf branches: drop mediawiki-core-phpcs-trusty [integration/config] - https://gerrit.wikimedia.org/r/344643 (https://phabricator.wikimedia.org/T94149)
[16:07:28] twentyafterfour i think https://secure.phabricator.com/D17553 that will fix the daemon problem i was having.
[16:08:35] RECOVERY - Check systemd state on contint2001 is OK: OK - running: The system is fully operational
[16:08:44] RECOVERY - puppet last run on contint2001 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[16:15:54] RECOVERY - puppet last run on contint1001 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[16:16:35] RECOVERY - Check systemd state on contint1001 is OK: OK - running: The system is fully operational
[16:17:42] PROBLEM - Puppet run on deployment-redis02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[16:24:58] (PS1) Hashar: Drop transient Zuul rules [integration/config] - https://gerrit.wikimedia.org/r/344645 (https://phabricator.wikimedia.org/T94149)
[16:26:14] (Abandoned) Hashar: wmf branches: drop mediawiki-core-phpcs-trusty [integration/config] - https://gerrit.wikimedia.org/r/344643 (https://phabricator.wikimedia.org/T94149) (owner: Hashar)
[16:29:12] (PS2) Hashar: Skip php 5.5 for mediawiki wmf branches [integration/config] - https://gerrit.wikimedia.org/r/344642 (https://phabricator.wikimedia.org/T94149)
[16:33:18] (CR) Hashar: "This should stop triggering any job containing 'php55' in their names whenever the project is mediawiki/* and the branch begins with 'wmf'" [integration/config] - https://gerrit.wikimedia.org/r/344642 (https://phabricator.wikimedia.org/T94149) (owner: Hashar)
[16:36:45] hashar yipee, i partially fixed it when logging into polygerrit (redirects to dashbored properly now :)
[16:57:44] RECOVERY - Puppet run on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:58:44] twentyafterfour tested there new fixes and yep it fixes the daemon :)
[18:46:49] (Abandoned) Hashar: [operations/puppet] remove tox-jessie [integration/config] - https://gerrit.wikimedia.org/r/343317 (owner: Hashar)
[18:47:29] (CR) Legoktm: [C: -1] "We still have Wikimedia wikis running PHP 5.5." [integration/config] - https://gerrit.wikimedia.org/r/344642 (https://phabricator.wikimedia.org/T94149) (owner: Hashar)
[19:50:54] Gerrit: Content of Openzim repository accidentially somehow deleted - https://phabricator.wikimedia.org/T161264#3129091 (Kelson) I can't neither because I can only upload commits with my user ``` remote: Resolving deltas: 100% (79/79) remote: Processing changes: refs: 1, done remote: remote: ERROR: I...
[20:37:23] Scap (Scap3-MediaWiki-MVP), scap2: Scap should touch symlinks when originals are touched - https://phabricator.wikimedia.org/T126306#2010701 (Krinkle) >>! In T126306#2017889, @Tgr wrote: >>>! In T126306#2010715, @bd808 wrote: >> Is there anything other than wmf-config/PrivateSettings.php that would reall...
[20:37:36] hashar hi, i've fixed the logging in problem in polygerrit where it did not redirect correctly. :)
[20:39:25] (CR) Hashar: [C: -1] "Holding this till we reduce the number of instances being consumed. That is the Little Steps Sprint: https://phabricator.wikimedia.org/pr" [integration/config] - https://gerrit.wikimedia.org/r/341304 (owner: Addshore)
[20:39:43] Gerrit: Content of Openzim repository accidentially somehow deleted - https://phabricator.wikimedia.org/T161264#3129174 (Paladox) @Kelson hi, needs you to merge https://gerrit.wikimedia.org/r/#/c/344685/ which will allow you to do that.
[20:40:02] paladox: guess I can try it out again next week :)
[20:40:10] ok :)
[20:40:44] hashar im loving the new private changes feature (not in 2.14 but will be in 2.15)
[20:40:45] Replaces drafts.
[20:40:48] You can mark your changes as private or not private.
[20:41:22] which completely replace the Draft system isn't it ?
[20:41:52] Yep
[20:42:04] They are also adding support for wip.
[20:42:14] neat
[20:43:56] So drafts are deprecated currently. All drafts will be migrated to private changes with https://gerrit-review.googlesource.com/#/c/97252/
[20:45:55] hashar also gerrit's index will be reliable once we switch to elasticsearch (supported from gerrit 2.14+)
[20:49:46] Gerrit, Operations, Beta-Cluster-reproducible, Patch-For-Review, Upstream: gerrit jgit gc caused mediawiki/core repo problems - https://phabricator.wikimedia.org/T151676#3129183 (Paladox) This https://github.com/eclipse/jgit/commit/4ddd4a3d1 looks like a fix for this. Fixed in gerrit 2.13.7+.
[21:20:15] Gerrit, Analytics-Tech-community-metrics: Gerrit patchset 99101 cannot be accessed: "500 Internal server error" - https://phabricator.wikimedia.org/T161206#3124611 (Paladox) I will report this upstream.
[21:21:12] Beta-Cluster-Infrastructure, ORES, Revision-Scoring-As-A-Service, User-Ladsgroup: deployment-ores-redis /srv/ redis is too small (500MBytes) - https://phabricator.wikimedia.org/T160762#3129284 (Halfak) This issue is blocking @Mattflaschen-WMF, @Mooeypoo, @Etonkovidova, and the rest of #edit-revie...
[21:21:18] Beta-Cluster-Infrastructure, ORES, Revision-Scoring-As-A-Service, User-Ladsgroup: deployment-ores-redis /srv/ redis is too small (500MBytes) - https://phabricator.wikimedia.org/T160762#3129286 (Halfak) p:Triage>High
[21:22:32] Gerrit, Analytics-Tech-community-metrics: Gerrit patchset 99101 cannot be accessed: "500 Internal server error" - https://phabricator.wikimedia.org/T161206#3129319 (Paladox) Reported it here https://bugs.chromium.org/p/gerrit/issues/detail?id=5866
[21:23:00] Gerrit, Analytics-Tech-community-metrics, Upstream: Gerrit patchset 99101 cannot be accessed: "500 Internal server error" - https://phabricator.wikimedia.org/T161206#3129320 (Paladox)
[21:25:38] I wonder what this "Cannot display change 256050 because it has no revisions." Even means?
[21:25:56] does that mean someone created a patch that does nothing.
[21:26:03] Can we delete it?
[21:26:25] I am looking at the changes reported as broken here https://phabricator.wikimedia.org/T157898#3124564
[21:27:49] Gerrit, Analytics-Tech-community-metrics: Numerous Gerrit patchsets cannot be accessed: "Cannot display change because it has no revisions." - https://phabricator.wikimedia.org/T161207#3124625 (Paladox) We will be able to delete all drafts with gerrit 2.14. (admins can delete them too).
[21:30:17] Beta-Cluster-Infrastructure, ORES, Revision-Scoring-As-A-Service, User-Ladsgroup: deployment-ores-redis /srv/ redis is too small (500MBytes) - https://phabricator.wikimedia.org/T160762#3129337 (Ladsgroup) a:Ladsgroup
[21:30:44] Beta-Cluster-Infrastructure, ORES, Revision-Scoring-As-A-Service, User-Ladsgroup: deployment-ores-redis /srv/ redis is too small (500MBytes) - https://phabricator.wikimedia.org/T160762#3110103 (Ladsgroup) I get this fixed by migrating to a new instance ASAP.
[21:31:54] Beta-Cluster-Infrastructure, ORES, Revision-Scoring-As-A-Service, User-Ladsgroup: deployment-ores-redis /srv/ redis is too small (500MBytes) - https://phabricator.wikimedia.org/T160762#3110103 (greg) (Let us know if you need any assistance.)
[21:34:00] Gerrit, Analytics-Tech-community-metrics, Upstream: Gerrit patchset 99101 cannot be accessed: "500 Internal server error" - https://phabricator.wikimedia.org/T161206#3129359 (Paladox) Ah, found the fix https://gerrit-review.googlesource.com/#/c/91583/
[21:34:35] !log launching deployment-ores-redis-02 (T160762)
[21:34:39] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:34:39] T160762: deployment-ores-redis /srv/ redis is too small (500MBytes) - https://phabricator.wikimedia.org/T160762
[21:43:46] twentyafterfour: When I run puppet agent in a recently spawned instance in beta I get this: https://phabricator.wikimedia.org/P5129 is it a known issue?
[21:44:02] Amir1: yes
[21:44:32] Amir1: I wrote a step by step way to fix it on https://phabricator.wikimedia.org/T148929
[21:45:14] thanks
[21:45:22] let me look what can I do
[21:46:10] Amir1: you have root access don't you ?
[21:46:22] yup ;)
[21:46:40] so in theory the steps should just work :D
[21:47:06] I hope ores hasn't suffered from the too small /srv
[21:47:39] not in prod but now it's blocking ERI deployment
[21:47:57] hashar: I did this. till failing
[21:48:03] ;(
[21:48:09] https://www.irccloud.com/pastebin/fDcCIxYU/
[21:48:15] maybe I'm missing something
[21:48:49] Should I reset certs in puppetmaster? I'm afraid it might cause lots of trouble
[21:48:51] have you cleared the certificate on the puppet master?
[21:49:00] Master:
[21:49:00] puppet cert clean
[21:49:00] I should do that too
[21:49:05] okay
[21:49:09] nice
[21:49:41] then there are four more magic lines to fix the agent
[21:51:45] ladsgroup@deployment-puppetmaster02:~$ sudo puppet cert clean deployment-ores-redis-02.deployment-prep.eqiad.wmflabs
[21:51:45] Error: Could not find a serial number for deployment-ores-redis-02.deployment-prep.eqiad.wmflabs
[21:52:13] hashar: needs a puppet agent beforehand?
[21:52:28] bah
[21:52:55] guess nothing got signed
[21:53:26] so you can do the next steps
[21:53:35] root@deployment-ores-redis-02:~# puppet agent -tv
[21:53:35] Info: Creating a new SSL key for deployment-ores-redis-02.deployment-prep.eqiad.wmflabs
[21:53:51] etc
[21:54:17] okay, thanks. Let me do and come back to you
[21:55:44] root@deployment-ores-redis-02:~# puppet agent -tv
[21:55:44] Exiting; no certificate found and waitforcert is disabled
[21:57:05] hashar im wondering is the parent commit to https://gerrit.wikimedia.org/r/#/c/256050 a draft?
[21:57:23] or https://gerrit.wikimedia.org/r/#/c/99101
[21:59:05] paladox: the patchset shows the parent commit
[21:59:11] and you can then search the commit sha1
[21:59:21] Oh, i carn't view them though
[21:59:28] https://gerrit.git.wmflabs.org/r/#/c/2/
[21:59:34] Cannot display change 256050 because it has no revisions.
[21:59:39] the first change works for me but https://gerrit.wikimedia.org/r/#/c/99101 does not
[22:00:21] and https://gerrit.wikimedia.org/r/#/c/256050 is a draft change I made
[22:00:28] oh.
[22:01:26] I wonder why this https://gerrit.wikimedia.org/r/#/c/99101 is getting java.lang.ArrayIndexOutOfBoundsException: 0
[22:01:52] Amir1: then I have no idea :(((
[22:01:58] maybe delete the instance and try again :(-
[22:02:15] paladox: Amir1: I gotta sleep a bit sorry. Good luck!
[22:02:23] ok
[22:02:28] Yeah, I'm doing that
[22:02:34] have fun
[22:02:51] if I dream, surely I will have fun :d
[23:06:23] PROBLEM - Puppet run on deployment-ores-redis-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]