[00:01:12] 06Release-Engineering-Team, 15User-greg: Setup monthly "QA Tribe" meeting - https://phabricator.wikimedia.org/T152570#2852919 (10greg) meeting help request sent to eng-admin :) [00:35:53] (03PS1) 10Legoktm: Support DESTDIR in Makefile [integration/uprightdiff] - 10https://gerrit.wikimedia.org/r/325721 [00:35:55] (03PS1) 10Legoktm: Add 'make clean' [integration/uprightdiff] - 10https://gerrit.wikimedia.org/r/325722 [00:59:48] (03CR) 10Tim Starling: [C: 032 V: 032] Add 'make clean' [integration/uprightdiff] - 10https://gerrit.wikimedia.org/r/325722 (owner: 10Legoktm) [00:59:52] 10Continuous-Integration-Config, 10uprightdiff: Add tests for integration/uprightdiff repository - https://phabricator.wikimedia.org/T152578#2853064 (10Legoktm) [01:00:13] (03CR) 10Tim Starling: [C: 032 V: 032] Support DESTDIR in Makefile [integration/uprightdiff] - 10https://gerrit.wikimedia.org/r/325721 (owner: 10Legoktm) [01:02:30] (03PS1) 10Legoktm: Initial debianization [integration/uprightdiff] (debian) - 10https://gerrit.wikimedia.org/r/325726 (https://phabricator.wikimedia.org/T152577) [01:37:59] (03CR) 10Legoktm: Initial debianization (031 comment) [integration/uprightdiff] (debian) - 10https://gerrit.wikimedia.org/r/325726 (https://phabricator.wikimedia.org/T152577) (owner: 10Legoktm) [01:41:51] 10Continuous-Integration-Config, 06Operations, 06Operations-Software-Development, 13Patch-For-Review: tox-jessie is failing on operations/software - https://phabricator.wikimedia.org/T152549#2853177 (10scfc) Thanks, @hashar. On closer look, these are indeed for the most part `flake8` errors. (I find the... 
[02:27:04] !log Started (foreachwikiindblist flow.dblist extensions/Flow/maintenance/FlowFixInconsistentBoards.php) 2>&1 | tee FlowFixInconsistentBoards_2016-12-06.txt on deployment-tin [02:27:09] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [02:29:16] !log foreachwikiindblist FlowFixInconsistentBoards complete [02:29:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [02:31:42] !Log Started (mwscript extensions/Flow/maintenance/FlowFixInconsistentBoards.php --wiki=enwiki) 2>&1|tee FlowFixInconsistentBoards_enwiki_2016-12-06.txt [02:43:53] !Log Killed (mwscript extensions/Flow/maintenance/FlowFixInconsistentBoards.php --wiki=enwiki) 2>&1|tee FlowFixInconsistentBoards_enwiki_2016-12-06.txt [03:52:04] bd808, legoktm, do we really need to manually update security groups and wait up to 60 minutes when setting up a proxy (https://wikitech.wikimedia.org/wiki/Help:Proxy)? [03:52:10] I don't remember either of those being the case before. [03:52:15] Also, when I created a proxy, it shows as: [03:52:24] deferred-changes.wmflabs.org http://10.68.22.160:8080 [03:52:30] The existing ones show actual hostnames on the right. [03:52:38] I don't know about security groups stuff, but usually the proxy takes a minute or something [03:56:33] Thanks, legoktm. Indeed, both are wrong. [04:18:48] Yippee, build fixed! [04:18:49] Project selenium-MultimediaViewer » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #226: 09FIXED in 22 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/226/ [04:24:38] matt_flaschen: the ip instead of hostname is a performance improvement [04:24:55] security groups do need to allow access [04:25:20] the time thing... 
I think that was a bad addition, but it can take a few minutes [04:27:53] the "web" security group is only going to exist if someone already created that in the project. Its not default [06:41:04] 03Scap3, 10ContentTranslation-CXserver, 10MediaWiki-extensions-ContentTranslation, 05Language-Engineering October-December 2016, and 4 others: Enable Scap3 config deploys for CXServer - https://phabricator.wikimedia.org/T147634#2853431 (10Arrbee) [07:27:16] 10Browser-Tests-Infrastructure, 06Reading-Web-Backlog, 07Jenkins, 13Patch-For-Review, and 2 others: MEDIAWIKI_URL may be set to incorrect value in mwext-mw-selenium job - https://phabricator.wikimedia.org/T144912#2853483 (10phuedx) [08:44:56] PROBLEM - Puppet run on repository is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [09:19:54] RECOVERY - Puppet run on repository is OK: OK: Less than 1.00% above the threshold [0.0] [10:08:22] 10Continuous-Integration-Config, 06Operations, 06Operations-Software-Development, 13Patch-For-Review: tox-jessie is failing on operations/software - https://phabricator.wikimedia.org/T152549#2853664 (10ArielGlenn) I'm taking a look. [10:15:03] 10Continuous-Integration-Config, 06Operations, 06Operations-Software-Development, 13Patch-For-Review: tox-jessie is failing on operations/software - https://phabricator.wikimedia.org/T152549#2853671 (10ArielGlenn) Looks like that fixed it. [10:19:20] 10Continuous-Integration-Config, 06Operations, 06Operations-Software-Development, 13Patch-For-Review: tox-jessie is failing on operations/software - https://phabricator.wikimedia.org/T152549#2853674 (10hashar) https://gerrit.wikimedia.org/r/#/c/325715/ fix flake8 newly introduced lint E305. The other erro... 
[10:51:48] 10Continuous-Integration-Config, 06Discovery, 10Wikimedia-Portals: CI tests on wikimedia/portals repo: cache node_modules to save time - https://phabricator.wikimedia.org/T152386#2846564 (10Jdrewniak) As @hashar mentioned in his comment: > Regarding the long build time (11 minutes), that is due to the job... [12:14:12] 10Continuous-Integration-Config, 06Discovery, 10Wikimedia-Portals: CI tests on wikimedia/portals repo: cache node_modules to save time - https://phabricator.wikimedia.org/T152386#2853784 (10Jdrewniak) As @hashar mentioned in his comment: > Regarding the long build time (11 minutes), that is due to the job c... [12:33:41] PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [12:49:38] 10Continuous-Integration-Infrastructure, 07Jenkins: Some Jenkins jobs tend to be stuck and never times out - https://phabricator.wikimedia.org/T138281#2853808 (10hashar) >>! In T140256#2827310, @MoritzMuehlenhoff wrote: >>>! In T140256#2825802, @hashar wrote: >> I haven't seen that kernel soft lock occurring f... [13:01:41] RECOVERY - puppet last run on contint1001 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [13:30:25] 10Continuous-Integration-Config, 10uprightdiff: Add tests for integration/uprightdiff repository - https://phabricator.wikimedia.org/T152578#2853896 (10hashar) There are a few dependencies mentioned in the README file: build-essential g++ libopencv-highgui-dev libboost-program-options-dev. We would need them i... [13:30:44] 10Continuous-Integration-Config, 10uprightdiff: Add tests for integration/uprightdiff repository - https://phabricator.wikimedia.org/T152578#2853064 (10hashar) p:05Triage>03Normal [13:41:02] 10Gerrit, 13Patch-For-Review: Update gerrit to 2.13.3 - https://phabricator.wikimedia.org/T146350#2853924 (10Paladox) Note: That the update to gerrit 2.13 will make some changes to the gitweb links. 1. 
parent links no longer have links, it was intentional but I have brought this to upstream attention and are... [13:57:51] !sal [13:57:51] https://tools.wmflabs.org/sal/releng [14:02:54] hashar: will you be around for the next puppet swat? tomorrow 18:00–19:00 UTC+1 [14:03:10] nop [14:03:20] I will not be here, and I would like to add https://gerrit.wikimedia.org/r/#/c/324203/ to the puppet swat [14:03:24] have my 1/1 with greg then I am rushing back home [14:03:25] ok, next week then :) [14:03:33] my wife goes to the gym on wednesday at 19:15 [14:03:51] zeljkof: try poking operations folks directly [14:03:58] it is trivial enough and has no incident to prod [14:04:08] + we got that one cherry picked on the CI puppet master havent we? [14:04:16] hashar: not really sure who to ping :| [14:04:20] if so, it is all about grabbing 3 minutes of attention from someone [14:04:30] ask around in the private channel :d [14:04:44] #releng private? [14:09:13] (03PS1) 10Mholloway: Whitelist Kaartic [integration/config] - 10https://gerrit.wikimedia.org/r/325775 [14:09:14] 10Browser-Tests-Infrastructure, 13Patch-For-Review, 15User-zeljkofilipin: Ensure ChromeDriver is installed for jobs that run Selenium tests - https://phabricator.wikimedia.org/T117418#2853979 (10zeljkofilipin) 05Open>03Resolved [14:12:40] 10Continuous-Integration-Infrastructure, 07Zuul: mediawiki-extensions jobs take 4 minutes to clone repositories - https://phabricator.wikimedia.org/T152604#2853984 (10hashar) [14:12:51] 10Continuous-Integration-Infrastructure, 07Zuul: mediawiki-extensions jobs take 4 minutes to clone repositories (with zuul-cloner) - https://phabricator.wikimedia.org/T152604#2853996 (10hashar) [14:16:21] 10Continuous-Integration-Infrastructure, 07Zuul: mediawiki-extensions jobs take 4 minutes to clone repositories (with zuul-cloner) - https://phabricator.wikimedia.org/T152604#2854016 (10hashar) [14:46:49] 10Continuous-Integration-Infrastructure, 07Zuul: mediawiki-extensions jobs take 4 minutes 
to clone repositories (with zuul-cloner) - https://phabricator.wikimedia.org/T152604#2854082 (10hashar) Went with a very lame python script: ``` lang=python from datetime import timedelta with open('clone.txt', 'r') as... [14:53:25] zeljkof: for chrome driver PATH [14:53:25] https://gerrit.wikimedia.org/r/324203 [14:53:31] got merged at 14:08 UTC [14:53:40] and nodepool refresh the snapshots at 14:14UTC [14:53:48] so potentially it is included now [14:53:54] well, that is convenient :) [14:54:04] let me test on a patch [14:54:19] https://gerrit.wikimedia.org/r/#/c/320829/ [14:54:36] will change browser to chrome, let's see if it sees chromedriver [14:54:41] jessie instance: $ ssh jenkins@10.68.18.104 which chromedriver [14:55:11] doesn't work? [14:55:23] ah [14:55:33] /usr/local/bin/chromedriver [14:55:40] can't paste a line starting with a / :D [14:56:05] :D [14:56:14] add space [14:56:31] the trusty instance does not have it [14:56:52] really? didn't we tests there too? [14:57:42] on labnodepool1001 $ grep /usr/local/bin/chromedriver /var/log/nodepool/image.log* [14:57:48] .ci-jessie-wikimedia: Notice: /Stage[main]/Contint::Browsers/File[/usr/local/bin/chromedriver]/ensure: created [14:57:58] maybe the trusty one got done before the patch landed [14:58:19] yeah it is slightly older [14:58:39] zeljkof: I am refreshing the trusty snapshot [14:58:44] nodepool image-update wmflabs-eqiad ci-trusty-wikimedia [14:58:45] ;D [14:58:52] nice [14:59:35] 10Browser-Tests-Infrastructure, 13Patch-For-Review, 15User-zeljkofilipin: Ensure ChromeDriver is installed for jobs that run Selenium tests - https://phabricator.wikimedia.org/T117418#2854114 (10hashar) The Nodepool Jessie image had the patch included. I am refreshing the Trusty one. [15:00:15] running the job... 
https://integration.wikimedia.org/ci/job/mwext-VisualEditor-npm-node-4-jessie/900/console [15:03:56] hashar: all green :) https://integration.wikimedia.org/ci/job/mwext-VisualEditor-npm-node-4-jessie/900/console [15:04:28] \O/ [15:04:42] !log Image ci-trusty-wikimedia-1481122712 in wmflabs-eqiad is ready T117418 [15:04:46] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [15:07:00] 10Continuous-Integration-Infrastructure, 07Zuul: mediawiki-extensions jobs take 4 minutes to clone repositories (with zuul-cloner) - https://phabricator.wikimedia.org/T152604#2854132 (10hashar) The equivalent job that runs on Jessie (mediawiki-extensions-hhvm-jessie) does load the cache for Abusefilter :( So l... [15:12:09] 10Continuous-Integration-Infrastructure, 07Zuul: mediawiki-extensions jobs take 4 minutes to clone repositories (with zuul-cloner) - https://phabricator.wikimedia.org/T152604#2854136 (10hashar) On Jessie sorted by time: ``` 0:00:02s INFO:zuul.Cloner:Prepared mediawiki/extensions/Babel 0:00:02s INFO:zuul.Cloner... [15:16:38] (03PS1) 10Hashar: dib: add more repositories to git cache [integration/config] - 10https://gerrit.wikimedia.org/r/325786 (https://phabricator.wikimedia.org/T152604) [15:27:17] (03CR) 10Hashar: [C: 032] "Will have to rebuild the base images." [integration/config] - 10https://gerrit.wikimedia.org/r/325786 (https://phabricator.wikimedia.org/T152604) (owner: 10Hashar) [15:28:23] (03Merged) 10jenkins-bot: dib: add more repositories to git cache [integration/config] - 10https://gerrit.wikimedia.org/r/325786 (https://phabricator.wikimedia.org/T152604) (owner: 10Hashar) [15:35:47] 10Continuous-Integration-Infrastructure, 05Continuous-Integration-Scaling, 13Patch-For-Review: Speed up the time to get a Nodepool instances to achieve READY state - https://phabricator.wikimedia.org/T113342#2854164 (10hashar) Looking at a Jessie instance: ``` # journalctl |grep 'Startup fini' Dec 07 15:01:5... 
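The per-repository clone timings posted on T152604 above ("0:00:02s INFO:zuul.Cloner:Prepared ...") can be approximated with a short script. The script quoted in the task is truncated in this log, so the following is only a minimal sketch and not hashar's actual code. It assumes each log line starts with an elapsed-time prefix of the form `HH:MM:SS.mmm`; that line format and the helper names `parse_line`, `step_durations` and `slowest` are hypothetical.

```python
import re
from datetime import timedelta

# Assumed line shape (not confirmed by the log): "HH:MM:SS.mmm <message>"
LINE_RE = re.compile(r"^(\d+):(\d{2}):(\d{2})(?:\.(\d{1,6}))?\s+(.*)$")


def parse_line(line):
    """Return (elapsed, message) for a matching line, or None otherwise."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    hours, minutes, seconds, fraction, message = match.groups()
    # Pad the fractional part to microseconds ("500" -> 500000).
    microseconds = int((fraction or "0").ljust(6, "0"))
    elapsed = timedelta(
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(seconds),
        microseconds=microseconds,
    )
    return elapsed, message


def step_durations(lines):
    """Pair each message with the time elapsed since the previous parsed line."""
    results = []
    previous = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines without a timestamp prefix
        elapsed, message = parsed
        if previous is not None:
            results.append((elapsed - previous, message))
        previous = elapsed
    return results


def slowest(lines, keyword="Prepared"):
    """Return (duration, message) pairs containing keyword, slowest first."""
    timed = [(d, m) for d, m in step_durations(lines) if keyword in m]
    return sorted(timed, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # 'clone.txt' is the file name used in the truncated script above.
    with open("clone.txt", "r") as handle:
        for duration, message in slowest(handle):
            print(f"{duration}s {message}")
```

Sorting by duration puts the repositories that miss the git cache at the top, which is the comparison the task makes between the Trusty and Jessie jobs.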
[15:39:31] 10Gerrit, 13Patch-For-Review: Update gerrit to 2.13.3 - https://phabricator.wikimedia.org/T146350#2854165 (10Paladox) I managed to fix project access links in https://gerrit-review.googlesource.com/#/c/92620/ (waiting for it to be merged and then I will get it backported) :) [15:39:49] PROBLEM - Puppet run on deployment-phab02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:43:28] (03PS2) 10Mholloway: Whitelist Kaartic [integration/config] - 10https://gerrit.wikimedia.org/r/325775 [15:43:44] (03CR) 10Paladox: [C: 031] Whitelist Kaartic [integration/config] - 10https://gerrit.wikimedia.org/r/325775 (owner: 10Mholloway) [15:50:59] 10Gerrit, 13Patch-For-Review: Update gerrit to 2.13.3 - https://phabricator.wikimedia.org/T146350#2854196 (10Paladox) That patch was abandoned in favour of https://gerrit-review.googlesource.com/#/c/92645/ since it is cherry-picked to stable-2.13 then will be merged into master. [15:51:04] PROBLEM - Puppet run on deployment-phab01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [16:03:10] Yipee i fixed the project access links in gerrit 2.13. https://gerrit-review.googlesource.com/#/c/92620/ cherry picked to stable-2.13 https://gerrit-review.googlesource.com/#/c/92645/ [16:16:35] 10Gerrit, 13Patch-For-Review: Update gerrit to 2.13.3 - https://phabricator.wikimedia.org/T146350#2854299 (10Paladox) It was merged and will be released when gerrit 2.13.4 is released. 
[16:30:51] 10Continuous-Integration-Config, 06Wikipedia-Android-App-Backlog, 07Technical-Debt: Add support to peridoic CI tests for exercising arbitrary revisions - https://phabricator.wikimedia.org/T152455#2854327 (10Niedzielski) [16:31:56] 10Continuous-Integration-Infrastructure, 05Continuous-Integration-Scaling, 13Patch-For-Review: Speed up the time to get a Nodepool instances to achieve READY state - https://phabricator.wikimedia.org/T113342#2854343 (10hashar) In the instance (T113342#2850982) dhclient starts straight with a DHCPDISCOVER. Th... [16:33:54] 10Gerrit, 13Patch-For-Review: Update gerrit to 2.13.3 - https://phabricator.wikimedia.org/T146350#2854358 (10Paladox) Ah, it looks like a new exclusiveGroupPermissions = read is added in project.config in refs/meta/config so it will hide the access link, it can be removed which I did and it worked. [17:45:37] 03Scap3: scap sync-l10n AttributeError: 'Namespace' object has no attribute 'message' - https://phabricator.wikimedia.org/T152390#2854693 (10thcipriani) 05Open>03Resolved [17:55:20] PROBLEM - Puppet run on deployment-eventlogging03 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [18:35:18] RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0] [18:44:51] Project beta-code-update-eqiad build #133424: 04FAILURE in 1 min 50 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/133424/ [18:48:50] PROBLEM - Puppet run on integration-slave-jessie-1002 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [18:54:47] PROBLEM - Puppet run on integration-slave-trusty-1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [18:54:51] Project beta-code-update-eqiad build #133425: 04STILL FAILING in 1 min 51 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/133425/ [18:55:25] PROBLEM - Puppet run on integration-slave-trusty-1004 is CRITICAL: CRITICAL: 22.22% of data above 
the critical threshold [0.0] [19:03:16] PROBLEM - Puppet run on deployment-sentry01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [19:03:22] PROBLEM - Puppet run on integration-slave-trusty-1006 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [19:04:48] Project beta-code-update-eqiad build #133426: 04STILL FAILING in 1 min 48 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/133426/ [19:06:00] PROBLEM - Puppet run on deployment-eventlogging04 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [19:08:40] PROBLEM - Puppet run on integration-slave-trusty-1011 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [19:09:51] 10Deployment-Systems, 10MediaWiki-Internationalization, 06Performance-Team, 07Performance: Experiment with plain .php files for l10n cache instead of CDB - https://phabricator.wikimedia.org/T99740#1297948 (10Gilles) p:05Low>03High [19:11:23] 10Deployment-Systems, 06Release-Engineering-Team, 06Operations, 06Performance-Team, 07HHVM: Translation cache exhaustion caused by changes to PHP code in file scope - https://phabricator.wikimedia.org/T103886#2854985 (10Gilles) p:05Normal>03Triage [19:12:43] PROBLEM - Puppet run on integration-slave-precise-1011 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [19:13:11] 10Beta-Cluster-Infrastructure, 10MediaWiki-extensions-Linter: Deploy Linter extension to Beta cluster - https://phabricator.wikimedia.org/T152620#2854992 (10Legoktm) [19:14:43] Project beta-code-update-eqiad build #133427: 04STILL FAILING in 1 min 43 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/133427/ [19:14:48] ^ because gerrit is down [19:15:19] Back "soon" [19:15:44] Some ads in the meanwhile [19:15:57] PROBLEM - Puppet run on integration-slave-precise-1012 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [19:16:49] PROBLEM - Puppet run on 
integration-slave-jessie-1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [19:17:03] PROBLEM - Puppet run on integration-slave-precise-1002 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [19:17:04] PROBLEM - Puppet run on integration-slave-trusty-1003 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [19:24:42] Project beta-code-update-eqiad build #133428: 04STILL FAILING in 1 min 42 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/133428/ [19:34:42] Project beta-code-update-eqiad build #133429: 04STILL FAILING in 1 min 42 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/133429/ [19:44:43] Project beta-code-update-eqiad build #133430: 04STILL FAILING in 1 min 43 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/133430/ [19:54:42] Project beta-code-update-eqiad build #133431: 04STILL FAILING in 1 min 41 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/133431/ [20:04:42] Project beta-code-update-eqiad build #133432: 04STILL FAILING in 1 min 42 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/133432/ [20:14:41] Project beta-code-update-eqiad build #133433: 04STILL FAILING in 1 min 41 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/133433/ [20:24:44] Project beta-code-update-eqiad build #133434: 04STILL FAILING in 1 min 44 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/133434/ [20:29:10] When is Gerrit going to be back up? 
:/ [20:31:51] RoanKattouw, I'd watch -operations for that sort of information [20:34:44] Project beta-code-update-eqiad build #133435: 04STILL FAILING in 1 min 44 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/133435/ [20:44:42] Project beta-code-update-eqiad build #133436: 04STILL FAILING in 1 min 42 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/133436/ [20:46:51] 10Continuous-Integration-Infrastructure, 05Continuous-Integration-Scaling, 13Patch-For-Review: Speed up the time to get a Nodepool instances to achieve READY state - https://phabricator.wikimedia.org/T113342#2855307 (10hashar) I have looked at the jessie snapshot. `/etc/network/interfaces.d/eth1` magically... [20:51:14] hashar the html links need updating for gerrit 2.13. [20:51:20] jenkins links please [20:51:26] see https://gerrit.wikimedia.org/r/#/c/315057/ please. [20:52:18] 10Gerrit: Gerrit side-by-side diff view does not mark a removed "space" in a string (example given) (intraline different doesn't work with spaces) - https://phabricator.wikimedia.org/T51006#2855320 (10demon) [20:52:23] 10Gerrit, 13Patch-For-Review: Update gerrit to 2.13.3 - https://phabricator.wikimedia.org/T146350#2855317 (10demon) 05Open>03Resolved a:03demon [20:53:34] 10Gerrit: Gerrit side-by-side diff view does not mark a removed "space" in a string (example given) (intraline different doesn't work with spaces) - https://phabricator.wikimedia.org/T51006#2855338 (10Paladox) 05Open>03Resolved This now works. Please reopen if the problem still happends. [20:54:44] Yippee, build fixed! 
[20:54:44] Project beta-code-update-eqiad build #133437: 09FIXED in 1 min 43 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/133437/ [20:59:45] RECOVERY - Puppet run on integration-slave-trusty-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [21:05:24] RECOVERY - Puppet run on integration-slave-trusty-1004 is OK: OK: Less than 1.00% above the threshold [0.0] [21:08:16] RECOVERY - Puppet run on deployment-sentry01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:08:20] RECOVERY - Puppet run on integration-slave-trusty-1006 is OK: OK: Less than 1.00% above the threshold [0.0] [21:15:59] RECOVERY - Puppet run on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0] [21:17:44] RECOVERY - Puppet run on integration-slave-precise-1011 is OK: OK: Less than 1.00% above the threshold [0.0] [21:18:40] RECOVERY - Puppet run on integration-slave-trusty-1011 is OK: OK: Less than 1.00% above the threshold [0.0] [21:20:56] RECOVERY - Puppet run on integration-slave-precise-1012 is OK: OK: Less than 1.00% above the threshold [0.0] [21:21:50] RECOVERY - Puppet run on integration-slave-jessie-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [21:22:05] RECOVERY - Puppet run on integration-slave-precise-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [21:22:05] RECOVERY - Puppet run on integration-slave-trusty-1003 is OK: OK: Less than 1.00% above the threshold [0.0] [21:23:51] RECOVERY - Puppet run on integration-slave-jessie-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [21:34:19] PROBLEM - Puppet run on integration-slave-trusty-1006 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [22:19:05] 10Continuous-Integration-Infrastructure, 05Continuous-Integration-Scaling, 13Patch-For-Review: Speed up the time to get a Nodepool instances to achieve READY state - https://phabricator.wikimedia.org/T113342#2855719 (10hashar) I create a new image and booted an instance out of 
it. Deleted the lease and sync...
[22:22:05] paladox: I reviewed your patch related to Gerrit prettification of the CI results ( https://gerrit.wikimedia.org/r/#/c/325826/ )
[22:22:07] looks good to me
[22:22:19] maybe ostriches can get it pushed
[22:22:27] hashar thanks
[22:22:31] i also tested it
[22:22:33] and it works
[22:22:34] :)
[22:22:41] only impact would be that puppet will restart Gerrit once the change lands (iirc)
[22:22:45] yeah!
[22:22:52] test on labs is great :]
[22:23:39] just found out Gerrit got upgraded :D
[22:23:50] I have seen an email about it but did not connect that it was for tonight!
[22:23:51] Yep
[22:24:05] I didn't either
[22:24:09] looks like nothing exploded so far
[22:24:09] didn't check my emails
[22:24:16] besides the comment formatting
[22:24:20] only found out the upgrade was happening when i checked -operations
[22:24:29] hashar some gitweb links won't work
[22:25:11] especially our workaround for T137354
[22:25:13] hashar ^^
[22:26:10] but luckily i fixed it today.
[22:26:10] guess we will survive :D
[22:26:17] but will be released in a 2.13.4 update
[22:26:30] also gerrit no longer links parent links
[22:27:16] I am away again, need to fill out a bunch of papers
[22:27:22] ok
[22:27:34] taxes / retirement fund etc
[22:27:39] it is a mess!
[22:27:48] Oh
[22:34:10] How long does one normally have to wait for a new repository request to be acted on?
[22:34:46] RoanKattouw qchris normally does them so not sure how often he checks.
[22:34:48] https://www.mediawiki.org/wiki/Git/New_repositories/Requests shows that requests were being fulfilled daily for a while, but that seems to have stopped last week
[22:36:17] thcipriani: I've got a scap3 ssh config problem you might be able to help with
[22:36:31] sure, shoot
[22:36:41] I have a scap server set up in the striker labs project
[22:36:51] it used to work
[22:36:56] :)
[22:37:13] now the ssh key is getting rejected with "error: AuthorizedKeysCommand /usr/sbin/ssh-key-ldap-lookup returned status 1"
[22:37:14] only really needs apache -> /srv/deployment
[22:37:45] when you try to deploy from it to a target?
[22:37:54] yeah
[22:37:55] Failed publickey for deploy-service from 10.68.23.63 port 40406 ssh2: RSA cb:43:c3:76:3a:68:b0:29:1a:8f:3f:31:1f:87:f8:7c
[22:38:18] what I don't remember is if there was magic in setting that up (nothing in my notes)
[22:38:26] and why it might be broken now
[22:39:13] hrm. I've never run into that before
[22:39:19] RECOVERY - Puppet run on integration-slave-trusty-1006 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:39:20] it certainly does return 1 in beta
[22:39:45] hmmm but then something lets it work anyway?
[22:40:37] I'm checking that now for deploy-service :)
[22:40:50] I think the last time this project was used was about 8 weeks ago so there could have been all kinds of puppet changes that I didn't notice
[22:42:14] yeah, it does seem to allow login
[22:42:41] SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service deployment-parsoid09.deployment-prep.eqiad.wmflabs
[22:42:44] works
[22:42:59] *nod* that's pretty much how I'm testing too
[22:43:12] even though: /usr/sbin/ssh-key-ldap-lookup deploy-service
[22:43:21] returns 1 on deployment-parsoid09
[22:43:25] so I've either got the wrong key or some missing hiera stuff
[22:44:05] or there's some other crazy problem that someone fixed in beta and I didn't notice :)
[22:44:28] * thcipriani looks at puppetmaster patches
[22:46:04] hrm, nothing looks particularly suspect as long as keyholder is armed and/or deploy-service has access to the key some other way
[22:46:33] the fingerprint of the key matches one in the deployment-tin keyholder
[22:46:37] so that seems good
[22:47:46] I'll try to audit the hiera stuff
[22:48:07] hrm, is it making it to checking the key in /var/log/auth.log on the target?
[22:48:13] yes
[22:48:20] weird
[22:48:21] and it's being rejected there
[22:49:38] (03CR) 10Hashar: [C: 032] Whitelist Kaartic [integration/config] - 10https://gerrit.wikimedia.org/r/325775 (owner: 10Mholloway)
[22:50:42] hrm, afaik it should just be checking against /etc/ssh/userkeys/[key]
[22:50:44] (03Merged) 10jenkins-bot: Whitelist Kaartic [integration/config] - 10https://gerrit.wikimedia.org/r/325775 (owner: 10Mholloway)
[22:50:50] I've got a hunch this is puppet related
[22:50:55] there was one thing we changed wrt the pam module
[22:51:09] trying to remember what it was, but it doesn't sound like your problem is pam related
[22:51:11] like my project isn't using the puppet master it thinks it is possibly
[22:52:26] oh, yeah, puppet::master::self vs puppetmaster::standalone is somewhat of a recent thing, IIRC
[22:52:32] yeah
[22:53:59] thanks for rubber ducking
[22:56:13] werd
[22:56:48] (03PS3) 10Hashar: dib: fix dupe sourcing in interfaces [integration/config] - 10https://gerrit.wikimedia.org/r/325561 (https://phabricator.wikimedia.org/T113342)
[22:56:50] (03PS2) 10Hashar: dib: ensure /etc/apt/sources.list exists [integration/config] - 10https://gerrit.wikimedia.org/r/325576
[22:56:52] (03PS1) 10Hashar: nodepool: delete DHCP leases from snapshot [integration/config] - 10https://gerrit.wikimedia.org/r/325852 (https://phabricator.wikimedia.org/T113342)
[22:58:51] bd808: thcipriani: iirc there is some weird race condition between puppet and keyholder
[22:59:06] happened multiple times when we rebuilt them a few weeks ago
[22:59:31] I think the fix was to restart the keyholder (or maybe disarm and arm)
[22:59:55] might be something else entirely though
[23:00:30] ohh
[23:00:40] Gerrit 2.13 detects when an updated change is just a rebase
[23:00:50] oh?
[23:00:51] and shows it as a comment!
[23:01:00] eg https://gerrit.wikimedia.org/r/#/c/325576/
[23:01:06] I did push a new chain
[23:01:17] imagines jenkins-bot then doesn't have to vote a second time
[23:01:18] and it commented for me: Uploaded patch set 2: Patch Set 1 was rebased.
[23:01:38] iirc we strip the Verified vote
[23:01:46] that sounds fairly normal though
[23:01:49] on a rebase we keep codereview +1 votes though
[23:01:51] "Patch Set was rebased"
[23:01:55] yeah
[23:02:10] it will also tell you if you're on an old patch set
[23:02:13] but I am pretty sure Gerrit used to not mention it was a rebase
[23:02:13] shows orange
[23:02:26] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[23:02:28] so what I did was to go into Gerrit, press the rebase button on each patch
[23:02:35] then send my new change :(
[23:03:02] I think gerrit also shows when you remove reviewers as comments
[23:03:11] not sure if that is only for NoteDB
[23:03:18] (03PS4) 10Hashar: dib: fix dupe sourcing in interfaces [integration/config] - 10https://gerrit.wikimedia.org/r/325561 (https://phabricator.wikimedia.org/T113342)
[23:03:33] (03CR) 10Hashar: [C: 032] "That is being upstreamed :]" [integration/config] - 10https://gerrit.wikimedia.org/r/325561 (https://phabricator.wikimedia.org/T113342) (owner: 10Hashar)
[23:04:00] (03PS3) 10Hashar: dib: ensure /etc/apt/sources.list exists [integration/config] - 10https://gerrit.wikimedia.org/r/325576
[23:04:14] (03CR) 10Hashar: [C: 032] dib: ensure /etc/apt/sources.list exists [integration/config] - 10https://gerrit.wikimedia.org/r/325576 (owner: 10Hashar)
[23:04:32] (03Merged) 10jenkins-bot: dib: fix dupe sourcing in interfaces [integration/config] - 10https://gerrit.wikimedia.org/r/325561 (https://phabricator.wikimedia.org/T113342) (owner: 10Hashar)
[23:05:57] (03Merged) 10jenkins-bot: dib: ensure /etc/apt/sources.list exists [integration/config] -
10https://gerrit.wikimedia.org/r/325576 (owner: 10Hashar)
[23:06:13] (03PS1) 10Hashar: nodepool: delete DHCP leases from snapshot [integration/config] - 10https://gerrit.wikimedia.org/r/325854 (https://phabricator.wikimedia.org/T113342)
[23:06:22] doh
[23:06:53] ostriches: interestingly Gerrit failed to register a change I have sent ( was https://gerrit.wikimedia.org/r/325852 )
[23:07:08] grrrit-wm did send a notification about it at 22:56:52
[23:07:17] but the change did not exist in the UI
[23:07:24] resent as https://gerrit.wikimedia.org/r/325854
[23:07:38] (03CR) 10Hashar: [C: 032] nodepool: delete DHCP leases from snapshot [integration/config] - 10https://gerrit.wikimedia.org/r/325854 (https://phabricator.wikimedia.org/T113342) (owner: 10Hashar)
[23:08:24] (03Merged) 10jenkins-bot: nodepool: delete DHCP leases from snapshot [integration/config] - 10https://gerrit.wikimedia.org/r/325854 (https://phabricator.wikimedia.org/T113342) (owner: 10Hashar)
[23:15:35] 10Gerrit: Cannot log into Gerrit as of recent upgrade - https://phabricator.wikimedia.org/T152640#2855839 (10Eloquence)
[23:17:24] ostriches https://phabricator.wikimedia.org/T152640
[23:17:58] thcipriani: mystery partially solved. the key I'm sending and the key the host is expecting don't match. Now to track down the right key
[23:19:33] 10Gerrit: Cannot log into Gerrit as of recent upgrade - https://phabricator.wikimedia.org/T152640#2855865 (10demon) p:05Triage>03High a:03demon
[23:26:49] 10Gerrit: Cannot log into Gerrit as of recent upgrade - https://phabricator.wikimedia.org/T152640#2855875 (10demon) Weird, your account_id is actually 20, and Eloquence does indeed claim erik as its shell name. Bizarre....
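The key mismatch reported at 23:17:58 is the kind of problem that can be confirmed by comparing fingerprints, since sshd logs the offered key in the colon-separated MD5 form seen at 22:37:55. A minimal sketch of that comparison follows. It assumes each input is a plain OpenSSH public key line (`<type> <base64-blob> [comment]`) with no leading options field; `md5_fingerprint` and `keys_match` are hypothetical helper names and are not part of scap or keyholder.

```python
import base64
import hashlib


def md5_fingerprint(public_key_line):
    """Return the colon-separated MD5 fingerprint of an OpenSSH public key line.

    Assumes the form "<type> <base64-blob> [comment]" with no options prefix.
    """
    fields = public_key_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64-blob> [comment]'")
    blob = base64.b64decode(fields[1])
    digest = hashlib.md5(blob).hexdigest()
    # Group the 32 hex digits into 16 colon-separated byte pairs.
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def keys_match(sent_key_line, expected_key_line):
    """True when both lines carry the same key blob; comments are ignored."""
    return md5_fingerprint(sent_key_line) == md5_fingerprint(expected_key_line)
```

Running `md5_fingerprint` on the public key exported from the deploy host and on the key installed on the target, then comparing both against the fingerprint in the target's auth log, shows which side holds the unexpected key.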
[23:28:17] 10Gerrit: Gerrit 2.13 failed to register a change - https://phabricator.wikimedia.org/T152642#2855876 (10hashar)
[23:31:33] 10Gerrit: Gerrit 2.13 failed to register a change - https://phabricator.wikimedia.org/T152642#2855876 (10Paladox) Should this be rated as high priority since this could happen again at any time?
[23:33:53] thcipriani: mystery solved. _joe_ changed the hiera variable name from "scap::server::keyholder_agents" to "scap::keyholder_agents"
[23:33:57] 10Gerrit: Gerrit 2.13 failed to register a change - https://phabricator.wikimedia.org/T152642#2855929 (10hashar) Nothing shows up in Gerrit error log as far as I can tell though Gerrit has been restarted just after.
[23:34:22] now I am really off !
[23:34:29] bd808: ah ha!
[23:35:03] yeah _joe_ did a lot of work making scap fancy :)
[23:35:14] didn't realize there was a variable rename
[23:35:17] * thcipriani squints
[23:36:08] it was a long time ago apparently
[23:37:51] 10Gerrit: Gerrit 2.13 failed to register a change - https://phabricator.wikimedia.org/T152642#2855934 (10demon) Definitely exists in the DB: ``` change_key | created_on | last_updated_on | owner_account_id | dest_project_name | dest_branch_name | status | curren...