[00:00:29] Release-Engineering, Ops-Access-Requests, Phabricator, operations, Patch-For-Review: Chad H. needs access to iridium (Phabricator host) to manage repos - https://phabricator.wikimedia.org/T92564#1115397 (Dzahn) gotcha, then to the meeting Etherpad it goes
[00:02:36] legoktm: I'm fixing jsduck for ve (needs submodules)
[00:07:47] (PS1) Krinkle: Bring back mwext-VisualEditor-jsduck override for submodules [integration/config] - https://gerrit.wikimedia.org/r/196491
[00:09:43] (PS2) Krinkle: Bring back mwext-VisualEditor-jsduck override for submodules [integration/config] - https://gerrit.wikimedia.org/r/196491
[00:11:23] argh
[00:12:01] (CR) Krinkle: [C: 2] "Deployed 'mwext-VisualEditor-jsduck. Enabling now.." [integration/config] - https://gerrit.wikimedia.org/r/196491 (owner: Krinkle)
[00:12:51] James_F: I'll amend it
[00:16:02] legoktm: Ta.
[00:17:26] (Merged) jenkins-bot: Bring back mwext-VisualEditor-jsduck override for submodules [integration/config] - https://gerrit.wikimedia.org/r/196491 (owner: Krinkle)
[00:17:52] !log Reloading Zuul to deploy I46c60d520
[00:17:56] Logged the message, Master
[00:19:15] (PS1) Legoktm: Add "composer test" entry point for linting and phpunit [integration/jenkins] - https://gerrit.wikimedia.org/r/196494
[00:19:48] (PS7) Legoktm: Add npm test for Citoid extension [integration/config] - https://gerrit.wikimedia.org/r/191063 (owner: Jforrester)
[00:22:01] Release-Engineering, Wikimedia-Hackathon-2015: Release/QA tasks at the Wikimedia Hackathon 2015 - https://phabricator.wikimedia.org/T92565#1115429 (dduvall)
[00:23:19] (PS8) Legoktm: Add npm test for Citoid extension [integration/config] - https://gerrit.wikimedia.org/r/191063 (owner: Jforrester)
[00:23:46] (CR) Legoktm: "PS7: rebase, PS8: Added jjb config for mwext-Citoid-npm" [integration/config] - https://gerrit.wikimedia.org/r/191063 (owner: Jforrester)
[00:23:53] legoktm: Why no strict?
[00:24:09] Krinkle: it's apparently deprecated in phpunit 4.5
[00:24:26] legoktm: Yeah, replaced with 4 separate options
[00:24:47] * legoktm finds docs
[00:24:52] https://phpunit.de/manual/4.7/en/strict-mode.html
[00:25:20] https://github.com/sebastianbergmann/phpunit/wiki/Release-Announcement-for-PHPUnit-4.5.0#deprecated-settings
[00:25:46] the one missing from that page is risky tests: https://phpunit.de/manual/current/en/risky-tests.html
[00:25:53] which also used to be covered by strict
[00:26:51] * legoktm updates
[00:28:50] (PS2) Legoktm: Add "composer test" entry point for linting and phpunit [integration/jenkins] - https://gerrit.wikimedia.org/r/196494
[00:33:35] (PS1) Krinkle: Actually commit mwext-VisualEditor-jsduck [integration/config] - https://gerrit.wikimedia.org/r/196499
[00:33:55] (CR) Krinkle: "Deployed mwext-VisualEditor-jsduck." [integration/config] - https://gerrit.wikimedia.org/r/196499 (owner: Krinkle)
[00:34:04] Krinkle: :-)
[00:37:44] Continuous-Integration, Labs, OOjs, Wikimedia-Labs-Infrastructure, operations: Jenkins failing with "Error: GET https://saucelabs.com: Couldn't resolve host name." - https://phabricator.wikimedia.org/T92351#1115471 (scfc) I don't think so because that was merged earlier. But on March 6th https...
[00:44:29] (CR) Krinkle: [C: 2] Actually commit mwext-VisualEditor-jsduck [integration/config] - https://gerrit.wikimedia.org/r/196499 (owner: Krinkle)
[00:49:46] (Merged) jenkins-bot: Actually commit mwext-VisualEditor-jsduck [integration/config] - https://gerrit.wikimedia.org/r/196499 (owner: Krinkle)
[00:50:44] (PS9) Legoktm: Add npm test for Citoid extension [integration/config] - https://gerrit.wikimedia.org/r/191063 (owner: Jforrester)
[00:52:37] (CR) Legoktm: [C: 2] Add npm test for Citoid extension [integration/config] - https://gerrit.wikimedia.org/r/191063 (owner: Jforrester)
[00:58:02] (Merged) jenkins-bot: Add npm test for Citoid extension [integration/config] - https://gerrit.wikimedia.org/r/191063 (owner: Jforrester)
[00:59:00] (CR) Legoktm: Add npm test for Citoid extension (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/191063 (owner: Jforrester)
[00:59:49] (PS1) Legoktm: mwext-Citoid-lint --> phplint [integration/config] - https://gerrit.wikimedia.org/r/196505
[01:00:39] (CR) Legoktm: [C: 2] mwext-Citoid-lint --> phplint [integration/config] - https://gerrit.wikimedia.org/r/196505 (owner: Legoktm)
[01:01:48] (Merged) jenkins-bot: mwext-Citoid-lint --> phplint [integration/config] - https://gerrit.wikimedia.org/r/196505 (owner: Legoktm)
[01:03:15] !log deployed https://gerrit.wikimedia.org/r/191063 & https://gerrit.wikimedia.org/r/196505
[01:03:17] James_F: ^
[01:03:19] Logged the message, Master
[01:09:31] ok, time to delete jobs now
[01:28:56] legoktm: Thanks!
[01:29:12] np
[01:45:41] 27 Undefined index: 4 in /srv/mediawiki/php-1.25wmf20/includes/logging/BlockLogFormatter.php on line 57
[01:45:41] 27 Undefined index: 4 in /srv/mediawiki/php-1.25wmf20/includes/logging/BlockLogFormatter.php on line 55
[01:45:41] 2 Undefined index: user in /srv/mediawiki/php-1.25wmf20/includes/api/ApiFeedWatchlist.php on line 206
[01:45:41] 1 Undefined index: 5 in /srv/mediawiki/php-1.25wmf20/languages/Language.php on line 3459
[01:45:44] sigh
[01:46:57] block log... probably related to the logging updates
[01:46:59] that Language one is in listToText, would probably need to know the caller to track it down
[01:48:56] feed watchlist... that's 'user' not defined, an API response property. maybe hitting a revdel'd entry?
[01:49:19] !log deleted a bunch of unused *-tox-* jobs
[01:49:24] Logged the message, Master
[01:58:52] Continuous-Integration, OOjs: Publish QUnit coverage on integration.wikimedia.org - https://phabricator.wikimedia.org/T87490#1115600 (Krinkle) In oojs, oojs-ui and VisualEditor, test coverage can be generated locally by running `npm install && npm test` in their directory and opening the coverage directo...
[02:10:34] Continuous-Integration, Labs, OOjs, Wikimedia-Labs-Infrastructure, operations: Jenkins failing with "Error: GET https://saucelabs.com: Couldn't resolve host name." - https://phabricator.wikimedia.org/T92351#1115608 (coren) The only net effect the change can make is that //iff// the fqdn has exa...
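The warning summary pasted at 01:45 puts an occurrence count in front of each distinct message, the same shape that "sort | uniq -c | sort -rn" produces. The log does not say how that tally was actually generated. The Python below is only a minimal sketch under that assumption: it counts identical lines and ranks them by frequency. The name summarize_warnings is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def summarize_warnings(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Count identical non-blank log lines, most frequent first.

    Roughly what `sort | uniq -c | sort -rn` does in a shell pipeline.
    """
    counts = Counter(line.strip() for line in lines if line.strip())
    # Highest count first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(count, message) for message, count in ranked]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python summarize.py < warnings.log
    for count, message in summarize_warnings(sys.stdin):
        print(f"{count:>7} {message}")
```

Grouping on the whole stripped line keeps "line 55" and "line 57" separate, which matches the two BlockLogFormatter entries above.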
[03:36:48] Deployment-Systems, Release-Engineering: Capture PHP warnings with stacktraces in MediaWiki and save to logstash - https://phabricator.wikimedia.org/T45086#1115667 (bd808)
[03:38:10] Deployment-Systems, Release-Engineering: Capture PHP warnings with stacktraces in MediaWiki and save to logstash - https://phabricator.wikimedia.org/T45086#479093 (bd808)
[03:47:22] Release-Engineering, MediaWiki-Core-Team, Wikimedia-Logstash, HHVM: Log php fatals with full backtraces again (fatal.log on fluorine) - https://phabricator.wikimedia.org/T89169#1115698 (bd808)
[05:00:34] Yippee, build fixed!
[05:00:34] Project browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #536: FIXED in 35 min: https://integration.wikimedia.org/ci/job/browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/536/
[05:23:06] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #567: STILL FAILING in 37 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/567/
[05:24:20] Yippee, build fixed!
[05:24:21] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-10-sauce build #171: FIXED in 1 min 12 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-10-sauce/171/
[05:31:42] Project browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #574: STILL FAILING in 20 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/574/
[05:47:55] Project browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce build #358: STILL FAILING in 47 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce/358/
[05:53:43] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #603: STILL FAILING in 54 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/603/
[06:08:44] Continuous-Integration, MediaWiki-Core-Team, Composer: Come up with non sucky solution for running "composer test" on repos that have vendor/ checked in - https://phabricator.wikimedia.org/T92605#1115821 (Legoktm) NEW
[06:24:52] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #518: FAILURE in 36 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/518/
[06:28:37] (CR) Legoktm: "Not yet, that patch doesn't work properly yet and I haven't had much time to investigate why :/" [integration/config] - https://gerrit.wikimedia.org/r/191888 (owner: Werdna)
[06:39:03] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce build #362: FAILURE in 45 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce/362/
[06:41:17] PROBLEM - Puppet failure on deployment-db2 is CRITICAL: CRITICAL: 57.14% of data above the critical threshold [0.0]
[06:42:10] Beta-Cluster, Staging, Labs: Provide option to autosign puppet certs for self hosted puppetmasters - https://phabricator.wikimedia.org/T92606#1115838 (yuvipanda) NEW
[06:42:38] Yippee, build fixed!
[06:42:39] Project browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #512: FIXED in 16 min: https://integration.wikimedia.org/ci/job/browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/512/
[06:52:46] PROBLEM - Puppet failure on deployment-salt is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]
[07:06:21] RECOVERY - Puppet failure on deployment-db2 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:11:21] Staging: Create staging cluster (tracking) - https://phabricator.wikimedia.org/T88702#1115871 (yuvipanda)
[07:11:23] Staging, Patch-For-Review: Setup staging-palladium as puppetmaster and saltmaster - https://phabricator.wikimedia.org/T88304#1115869 (yuvipanda) Resolved>Open Re-opening. Just realized palladium is trusty, should be precise...
[07:26:05] (PS1) Legoktm: Create generic 'npm' job [integration/config] - https://gerrit.wikimedia.org/r/196540
[08:06:03] (CR) 20after4: [C: 2] migrate-patch utility to retain security patches [tools/release] - https://gerrit.wikimedia.org/r/195942 (owner: 20after4)
[08:06:10] (Merged) jenkins-bot: migrate-patch utility to retain security patches [tools/release] - https://gerrit.wikimedia.org/r/195942 (owner: 20after4)
[08:06:12] (Merged) jenkins-bot: Fix branched sub-submodule support [tools/release] - https://gerrit.wikimedia.org/r/195972 (owner: 20after4)
[08:11:45] Release-Engineering, Wikimedia-Hackathon-2015: Release/QA tasks at the Wikimedia Hackathon 2015 - https://phabricator.wikimedia.org/T92565#1115914 (Qgil) Great, thank you! I have added "MediaWiki releases" as a main area at https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2015 (feel free to propose an a...
[08:23:50] Staging: Create staging-eventlogging - https://phabricator.wikimedia.org/T91561#1115927 (mmodell) a: mmodell
[08:25:20] Staging: Create staging-eventlogging - https://phabricator.wikimedia.org/T91561#1115931 (yuvipanda) This might be a bit hard now, because we don't have varnish set up yet.
[08:57:28] Staging: Create staging-eventlogging - https://phabricator.wikimedia.org/T91561#1115973 (mmodell) it's failing when salt is attempting to connect to tin. So should I set up varnish?
[08:59:46] Staging: Create staging-eventlogging - https://phabricator.wikimedia.org/T91561#1115977 (yuvipanda) yup, tin isn't set up yet either. I'd suggest redis? Current blockers, in order, are db / tin, memcached, redis, mediawiki. Tyler is on db, I'm doing tin, Chad just finished memcached...
[09:02:58] Staging: Create staging-rdb (redis) - https://phabricator.wikimedia.org/T91547#1115991 (mmodell) a: mmodell
[09:14:17] Staging: Create staging-rdb (redis) - https://phabricator.wikimedia.org/T91547#1116016 (mmodell) OK that was too easy. Puppet ran and redis is listening on port 6379 :)
[09:16:42] Staging: Create staging-rdb (redis) - https://phabricator.wikimedia.org/T91547#1089777 (mmodell) ok I need to sleep. @yuvipanda: should I set up two redis nodes? one should be fine for now?
[09:23:17] Staging: Create staging-rdb (redis) - https://phabricator.wikimedia.org/T91547#1116022 (yuvipanda) Let's look at prod tomorrow and figure out. Ideally, yes, we will need two, one master and one slave.
[09:51:17] zeljkof: parent ticket is https://phabricator.wikimedia.org/T89226 :D
[09:52:45] https://integration.wikimedia.org/ci/job/browsertests-Wikidata-WikidataTests-linux-firefox-sauce/160/ took 4 hours
[10:31:05] Continuous-Integration, Quality-Assurance, Release-Engineering, Browser-Tests: It takes about 20 seconds just to start a Sauce Labs browser - https://phabricator.wikimedia.org/T92613#1116195 (zeljkofilipin) NEW a: zeljkofilipin
[10:47:18] Continuous-Integration, Quality-Assurance, Release-Engineering, Browser-Tests: It takes about 20 seconds just to start a Sauce Labs browser - https://phabricator.wikimedia.org/T92613#1116254 (zeljkofilipin) Example Jenkins job: https://integration.wikimedia.org/ci/view/BrowserTests/view/-All/job/br...
[11:29:00] hashar: on https://integration.wikimedia.org/ci/view/Beta/ appears to me that things are stuck
[11:29:04] known issue?
[11:29:58] Waiting for next available executor
[12:09:30] aude: yeah that happens from time to time.
[12:09:35] we have a task filed somewhere about it
[12:19:35] aude: going to restart Jenkins :(
[12:27:29] !log restarting Jenkins to remove some Beta jobs deadlock. Updated a few plugins as well.
[12:37:35] PROBLEM - SSH on deployment-lucid-salt is CRITICAL: Connection refused
[12:46:12] PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:46:41] PROBLEM - Puppet failure on deployment-sca01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:48:51] RECOVERY - Puppet failure on deployment-salt is OK: OK: Less than 1.00% above the threshold [0.0]
[12:49:04] PROBLEM - Puppet failure on deployment-cache-mobile03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:49:04] PROBLEM - Puppet failure on deployment-cxserver03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:50:14] PROBLEM - Puppet failure on deployment-zotero01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:08:03] Release-Engineering, MediaWiki-extensions-WikibaseRepository, Wikidata, Browser-Tests: investigate failing browsertests on jenkins - https://phabricator.wikimedia.org/T92619#1116439 (hashar) Removed #jenkins since that is for Jenkins itself. You want #Browser-Tests instead :)
[13:11:28] hi ^d
[13:11:32] I have for you today https://gerrit.wikimedia.org/r/#/c/196537/
[13:11:38] which will autosign puppet and salt keys for you :D
[13:12:23] <^d> Dude with all the beer I owe you I might just have to buy you a keg
[13:12:57] ^d: :D
[13:13:18] ^d: I’m waiting for andrew to show up later so I can verify this won’t fuck up labs’ general puppetmaster, but otherwise should be merged in a few hours
[13:13:30] ^d: and to think I’ll be in the office in a few weeks :D
[13:13:45] <^d> sweeet
[13:16:02] ^d: next thing I’m working on will also probably make you happy. when that’s done, you won’t actually have to use the wikitech interface!
[13:16:16] <^d> What if I like it?!?
[13:16:33] <^d> (said no one ever)
[13:16:38] hehe
[13:16:59] ^d: well, if you like it you can start an RfC and then start building a common.js based replacement...
[13:17:22] <^d> I'd have to write JS? Pass
[13:17:26] :P
[13:25:51] Beta-Cluster, Staging, Labs, Patch-For-Review: Provide option to autosign puppet certs for self hosted puppetmasters - https://phabricator.wikimedia.org/T92606#1116500 (scfc)
[13:25:52] Beta-Cluster: Make beta cluster puppet master to auto sign client keys - https://phabricator.wikimedia.org/T75767#1116499 (scfc)
[13:26:35] Beta-Cluster: Make beta cluster puppet master to auto sign client keys - https://phabricator.wikimedia.org/T75767#781378 (scfc) After T92606 is resolved, the class `puppetmaster::autosigner` needs to be enabled on the beta cluster puppet master.
[13:59:48] Continuous-Integration, Quality-Assurance, Release-Engineering, Browser-Tests: It takes about 20 seconds just to start a Sauce Labs browser - https://phabricator.wikimedia.org/T92613#1116580 (hashar) See P396 for the full Selenium log. I have looked at the log and identified the following sections...
[14:04:15] ^d: I can help you brew a keg?
[14:04:25] or could find a homebrew distributor in India
[14:04:32] though, YuviPanda is moving to the US, like imminently
[14:04:33] so there's that
[14:04:42] :D
[14:04:44] in like, 2 weeks.
[14:09:09] hashar: back
[14:09:21] <^d> werdna: No, I'm going to buy him a college-style cheap keg :p
[14:09:32] <^d> And red solo cups
[14:10:19] aude: the beta cluster jenkins jobs should be fixed hopefully
[14:10:19] looks like it's updating again
[14:13:07] ^d: aww yiss
[14:18:48] (CR) Hashar: "check experimental" [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195272 (https://phabricator.wikimedia.org/T48552) (owner: Hashar)
[14:20:11] (CR) Hashar: "check experimental" [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195272 (https://phabricator.wikimedia.org/T48552) (owner: Hashar)
[14:29:23] https://phabricator.wikimedia.org/T65034 still appears to be an issue :(
[14:37:40] ^d: YuviPanda where is a good place to find out what os versions various prod servers are running? Doesn't seem like I have shell access to too many :\
[14:38:27] <^d> That should be in puppet in the netboot stuff I thought?
[14:38:33] * ^d has seen it before, he thought
[14:39:09] <^d> modules/install-server/files/autoinstall/netboot.cfg
[14:39:54] <^d> Or maybe not? Is that just partman?
[14:40:07] good question.
[14:40:16] I usually ssh in and check, but I guess you can’t do that….
[14:40:47] ^d: thcipriani can you log in to servermon.wikimedia.org?
[14:40:54] hmm, I doubt it, actually. that’s probably restricted as well
[14:41:23] YuviPanda: nope ^d yeah looks like lvm stuffs in netboot.cfg
[14:41:26] <^d> I tried labs login info but no dice.
[14:41:28] <^d> Must be ldap/ops
[14:41:32] ^d: so only thing I can think of is… ask someone in ops...
[14:41:47] (not very helpful, I know)
[14:42:52] ^d: thcipriani actually
[14:43:39] thcipriani: ^d https://github.com/wikimedia/operations-puppet/blob/production/modules/install-server/files/dhcpd/linux-host-entries.ttyS1-115200
[14:43:56] bah, only some of them are there.
[14:44:31] <^d> yay consistency
[14:45:06] heh, well, I know palladium is a dell now :)
[14:49:44] :D
[14:50:34] <^d> thcipriani: Almost everything is a Dell
[14:51:57] (CR) Hashar: [C: 2 V: 2] Apply wmf patches [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195279 (owner: Hashar)
[14:52:29] Release-Engineering, Ops-Access-Requests, Phabricator, operations, Patch-For-Review: Chad H. needs access to iridium (Phabricator host) to manage repos - https://phabricator.wikimedia.org/T92564#1116651 (demon) >>! In T92564#1115364, @RobH wrote: > @demon: please sign https://phabricator.wikimedi...
[14:55:44] Continuous-Integration, MediaWiki-Core-Team, Composer: Come up with non sucky solution for running "composer test" on repos that have vendor/ checked in - https://phabricator.wikimedia.org/T92605#1116654 (bd808) One approach would be to split the deployable vendor repo from the project repo and only co...
[15:19:56] Continuous-Integration, operations, Continuous-Integration-Isolation, Patch-For-Review, Upstream: Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1116695 (hashar) I have further tweaked the package intended for precise-wikimedia. Patchset 9 of https://gerrit.wikimedia....
[15:20:44] Continuous-Integration, operations, Blocked-on-Operations, Continuous-Integration-Isolation, and 2 others: Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1116708 (hashar) I need to get the package reviewed by ops.
[15:25:46] (CR) Hashar: "Some of the dependencies are available in precise-wikimedia and are thus listed as dependencies of the package. The rest is fetched by dh-" [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195272 (https://phabricator.wikimedia.org/T48552) (owner: Hashar)
[15:52:27] ^d: thcipriani the autosigner has been merged :D
[15:52:39] <^d> f yeah
[15:52:50] YuviPanda: saw that
[15:53:04] need to apply that role on.
[15:54:05] YuviPanda: Also saw there's also an uncommitted change on staging-palladium to role::puppet::self, should that be...anywhere?
[15:54:19] thcipriani: nah, that was just testing. feel free to throw it out
[15:54:41] kk, gearing up for new staging palladium, getting all patches in order
[15:55:07] sweet
[15:55:18] i’m going to continue working on the ENC
[15:55:29] since I think that’ll make the pain of deleting / recreating instances a lot less
[15:55:59] indeed.
[15:56:09] thcipriani: did you ever manage to document the salt / puppetmaster per-project stuff you did?
[15:56:22] I did...
[15:56:28] ah, link?
[15:56:30] * thcipriani looks for where I did
[15:57:36] YuviPanda: looks like I got the project-wide puppetmaster documented: https://wikitech.wikimedia.org/wiki/Help:Self-hosted_puppetmaster#Set_up_project-wide_puppetmaster
[15:57:41] Staging, Patch-For-Review: Setup staging-tin as deployment host - https://phabricator.wikimedia.org/T88442#1116811 (demon)
[15:57:42] Staging: Create staging-elastic* (ElasticSearch machines) - https://phabricator.wikimedia.org/T91552#1116810 (demon)
[15:57:52] <^d> Blocked until we have trebuchet
[15:57:56] thcipriani: sweet.
[15:58:06] thcipriani: aaaah, there’s a way to do auto updates as well :D and I never documented that...
[15:58:07] let me document that
[15:58:51] YuviPanda: here before EOD I want to get the whole, "Build you a staging" page started as well.
[15:58:58] right :D
[15:59:00] ok
[16:02:00] zeljkof: let's cancel today
[16:02:24] chrismcmahon: ok
[16:02:40] zeljkof: sent you PM
[16:03:11] chrismcmahon: got it :)
[16:03:11] zeljkof: think about if there is anything we need to wrap up next week.
[16:03:20] chrismcmahon: sure
[16:03:48] the only thing that came to my mind so far is that we need to figure out how to transfer your master sauce labs account to me or somebody
[16:04:33] zeljkof: actually, I think the way that works is that once those accounts are joined, they are equivalent. Let me find what I have from Sauce and send that to you today.
[16:06:27] chrismcmahon: great, thanks
[16:16:40] (semi-ontopic but not really) bd808: joeyh's "propellor" (his own config management thing written in haskell) does that "any host can update the others" thing. I don't want to learn haskell.
[16:17:01] heh
[16:17:20] joeyh is too smart for me
[16:17:31] s/me/most/
[16:17:36] <^d> I thought about dusting off git-annex again for deployment stuff.
[16:17:42] :D
[16:18:02] <^d> You know he has bittorrent as a remote, right?
[16:18:11] yep :) that was a semi-recent addition
[16:18:47] I don't wanna learn haskell. I wanna _know_ haskell.
[16:19:02] (but realistically, adding in rack/dc awareness on top of that might be harder than writing our own/using something else. See also: Haskell)
[16:19:06] :) :)
[16:19:58] * YuviPanda puts thcipriani in the matrix
[16:20:06] "Type-safe deployment systems!"
[16:20:07] although, all I got to was patternmatching and man is that sweet
[16:20:11] <^d> greg-g: I don't think the hard part is there, I think the hard part is writing the announcer to be rack/dc aware
[16:20:13] * thcipriani knows kung fu
[16:20:23] <^d> So when a client requests other nodes to pull from, it only gets nearby ones
[16:22:33] that shouldn't matter though. The fastest peer wins
[16:22:42] oh, put the logic in the center instead of the edges
[16:23:43] <^d> bd808: Oh hmm ok
[16:33:30] bd808: is that all we should rely on though? What if a closer peer was momentarily slow so another rack/dc was picked?
[16:33:51] I guess it evens out over time (they reassess fastest peer, right?).
[16:33:59] YuviPanda: ^d twentyafterfour: killing staging-palladium. I think I have everything in line to bring it back up on precise pretty quickly, but I probably don't.
[16:34:04] If chunk sizes are small then it should just-work-out
[16:34:10] * greg-g nods
[16:34:14] :)
[16:34:41] but that's famous last words right
[16:34:55] thcipriani: :D I think the problem would be that all of the other things’ puppet clients and salt minions would freak out
[16:35:02] ‘yooooo wrrrong cert wtf'
[16:35:33] once I finish the ENC, we can actually recreate them *all*. We’ll just need to destroy and re-create the instances in appropriate ordering and baaamm
[16:35:34] yeah, the salt minions should be ok after a puppet run, may have to delete all the .pem file to get them to regen signing requests for puppet master though :\
[16:35:34] maybe
[16:35:36] YuviPanda: we can just delete the certs on them manually, there aren't that many yet
[16:35:42] yeah
[16:35:46] yeah
[16:35:49] it’s just a manual step
[16:35:57] I should’ve checked
[16:36:02] when I checked staging-palladium
[17:06:54] thcipriani: how is it going?
[17:07:24] almost there...but not _too_ terrible overall
[17:09:46] YuviPanda: it got all kinds of weird on first boot, didn't have puppetmaster installed. The way I did it the first time was: 1. Run puppet agent 2. add puppet::self role 3. run puppet agent
[17:10:02] right
[17:10:05] couldn't do that this time since role::puppet::self was already assigned and set to the fqdn
[17:19:31] YuviPanda: I think it's done. Got some odd notices in the puppet run though, no errors, mind taking a peek?
[17:19:46] are they about ubuntu keys? :D
[17:19:51] yeah
[17:20:24] what's the deal with all that?
[17:20:44] thcipriani: yeah, you can ignore them. rm -rf /etc/ssh/userkeys/ubuntu should fix things
[17:20:50] we’ll fix that in the new labs image pretty soon...
[17:20:59] kk, then done!
[17:21:00] and I’ve fixed it for everything else via salt
[17:21:12] thcipriani: that’s the unification of location of prod / beta keys that happened yesterday...
[17:21:32] we changed the path in labs also from /etc/ssh/userkeys/%u/.ssh/authorized_keys to /etc/ssh/userkeys...
[17:22:19] good to know. Kk, now on to figuring out how to update instances.
[17:22:39] :D ok
[17:37:29] Release-Engineering, MediaWiki-Core-Team, MediaWiki-Debug-Logging, Wikimedia-Logstash, HHVM: Log php fatals with full backtraces again (fatal.log on fluorine) - https://phabricator.wikimedia.org/T89169#1117046 (Legoktm)
[17:37:34] Deployment-Systems, Release-Engineering, MediaWiki-Debug-Logging: Capture PHP warnings with stacktraces in MediaWiki and save to logstash - https://phabricator.wikimedia.org/T45086#1117053 (Legoktm)
[18:06:50] Krinkle|detached: updated docs in https://wikitech.wikimedia.org/wiki/Help:Self-hosted_puppetmaster#Set_up_project-wide_puppetmaster if you want :)
[18:09:24] Release-Engineering, Ops-Access-Requests, Phabricator, operations, Patch-For-Review: Chad H. needs access to iridium (Phabricator host) to manage repos - https://phabricator.wikimedia.org/T92564#1117224 (chasemp) https://gerrit.wikimedia.org/r/#/c/196613/
[18:20:59] YuviPanda: ^d twentyafterfour got autosigning on palladium, to point an existing instance at new puppet master do: find /var/lib/puppet/client/ssl -type f -delete; puppet agent -t (may take a minute for new palladium to sign).
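The 18:20 recipe for pointing an existing instance at the rebuilt puppetmaster has two steps: wipe the agent's SSL files, then rerun the agent so it submits a fresh certificate request. The Python below is only an illustrative sketch of those two steps; the shell one-liner in the log is the simpler tool. It reproduces "find /var/lib/puppet/client/ssl -type f -delete", deleting regular files while leaving directories and symlinks in place, and then calls "puppet agent -t". The function names are hypothetical. Only the directory path and the commands come from the log.

```python
import os
import subprocess
import sys
from pathlib import Path

SSL_DIR = "/var/lib/puppet/client/ssl"  # path quoted in the log above


def clear_ssl_files(ssl_dir: str) -> int:
    """Delete every regular file below ssl_dir, keeping the directories.

    Mirrors `find <ssl_dir> -type f -delete`. Returns how many files
    were removed (0 if the directory does not exist).
    """
    removed = 0
    for root, _dirs, files in os.walk(ssl_dir):
        for name in files:
            path = Path(root) / name
            # `find -type f` matches regular files only, so skip symlinks.
            if path.is_symlink() or not path.is_file():
                continue
            path.unlink()
            removed += 1
    return removed


def rerun_puppet() -> int:
    """Run `puppet agent -t` so the agent requests a new certificate."""
    return subprocess.run(["puppet", "agent", "-t"]).returncode


if __name__ == "__main__":
    ssl_dir = sys.argv[1] if len(sys.argv) > 1 else SSL_DIR
    print(f"removed {clear_ssl_files(ssl_dir)} files from {ssl_dir}")
    sys.exit(rerun_puppet())
```

As the log notes, the first agent run can stall for a minute while the new puppetmaster autosigns the request.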
[18:21:30] thcipriani: bam
[18:21:35] Life is gooood
[18:21:41] I'm eating food
[18:21:44] I'll brb
[18:21:57] I've a poc of the enc as well
[18:23:40] Deployment-Systems, Patch-For-Review: l10nupdate user can't access scap shared ssh key causing nightly l10nupdate sync process to fail - https://phabricator.wikimedia.org/T76061#1117285 (greg) How'd it go last night?
[18:27:45] hmm I must have screwed up something .. staging-rdb doesn't want to connect .. I'm just gonna delete it and reimage - I need to make a staging-rdb01 and 02 anyway
[18:31:53] "Failed to create instance. "
[18:31:55] wtf
[18:32:40] twentyafterfour: I wonder if we've hit an instance quota?
[18:33:33] well the project doesn't have that many instances yet, not nearly as many as deployment-prep...
[18:33:46] and it's lame that there is no useful error message
[18:33:48] twentyafterfour: Instances: 9/10 should have one more...
[18:34:12] we're going to need a lot more than 10 aren't we?
[18:34:25] twentyafterfour: https://wikitech.wikimedia.org/w/index.php?title=Special:NovaProject&action=displayquotas&projectname=staging but looks like we've hit our quota on Cores
[18:36:41] so how do we increase the quota?
[18:37:58] I think andrewbogott in ops had to increase it for security groups. We'll have to get an ops to do it.
[18:38:20] trying to think of what to increase the quota _to_.
[18:39:00] Quality-Assurance, Commons, MediaWiki-extensions-UploadWizard, Multimedia: UploadWizard API tests failing on beta Commons due to login problem - https://phabricator.wikimedia.org/T89272#1117359 (Steinsplitter)
[18:39:44] * greg-g raises pinky
[18:39:51] thcipriani: 1 million instances
[18:40:14] greg-g: seems legit.
[18:40:24] worked for dr evil
[18:40:47] as very few things did.
[18:43:08] ok andrewbogott says stay tuned
[18:48:11] ok he gave us a bit more...
[18:49:27] twentyafterfour: saw that, that'll get us through today at least :)
[18:55:28] is there not a separate puppet role for puppet slave? vs. master?
[18:55:49] it’s role::puppet::self for everything now
[18:55:52] but it’s all automated away now
[18:55:58] and you just have to run puppet a few times
[18:56:03] no need to set any puppetmaster related roles :D
[18:56:04] twentyafterfour: ^
[18:56:41] right on
[18:59:18] so where are we keeping hiera data, in wikitech or in ops/puppet.git ?
[19:00:58] twentyafterfour: wikitech ideally. I have a few under /etc/puppet/hieradata/labs/staging/hosts/* but that's just so I don't have to reassign roles when I'm killing them and spinning back up.
[19:01:49] sort of a stand in for the ENC that YuviPanda is working on
[19:03:48] Project browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-chrome-sauce build #532: FAILURE in 30 min: https://integration.wikimedia.org/ci/job/browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-chrome-sauce/532/
[19:12:06] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #425: FAILURE in 8 min 16 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/425/
[19:14:29] greg-g, twentyafterfour: I went ahead and declared on the l10nupdate saga
[19:14:49] :D
[19:14:52] bd808: nice
[19:15:42] It ran last night so new issues are new issues
[19:15:55] only took ... 4 months? Crap
[19:16:30] The last part (changing the ssh remote user) was a total smack my forehead moment
[19:16:52] Should have figured that out on day 1
[19:17:09] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 391 bytes in 0.002 second response time
[19:18:01] is there a delay before changes to hiera data on wikitech take effect?
[19:18:19] I've set up redis replication:
[19:18:20] "role::db::redis::redis_replication":
[19:18:22] staging-rdb2: staging-rdb1
[19:18:32] but no dice on staging-rdb2
[19:19:08] I think there is some cache at the puppetmaster
[19:19:27] not long as I recall but like a minute or two?
[19:20:29] well it's been at least a couple of minutes
[19:20:44] twentyafterfour: I haven't noticed any delay...
[19:20:55] then maybe I'm doing it rong
[19:21:02] hmm ...
[19:21:12] There's a stale? method in the provider -- https://github.com/wikimedia/operations-puppet/blob/production/modules/wmflib/lib/hiera/mwcache.rb#L53
[19:21:22] Yippee, build fixed!
[19:21:22] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #522: FIXED in 1 hr 11 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/522/
[19:21:31] understanding what it does is an exercise for the reader
[19:21:42] * bd808 slinks out for lunch
[19:21:56] maybe it's getting overridden by production redis.yaml?
[19:22:09] wasn't there an issue with ordering of those?
[19:22:27] twentyafterfour: I don't see the change on here: https://wikitech.wikimedia.org/wiki/Hiera:Staging
[19:22:52] oh shit! I've been editing deployment-prep :-d
[19:23:06] doh
[19:29:54] still no dice.
[19:30:10] and hiera cli doesn't work: "Config file /etc/hiera.yaml not found"
[19:30:43] ^demon|lunch: twentyafterfour thcipriani don’t blink yet, but https://gerrit.wikimedia.org/r/#/c/196628/3/nodes/labs/staging.yaml might end up being how we can specify a full staging environment node -> classes thingy in the future.
[19:30:48] like, near future.
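The `role::db::redis::redis_replication` hiera value above maps a replica hostname (`staging-rdb2`) to the master it should follow (`staging-rdb1`), and the symptom reported later in the log is that redis.conf on the replica never gains a `slaveof` line. As a minimal sketch, assuming the hiera value arrives as a plain dict and that Redis listens on its default port 6379 (both assumptions, and `slaveof_line` is a hypothetical helper rather than anything in the puppet module), this is the directive the mapping is expected to produce:

```python
from typing import Dict, Optional

# Assumption: Redis's default port; the puppet role may configure another one.
DEFAULT_REDIS_PORT = 6379


def slaveof_line(replication: Dict[str, str], hostname: str,
                 port: int = DEFAULT_REDIS_PORT) -> Optional[str]:
    """Return the redis.conf 'slaveof' directive for `hostname`, or None.

    `replication` mirrors the hiera value: keys are replica hostnames and
    values are the master each replica should follow.
    """
    master = replication.get(hostname)
    if master is None:
        # Not listed as a replica, so no slaveof line (master or standalone).
        return None
    return f"slaveof {master} {port}"


replication = {"staging-rdb2": "staging-rdb1"}
for host in ("staging-rdb1", "staging-rdb2"):
    print(host, "->", slaveof_line(replication, host))
```

Only the replica gets a directive (`slaveof staging-rdb1 6379`); the master is left alone. Checking for exactly this line in the rendered redis.conf is a quick way to tell whether the hiera value is reaching the template at all.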
[19:31:22] thcipriani: twentyafterfour afaik no delay
[19:31:25] on hiera
[19:31:31] and hiera cli has never worked :P
[19:32:07] YuviPanda: RUBYLIB=/var/lib/puppet/lib hiera puppet_statsd::statsd_host ::instanceproject=staging ::hostname=$(hostname -s) --debug -c /etc/puppet/hiera.yaml
[19:32:17] woooo
[19:32:18] nice
[19:32:21] hmm
[19:32:30] I’m testing the ENC on deployment-prep now
[19:33:37] that staging.yaml file is pretty awesome.
[19:34:11] <^demon|lunch> Oh man, then all we'd need to do is create the node
[19:34:28] ^demon|lunch: thcipriani yes
[19:34:31] indeed.
[19:34:46] because we have hiera, and then we have automated puppet cert signing and salt signing as well
[19:35:11] thcipriani: ^demon|lunch also, the word staging doesn’t actually appear on that ENC at all, so we can basically just spin up design-testing-* bam
[19:35:30] thcipriani: ^demon|lunch and from here to BeCaaS is basically just plugging in the nova API to automate the creation of the nodes themselves :D
[19:36:04] <^demon|lunch> wham bam done
[19:36:29] thcipriani: thanks that hiera snippet helps...
[19:36:54] still not getting redis slave to work though. the hiera value is getting set correctly on staging-palladium at least
[19:37:39] twentyafterfour: redis doesn’t automatically restart on config changes, though. that’s by design
[19:37:50] (same for dbs, memcached, etc - anything with state)
[19:37:51] YuviPanda: but it's not changing the config
[19:37:56] aaah
[19:37:56] right
[19:38:06] * YuviPanda shuts up for now, goes back to ENC
[19:38:14] puppet run doesn't change anything and manual inspection of the redis.conf shows no "slaveof" lines
[19:38:18] I can help in a bit if needed, but I think I’ll mostly test this ENC for a while and then go sleep...
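The ENC discussed above turns a node map (the `nodes/labs/staging.yaml` file in change 196628) into the class list Puppet applies, and its patterns are meant to avoid the word "staging" so that something like `design-testing-*` works too. Puppet's exec node terminus runs an ENC with the node name as its argument and reads a YAML hash (here just `classes`) from stdout. The following is a minimal sketch of that idea, not YuviPanda's ENC. The regex-keyed `NODE_MAP` format and its entries are hypothetical, and the sketch assumes it already receives a short hostname; the log shows labs certnames such as `i-0000083a.eqiad.wmflabs`, which a real ENC would have to resolve first.

```python
import re
import sys
from typing import Dict, List

# Hypothetical node map (hostname regex -> puppet classes). The real
# nodes/labs/staging.yaml format may differ. The patterns avoid the word
# "staging" so other prefixes such as design-testing-* match as well.
NODE_MAP: Dict[str, List[str]] = {
    r"^.+-rdb\d+$": ["role::db::redis"],
    r"^.+-palladium$": ["role::puppet::self"],
}


def classify(hostname: str, node_map: Dict[str, List[str]] = NODE_MAP) -> List[str]:
    """Collect classes from every pattern matching `hostname`, without duplicates."""
    classes: List[str] = []
    for pattern, node_classes in node_map.items():
        if re.match(pattern, hostname):
            for cls in node_classes:
                if cls not in classes:
                    classes.append(cls)
    return classes


def enc_output(hostname: str, node_map: Dict[str, List[str]] = NODE_MAP) -> str:
    """Render the YAML document an ENC prints: a hash with a 'classes' key."""
    classes = classify(hostname, node_map)
    if not classes:
        return "---\nclasses: []\n"
    return "---\nclasses:\n" + "".join(f"  - {cls}\n" for cls in classes)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Puppet runs an exec ENC with the node name as its only argument.
    sys.stdout.write(enc_output(sys.argv[1]))
```

Keeping the output to plain `classes` mirrors the "node -> classes thingy" described in the log. Parameters that vary per host stay in hiera, which is why the ENC and hiera can coexist.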
[19:38:31] YuviPanda: no problem, I'll figure it out
[19:39:26] PROBLEM - Puppet failure on deployment-redis02 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]
[19:41:41] bd808: you rock
[19:43:07] <^demon|lunch> greg-g: I'm poking at hiwiki FlaggedRevs bug finally
[19:44:16] ah ha
[19:44:37] roles/redisdb.pp has a conditional for labs ...
[19:45:03] :D
[19:45:08] shit like that is what we need to kill
[19:45:14] err, ‘stuff'
[19:45:19] * YuviPanda is trying to be more careful with words
[19:45:34] what defines 'realm'
[19:45:52] will this break other things if I kill the labs conditional ?
[19:46:05] add me as a reviewer, and we’ll figure out
[19:46:10] role::redis shouldn’t be used anywhere...
[19:47:00] oh then maybe this isn't the problem?
[19:47:14] well no it's class role::db::redis
[19:47:49] PROBLEM - Puppet failure on deployment-salt is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[19:50:36] deployment-salt is me
[19:50:57] twentyafterfour: realm is set to ‘labs’ in labs and production elsewhere...
[19:51:19] right
[19:51:49] so basically, the current setup just breaks hiera for redis on labs with no useful differences
[19:52:01] I'm submitting a patch to ops/puppet
[19:57:29] and done
[19:58:42] (CR) BryanDavis: [C: 1] "I haven't run it but the change looks sane to me. Easiest way to test definitively is to cherry-pick to deployment-bastion:/srv/deployment" [tools/scap] - https://gerrit.wikimedia.org/r/196306 (https://phabricator.wikimedia.org/T92534) (owner: Legoktm)
[19:58:53] twentyafterfour: nits
[20:02:15] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[20:04:42] Project browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #537: FAILURE in 37 min: https://integration.wikimedia.org/ci/job/browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/537/
[20:07:45] RECOVERY - Puppet failure on deployment-salt is OK: OK: Less than 1.00% above the threshold [0.0]
[20:09:29] RECOVERY - Puppet failure on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:11:13] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #174: FAILURE in 2 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/174/
[20:12:08] bd808: do I need to do anything besides cherry-picking the patch for testing?
[20:12:54] I did the cherry-pick, "broke" wmf-config/filebackend-labs.php, tried syncing it and it let me
[20:13:18] hmmm
[20:14:11] I don't see a cherry-pick in deployment-bastion:/srv/deployment/scap/scap
[20:14:39] you probably need to `git deploy start` before pulling it in
[20:14:39] uhhhh wut. I definitely cherry-picked it...
[20:14:52] legoktm@deployment-bastion:/srv/deployment/scap/scap$ git fetch https://gerrit.wikimedia.org/r/mediawiki/tools/scap refs/changes/06/196306/3 && git cherry-pick FETCH_HEAD
[20:15:10] ok
[20:15:16] so,
[20:15:19] git deploy start
[20:15:19] It looks like trebuchet reset the clone to the last deploy tag
[20:15:22] git cherry-pick blah
[20:15:27] git deploy finish?
[20:15:49] git deploy abort I think
[20:16:00] to reset to the last tag
[20:16:10] but don't do that until after you test
[20:16:31] <^demon|lunch> YuviPanda: I think I managed to hiera-ize most of the ES config
[20:16:36] <^demon|lunch> patch incoming
[20:16:37] ^demon|lunch: w00t.
[20:16:39] nice
[20:16:41] and git checkout master before the pick
[20:17:19] RECOVERY - Puppet failure on deployment-mediawiki03 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:17:43] <^demon|lunch> YuviPanda: https://gerrit.wikimedia.org/r/#/c/196640/
[20:18:10] bd808: ok, so I leave the git deploy in progress while testing scap?
[20:18:30] legoktm: yeah. The change is only needed on the deploy master
[20:18:33] ok it looks like redis is replicating
[20:19:28] twentyafterfour: \o/ w00t
[20:20:35] ^demon|lunch: nice. test on deployment-prep?
[20:20:45] <^demon|lunch> Yeah I was gonna in a minute
[20:21:09] bd808: ok, cherry-pick still in place and it still let me sync a bad file :/
[20:21:11] cool
[20:21:36] ^demon|lunch: twentyafterfour thcipriani|afk so I’m cherry-picking the enc on staging-palladium, and going to test it on tin :D
[20:21:37] so I'm thinking 500m was a stupid limit for redis memory - these instances have 4gb
[20:21:41] legoktm: well then you have a bug ;)
[20:21:48] YuviPanda: cool
[20:21:58] bah
[20:22:03] legoktm: maybe hot patch to add some debug messages so you can see what's wrong?
[20:22:37] I have a vagrant environment for testing scap changes but it's not really sharable :(
[20:22:58] It's old hacks. I failed to update it to 14.04
[20:23:11] scap-vagrant!
[20:23:20] bd808: ok, and trebuchet won't override them?
[20:23:24] My dir is called vagrant-scap :)
[20:23:42] legoktm: nope. not as long as you have a deploy started
[20:24:15] but your bad file could be nuked at any time by jenkins
[20:24:27] hmm
[20:24:38] so you may want to pause the update job too
[20:25:06] after telling folks in this channel that you are doing so and !log'ing here
[20:25:12] bd808: print() doesn't work for debugging?
[20:25:34] it... should
[20:25:37] I think
[20:25:49] you can log to the logger
[20:26:04] twentyafterfour: ^demon|lunch thcipriani|afk at some point, we should move all our node definitions to the yaml file :D
[20:26:07] add --verbose to your sync-file call
[20:26:18] <^demon|lunch> yes
[20:26:46] * bd808 is jealous of all the puppet fun going on
[20:26:50] bd808: --verbose didn't make it any more verbose
[20:26:55] ^demon|lunch: and then we can recreate and delete at willll
[20:27:27] ^demon|lunch: and then maybe, maybe some day in a glorious future, prod will also use the yaml files and site.pp will die for main mediawiki related stuff :)
[20:27:29] legoktm: uh... it didn't?
[20:27:36] and then we’ll have a truly similar environment
[20:27:59] bd808: http://fpaste.org/197784/14262784/
[20:28:12] <^demon|lunch> Crap.
[20:28:18] <^demon|lunch> Cannot reassign variable minimum_master_nodes
[20:28:52] bd808: same output as without --verbose
[20:29:37] oh.. it didn't get to the part that would actually be more verbose "called with an empty host list"
[20:29:49] still not sure about your missing print though
[20:30:18] did you just back it all out?
[20:30:49] oh grr, it didn't save
[20:31:39] wait
[20:31:44] something aborted my deploy?
[20:31:50] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[20:31:52] 0e5ada3 HEAD@{0}: checkout: moving from master to tags/scap/scap-sync-20150218-1
[20:32:00] yeah...
[20:32:05] <^demon|lunch> -elastic05 is me
[20:32:06] puppet madness maybe?
[20:32:16] legoktm: I'll test in my test rig
[20:32:32] ok thanks
[20:35:17] bd808: shouldn’t have been puppet madness, maybe git deploy?
[20:35:19] some madness
[20:35:33] thcipriani: once the ENC is merged, we should get rid of hiera_include(‘classes’)
[20:35:44] YuviPanda: puppet triggering git-deploy to refresh would be my guess
[20:35:48] YuviPanda: well that was short lived :)
[20:35:51] ah, yeah.
[20:36:00] thcipriani: :D indeed. the ENC won’t get merged until next week, though...
[20:36:17] legoktm: ugh. This is going to be more complicated anyway...
[20:36:35] greg-g: we’re getting closer to BeCaaS :)
[20:36:38] legoktm: See https://github.com/wikimedia/operations-mediawiki-config/blob/master/hhvm-fatal-error.php
[20:37:13] YuviPanda: ENC looks awesome. Soon it will become self-aware and we, in a panic, will try to pull the plug.
[20:37:30] wikibugs is AWOL from most channels again. Currently only in #wikimedia-dev and #wikimedia-labs (afaict)
[20:37:41] bd808: could we stick a at the top of that file?
[20:37:45] thcipriani: if the singularity is written in ruby, I believe we have little to fear
[20:37:46] :P
[20:37:50] legoktm: *nod*
[20:38:07] legoktm: Let's get the patch to work first :)
[20:38:19] (Not sure which channel I'm meant to report wikibugs problems in. Here, or -labs, or other? Sorry/Thanks :)
[20:38:24] thcipriani: actually, that joke doesn’t work because the ENC isn’t interfacing with any ruby at all except through syscalls, so ignore...
[20:39:08] https://integration.wikimedia.org/zuul/ <-- stuck?
[20:39:51] thcipriani: anyway, I need to clean up the code and shop it around to other opsen :)
[20:40:34] YuviPanda: yay!
[20:40:59] unstuck, yay
[20:41:00] weird
[20:41:49] RECOVERY - Puppet failure on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:41:52] YuviPanda: nice!
[20:41:54] legoktm: fuuuuuu
[20:42:03] sync-file doesn't call that method
[20:42:09] it just calls php -l
[20:42:29] sync-dir and scap use the method
[20:42:43] so that's fix #1
[20:44:34] <^demon|lunch> YuviPanda: Hmm?
[20:44:34] <^demon|lunch> Could not find class role::labs::instance for i-0000083a.eqiad.wmflabs on node i-0000083a.eqiad.wmflabs
[20:44:45] ^demon|lunch: race. try again
[20:45:37] >.>
[20:46:38] bd808: ok, that should be easy to fix
[20:46:56] bd808: is sync-dir working properly then?
[20:46:58] <^demon|lunch> YuviPanda: PS3 works perfectly for deployment-elastic*
[20:47:02] legoktm: Yeah I'll amend if I get it to work
[20:47:03] <^demon|lunch> Who knows on prod :D
[20:47:21] ^demon|lunch: :D do you know of the puppet compiler? :D
[20:47:45] <^demon|lunch> i've heard of it, do I have to install something?
[20:47:53] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[20:47:54] ^demon|lunch: nope
[20:47:58] ^demon|lunch: docs at end of https://wikitech.wikimedia.org/wiki/Puppet_Testing
[20:48:04] it’s pretty trivial
[20:48:36] <^demon|lunch> ooooh
[20:48:53] ^demon|lunch: btw, hieradata/role/common/elasticsearch/server.yaml will get applied in *prod only*.
[20:48:57] and the common means eqiad and codfw
[20:49:18] <^demon|lunch> Yes
[20:49:23] ^demon|lunch: same path, replacing common with eqiad / codfw will let you do $::site specific stuff in hiera
[20:49:56] <^demon|lunch> The expected nodes and such will probably have to change :)
[20:50:57] <^demon|lunch> YuviPanda: So role/$site/.... or $site/....?
[20:51:39] ^demon|lunch: ah, it should be just $site/, I think
[20:51:45] ^demon|lunch: even the common one...
[20:51:56] https://wikitech.wikimedia.org/wiki/Puppet_Hiera has docs
[20:52:06] ^demon|lunch: actually, role/ would work as well...
[20:52:10] * YuviPanda is unsure now on what to do
[20:52:17] <^demon|lunch> It's too confusing!
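bd808's point above is that sync-file only ran `php -l`. That lint cannot catch stray text before `<?php`, because PHP treats anything outside the open tag as inline output, which is valid syntax. The scap patch (`check_php_syntax: Check for any content before opening`, PS4 of which "doesn't error on empty files") adds a separate check for this. Here is a minimal Python sketch of such a check, assuming "content" means any byte at all before the open tag. The function names are hypothetical and this is not the patch's code.

```python
from pathlib import Path

PHP_OPEN_TAG = b"<?php"


def has_content_before_open_tag(data: bytes) -> bool:
    """True if `data` has bytes before '<?php' (or no '<?php' at all).

    Empty input is accepted, in line with "doesn't error on empty files".
    Leading whitespace or a UTF-8 BOM counts as content, because PHP would
    send it to the client as output before any headers.
    """
    if not data:
        return False
    # PHP's open tag is case-insensitive, so compare lowercased bytes.
    return data[:len(PHP_OPEN_TAG)].lower() != PHP_OPEN_TAG


def check_php_file(path: str) -> bool:
    """Read `path` and report whether it would emit stray leading output."""
    return has_content_before_open_tag(Path(path).read_bytes())
```

Treating a lone blank line as an error is deliberately strict. A looser rule (for example, allowing a `#!` line for CLI scripts) would be a policy choice for the scap maintainers.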
[20:52:21] <^demon|lunch> Too much duplication!
[20:52:48] YuviPanda: ^demon|lunch twentyafterfour how are we handling keys that need to exist in puppet private? for instance DKIM keys for mx? It's weird these keys exist on deployment-mx...
[20:53:39] (CR) BryanDavis: [C: -1] "vagrant@scap:/srv/mediawiki-staging$ scap --verbose" [tools/scap] - https://gerrit.wikimedia.org/r/196306 (https://phabricator.wikimedia.org/T92534) (owner: Legoktm)
[20:53:41] thcipriani: right. so if they can not be abused even if someone knows them unless that person also has access to shell in staging / prod, we just put them in the public labs/private repo in prod
[20:53:59] thcipriani: if they need to be secret in some form, just put them as [LOCAL HACK] commits in /var/lib/git/labs/private in palladium
[20:54:57] (CR) BryanDavis: "Also sync-file doesn't use this code path, it uses php -l directly. The easiest way to fix for that would be to add `utils.check_php_opening" [tools/scap] - https://gerrit.wikimedia.org/r/196306 (https://phabricator.wikimedia.org/T92534) (owner: Legoktm)
[20:55:14] ^demon|lunch: yeah, so you should use the one without the role. role only works when the role keyword is used in site.pp
[20:55:20] and shouldn’t matter here...
[20:55:23] I think
[20:55:33] <^demon|lunch> We use the role keyword tho
[20:55:43] ah
[20:55:45] hmm
[20:55:49] then maybe it’s ok...
[20:55:59] I’ll have _joe_ take a look before pushing :D
[20:56:07] put the site specific things in hiera too?
[20:57:16] thcipriani: heh, now we have *three* ways of putting classes on nodes :D ENC, Hiera and Wikitech
[20:57:53] and they're all combined!
[20:59:21] yeah
[20:59:33] so let’s include some some way and some the ottttheeerrway :D
[20:59:56] thcipriani: actually, the LDAP terminus (wikitech) is not actually active on staging or deployment-prep atm
[21:00:09] thcipriani: but the ENC combines the yaml file with LDAP / wikitech anyway, and hence wikitech still works
[21:02:16] it's just turtles all the way down.
[21:02:27] er yaml
[21:03:42] <^demon|lunch> YuviPanda: Is there a way to see if there's any labs instances using a particular role?
[21:04:02] ^demon|lunch: aaaah. so you can use semanticmediawiki, but that’s…………….
[21:04:12] ^demon|lunch: but I have a magic incantation you can run on any commandline in labs! let me find it
[21:05:37] ^demon|lunch: ldapsearch works pretty well: ldapsearch -LLL -x -D 'cn=proxyagent,ou=profile,dc=wikimedia,dc=org' -w $(grep -Po "(?<=bindpw).*" /etc/ldap.conf) -b 'ou=hosts,dc=wikimedia,dc=org' "puppetClass=[class]"
[21:05:59] ^ yep
[21:06:01] that one
[21:06:08] I had to fish it out of my history...
[21:07:08] <^demon|lunch> Sweeeet
[21:07:19] <^demon|lunch> I can kill the $::elasticsearch_* variables and hieraize all that
[21:07:32] <^demon|lunch> Only instances are deployment-prep
[21:07:55] \o/ wonderful
[21:12:25] Yippee, build fixed!
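The ldapsearch one-liner above binds as the proxyagent DN, pulls the password out of /etc/ldap.conf with `grep -Po "(?<=bindpw).*"` (the shell's word splitting then drops the leading space), and filters on `puppetClass=[class]`. Below is a minimal Python sketch of the same lookup, built as an argv list for `subprocess`. It assumes ldap.conf has a `bindpw <password>` line, as the grep implies. Escaping the class name per RFC 4515 is an addition not in the original command, and the helper names are hypothetical. As in the one-liner, `-w` still exposes the password in the process list.

```python
import re
import subprocess
from typing import List

# DNs copied from the ldapsearch one-liner in the log.
BIND_DN = "cn=proxyagent,ou=profile,dc=wikimedia,dc=org"
BASE_DN = "ou=hosts,dc=wikimedia,dc=org"


def read_bindpw(ldap_conf_text: str) -> str:
    """Like grep -Po '(?<=bindpw).*', but also trims surrounding whitespace."""
    match = re.search(r"^[ \t]*bindpw[ \t]+(.*?)[ \t]*$", ldap_conf_text, re.MULTILINE)
    if match is None:
        raise ValueError("no bindpw line in ldap.conf")
    return match.group(1)


def escape_filter_value(value: str) -> str:
    """Escape an LDAP filter assertion value as RFC 4515 requires."""
    return "".join(f"\\{ord(ch):02x}" if ch in "*()\\\0" else ch for ch in value)


def ldapsearch_argv(puppet_class: str, bindpw: str) -> List[str]:
    """Build the same ldapsearch call as the one-liner, as an argv list."""
    return [
        "ldapsearch", "-LLL", "-x",
        "-D", BIND_DN,
        "-w", bindpw,
        "-b", BASE_DN,
        f"puppetClass={escape_filter_value(puppet_class)}",
    ]


def hosts_with_class(puppet_class: str, conf_path: str = "/etc/ldap.conf") -> str:
    """Run the search on a labs instance and return the raw LDIF output."""
    with open(conf_path) as conf:
        bindpw = read_bindpw(conf.read())
    result = subprocess.run(ldapsearch_argv(puppet_class, bindpw),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Passing an argv list instead of a shell string avoids quoting problems with the `(?<=...)` pattern and with class names. Puppet class names such as `role::db::redis` contain no characters that need escaping, so the filter is unchanged for them.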
[21:12:26] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce build #517: FIXED in 40 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce/517/
[21:12:30] alright, I’m off to bed now
[21:12:38] ^demon|lunch: \o/ thcipriani \o/ twentyafterfour \o/
[21:12:44] have a nice weekend guys
[21:12:47] <^demon|lunch> g'night
[21:12:56] (PS4) Legoktm: check_php_syntax: Check for any content before opening
(CR) Legoktm: "PS4 fixes sync-file and doesn't error on empty files" [tools/scap] - https://gerrit.wikimedia.org/r/196306 (https://phabricator.wikimedia.org/T92534) (owner: Legoktm)
[21:14:26] (PS5) Legoktm: Check for any content before opening
Yippee, build fixed!
[21:32:23] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #519: FIXED in 40 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/519/
[21:42:05] (CR) 20after4: [C: 1] Check for any content before opening
later all, have a good weekend
[22:53:23] greg-g: have a good weekend!
[23:13:21] PROBLEM - Host deployment-restbase01 is DOWN: CRITICAL - Host Unreachable (10.68.16.235)
[23:41:54] PROBLEM - Puppet failure on deployment-restbase02 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]