[00:16:18] 10Phabricator, 10Patch-For-Review: Rate-limit is too harsh and affects human users - https://phabricator.wikimedia.org/T198974 (10mmodell) Note that the rate limiting is disabled only temporarily. We need to remove the config from puppet to make it permanent.
[00:17:09] 10Release-Engineering-Team (Watching / External), 10Core-Platform-Team, 10Wikidata: What happens if the Wikibase project specifies a version of a library outside of the range included in mediawiki-vendor? - https://phabricator.wikimedia.org/T179663 (10CCicalese_WMF)
[00:22:55] 10Release-Engineering-Team (Watching / External), 10Core-Platform-Team, 10Operations, 10WMF-JobQueue, and 3 others: Collect error logs from jobchron/jobrunner services in Logstash - https://phabricator.wikimedia.org/T172479 (10CCicalese_WMF)
[00:26:39] 10Phabricator, 10Wikibugs, 10Patch-For-Review: wikibugs hits Phabricator's rate limiting and hence is unreliable - https://phabricator.wikimedia.org/T198915 (10Aklapper) That should obviously be a separate feature request instead.
[00:53:36] 10Release-Engineering-Team (Watching / External), 10Core-Platform-Team: Audit all existing code to ensure that any extension currently or previously adding blobs to ExternalStore has been registering a reference in the text table (and fix up if wrong) - https://phabricator.wikimedia.org/T106388 (10CCicalese_WMF)
[01:02:51] 10Scap, 10Core-Platform-Team: "sql wikishared" doesn't work on mwmaint1001 - https://phabricator.wikimedia.org/T199316 (10CCicalese_WMF)
[03:02:52] I don't think our tarballs are reproducible
[03:58:31] legoktm: content wise, or are mtimes messing things up?
[03:58:58] if mtimes, I suppose we could do something like use the timestamp of the last REL commit or some such, for all files recursively.
[03:59:57] 10Release-Engineering-Team (Next), 10Scap, 10WMF-JobQueue, 10Wikimedia-Incident: Add jobrunner servers to Scap canary process - https://phabricator.wikimedia.org/T172480 (10Krinkle)
[04:00:23] 10Release-Engineering-Team (Next), 10Scap, 10Core-Platform-Team, 10WMF-JobQueue, 10Wikimedia-Incident: Add jobrunner servers to Scap canary process - https://phabricator.wikimedia.org/T172480 (10Krinkle)
[04:08:03] Krinkle: content wise it all looks good. the file ordering is what I noticed when using diffoscope (which is system/fs dependent aiui)
[04:08:26] I think a sorted() in some place will easily fix it
[04:12:04] 10Release-Engineering-Team, 10Scap: When scap deploy is aborted it should say so in the log - https://phabricator.wikimedia.org/T199388 (10Krinkle)
[04:15:08] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team: Get letsencrypt wildcard cert for *.beta.wmflabs.org domains - https://phabricator.wikimedia.org/T182927 (10Krinkle) p:05Triage>03High
[04:15:38] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team: Get letsencrypt wildcard cert for *.beta.wmflabs.org domains - https://phabricator.wikimedia.org/T182927 (10Krinkle) It seems Let's Encrypt now publicly supports wildcard domains. Are there known blockers to using those?
[04:15:52] legoktm: Ah, sounds good. Easy fix, or a task?
[04:16:05] yeah, I'm writing up some notes
[04:29:07] 10MediaWiki-Releasing, 10Security: Streamline MW security release process - https://phabricator.wikimedia.org/T196602 (10Legoktm) git-archive-all looks extremely promising from my testing so far. I was able to recreate a functionally identical tarball from the 1.31.0 git tag. I'll split my notes/suggestions w/...
[05:48:56] greg-g: are we not pausing deploys for Wikimania next week?
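[Editor's note: the "sorted() in some place" plus last-REL-commit-mtime idea discussed above could look roughly like the following sketch using Python's stdlib tarfile module. This is not the actual make-release code; REL_TIMESTAMP is a placeholder for the timestamp of the last commit on the release branch, and the function names are invented for illustration.]

```python
import os
import tarfile

# Assumed stand-in for the last REL branch commit date; in real tooling
# this would come from e.g. `git log -1 --format=%ct`.
REL_TIMESTAMP = 1531180800

def normalize(info):
    # Clamp the metadata that varies between build hosts so that the
    # archive bytes depend only on the tree contents.
    info.mtime = REL_TIMESTAMP
    info.uid = info.gid = 0
    info.uname = info.gname = "root"
    return info

def make_tarball(src_dir, out_path):
    """Build a tar archive with deterministic member order and metadata."""
    with tarfile.open(out_path, "w") as tar:
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()                  # deterministic recursion order
            for name in sorted(files):   # deterministic member order
                path = os.path.join(root, name)
                tar.add(path, arcname=path, recursive=False,
                        filter=normalize)
```

With the members sorted and mtimes/ownership clamped, two runs over the same tree produce byte-identical archives regardless of filesystem enumeration order (compression, if any, would need its own timestamp handling, e.g. gzip's header mtime).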
[06:27:52] 10Phabricator: Enable embedding of images from Wikimedia Commons - https://phabricator.wikimedia.org/T199407 (10Tbayer)
[06:29:31] 10Phabricator: Enable image hotlinking - https://phabricator.wikimedia.org/T186246 (10Tbayer)
[06:29:34] 10Phabricator: Enable embedding of images from Wikimedia Commons - https://phabricator.wikimedia.org/T199407 (10Tbayer)
[06:30:11] 10Phabricator (2017-06-01), 10RelEng-Archive-FY201718-Q1: Enable embedding of media from Wikimedia Commons - https://phabricator.wikimedia.org/T116515 (10Tbayer) This task was originally mainly about images, but then morphed into a task about videos and was closed as such. I have split the image part off into...
[06:30:24] 10Project-Admins, 10Developer-Advocacy (Jul-Sep 2018): Sort out scope/confusion between #Possible-Tech-Projects and #Outreach-Programs-Projects tags - https://phabricator.wikimedia.org/T198101 (10srishakatux) @Aklapper #possible-tech-projects is ready to be 🗡and #outreach-programs-projects ready for a name cha...
[06:32:41] 10Phabricator (2017-06-01), 10RelEng-Archive-FY201718-Q1: Enable embedding of media from Wikimedia Commons - https://phabricator.wikimedia.org/T116515 (10Tbayer) BTW, the example video above in T116515#3309596 doesn't work for me right now, in either Chromium ("Requests to the server have been blocked by an ex...
[07:18:36] 10Phabricator, 10Patch-For-Review: Rate-limit is too harsh and affects human users - https://phabricator.wikimedia.org/T198974 (10Pginer-WMF) >>! In T198974#4410522, @Josve05a wrote: > I got this right now as well. I was creating a task with exception crash code from AWB. I then pressed the save/create task bu...
[07:18:42] legoktm: no, all of SRE and RelEng will be around. We should be fine as long as nothing is deployed without being scheduled.
[07:39:50] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team: Get letsencrypt wildcard cert for *.beta.wmflabs.org domains - https://phabricator.wikimedia.org/T182927 (10Krenair) No I just need to puppetise my work to put them into use, see above
[08:36:47] PROBLEM - Puppet errors on deployment-cassandra3-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[08:48:17] PROBLEM - Puppet errors on deployment-cassandra3-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[08:56:43] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10releng-201718-q3, 10Epic, 10Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (10hashar)
[09:04:49] (03PS1) 10Hashar: Migrate SemanticDrilldown to Quibble [integration/config] - 10https://gerrit.wikimedia.org/r/445368 (https://phabricator.wikimedia.org/T183512)
[09:04:51] (03CR) 10Hashar: [C: 032] Migrate SemanticDrilldown to Quibble [integration/config] - 10https://gerrit.wikimedia.org/r/445368 (https://phabricator.wikimedia.org/T183512) (owner: 10Hashar)
[09:05:30] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10releng-201718-q3, 10Epic, 10Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (10hashar)
[09:06:09] (03Merged) 10jenkins-bot: Migrate SemanticDrilldown to Quibble [integration/config] - 10https://gerrit.wikimedia.org/r/445368 (https://phabricator.wikimedia.org/T183512) (owner: 10Hashar)
[09:23:25] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10releng-201718-q3, 10Epic, 10Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (10hashar)
[09:29:59] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10releng-201718-q3, 10Epic, 10Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (10hashar)
[09:34:34] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10releng-201718-q3, 10Epic, 10Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (10hashar)
[09:41:48] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10releng-201718-q3, 10Epic, 10Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (10hashar)
[09:45:49] (03PS1) 10Hashar: Archive some Semantic MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/445378 (https://phabricator.wikimedia.org/T199417)
[09:46:25] (03CR) 10Hashar: [C: 032] Archive some Semantic MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/445378 (https://phabricator.wikimedia.org/T199417) (owner: 10Hashar)
[09:47:29] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10releng-201718-q3, 10Epic, 10Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (10hashar)
[09:47:57] (03Merged) 10jenkins-bot: Archive some Semantic MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/445378 (https://phabricator.wikimedia.org/T199417) (owner: 10Hashar)
[10:15:42] (03PS1) 10Hashar: Inject tests/selenium/LocalSettings.php [integration/quibble] - 10https://gerrit.wikimedia.org/r/445380 (https://phabricator.wikimedia.org/T198201)
[10:16:31] (03CR) 10Hashar: "Gotta try it out with a few extensions that rely on that." [integration/quibble] - 10https://gerrit.wikimedia.org/r/445380 (https://phabricator.wikimedia.org/T198201) (owner: 10Hashar)
[10:18:44] hashar: is there a list of require-dev extensions that are loaded for wmf ci?
[10:18:49] * addshore needs to add one for Wikibase
[11:07:36] PROBLEM - Puppet errors on deployment-cache-text04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[11:14:31] 10Release-Engineering-Team (Kanban), 10MediaWiki-Core-Tests, 10MediaWiki-User-preferences, 10Patch-For-Review, and 2 others: Selenium "User should be able to change preferences" test flaky - https://phabricator.wikimedia.org/T198137 (10Pablo-WMDE) **Failing job** https://integration.wikimedia.org/ci/job/qu...
[11:28:11] 10Phabricator (2017-06-01), 10RelEng-Archive-FY201718-Q1: Enable embedding of media from Wikimedia Commons - https://phabricator.wikimedia.org/T116515 (10mmodell) maybe we need to adjust content security policy?
[13:34:00] 10Continuous-Integration-Config, 10Wikidata, 10User-Addshore: Install cache/integration-tests with Wikibase CI tests - https://phabricator.wikimedia.org/T199440 (10Addshore)
[13:50:43] RECOVERY - Puppet errors on deployment-elastic09 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:38:04] PROBLEM - App Server Main HTTP Response on deployment-mediawiki-09 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:38:14] 10Project-Admins, 10Pywikibot-core, 10Pywikibot-RfCs: Shall we rename Pywikibot-core Phabricator project to Pywikibot? - https://phabricator.wikimedia.org/T195893 (10Dvorapa) Also the PWBC should be changed maybe to PWB
[14:42:53] RECOVERY - App Server Main HTTP Response on deployment-mediawiki-09 is OK: HTTP OK: HTTP/1.1 200 OK - 47285 bytes in 1.697 second response time
[15:16:40] 10Release-Engineering-Team (Kanban), 10MediaWiki-Core-Tests, 10MediaWiki-User-preferences, 10Patch-For-Review, and 2 others: Selenium "User should be able to change preferences" test flaky - https://phabricator.wikimedia.org/T198137 (10hashar) @Pablo-WMDE for #1 and #2 that is `User should be able to chan...
[15:16:56] 10Release-Engineering-Team (Kanban), 10MediaWiki-Core-Tests, 10MediaWiki-User-preferences, 10Patch-For-Review, and 2 others: Selenium "User should be able to change preferences" test flaky - https://phabricator.wikimedia.org/T198137 (10hashar)
[16:01:04] 10Release-Engineering-Team (Watching / External), 10GitHub-Mirrors, 10Security-Team: Enforce 2FA for GitHub members - https://phabricator.wikimedia.org/T198810 (10mmodell)
[16:01:06] 10Phabricator, 10Release-Engineering-Team: Enable 2FA on wmfphab github account - https://phabricator.wikimedia.org/T198823 (10mmodell) 05Open>03Resolved a:03mmodell I've added 2factor to wmfphab on github
[16:07:32] what is the status of wmf.12 going out to group2 wikis today? I thought it was scheduled but I don't see an actual deploy in sal
[16:07:57] I'm asking because if it's there I can test something, otherwise i work on something else
[16:08:25] It's done, I think
[16:08:25] https://tools.wmflabs.org/versions/
[16:08:34] Yeah, it's everywhere
[16:08:49] oh good! thanks
[16:09:04] I searched around in sal but couldn't find any 'start deploy' that looked like it
[16:09:20] 2018-07-12 13:14 rebuilt and synchronized wikiversions files: all wikis to 1.32.0-wmf.12
[16:32:24] 10Gerrit, 10Release-Engineering-Team: Update gerrit to 2.15.3 - https://phabricator.wikimedia.org/T199460 (10Paladox)
[16:50:50] 10Gerrit: PolyGerrit unified diff view is too narrow - https://phabricator.wikimedia.org/T199464 (10dchan)
[16:54:08] 10Gerrit: PolyGerrit unified diff view is too narrow - https://phabricator.wikimedia.org/T199464 (10Paladox) Did you enable "Fit to screen" in the preference?
[17:19:04] 10Gerrit: PolyGerrit unified diff view is too narrow - https://phabricator.wikimedia.org/T199464 (10dchan) Thanks, that solves it for me - is there any way that can be the default (at least for Unified)?
[17:19:53] 10Gerrit: No keyboard shortcut for adding reviewers in PolyGerrit - https://phabricator.wikimedia.org/T199463 (10Paladox) filed upstream at https://bugs.chromium.org/p/gerrit/issues/detail?id=9422
[17:20:27] 10Gerrit, 10Upstream: No keyboard shortcut for adding reviewers in PolyGerrit - https://phabricator.wikimedia.org/T199463 (10Paladox)
[17:27:12] apergos: welcome to the new (sometimes) normal of early trains! :)
[17:29:39] 10Gerrit: PolyGerrit unified diff view is too narrow - https://phabricator.wikimedia.org/T199464 (10Paladox) Filed that part upstream at https://bugs.chromium.org/p/gerrit/issues/detail?id=9423
[17:34:22] I looked at the schedule and had my eye open for the deploy
[17:34:27] I just didn't see it go by :-)
[17:34:45] this means tomorrow I can run some tests for someone
[17:34:54] :)
[17:34:55] 10Beta-Cluster-Infrastructure, 10Analytics, 10Analytics-Kanban, 10User-Elukey: Disk usage on deployment-kafa-jumbo-* causing alerts - https://phabricator.wikimedia.org/T198262 (10Nuria) 05Open>03Resolved
[17:34:58] if they pass i will be able to make a customer happy
[17:46:39] (03PS9) 10Thcipriani: Perform helm deployment in service-pipeline [integration/config] - 10https://gerrit.wikimedia.org/r/425936 (https://phabricator.wikimedia.org/T188935) (owner: 10Dduvall)
[17:46:41] (03PS2) 10Thcipriani: Pipeline: helm deployment to CI namespace [integration/config] - 10https://gerrit.wikimedia.org/r/444647
[17:48:11] 10Gerrit, 10Release-Engineering-Team, 10Patch-For-Review, 10User-notice: Make PolyGerrit the default ui - https://phabricator.wikimedia.org/T196812 (10Framawiki) >>! In T196812#4401708, @Johan wrote: > OK, I simplified a bit and geared it a bit more towards the main audience of Tech News (non-developers),...
[18:07:06] (03PS3) 10Thcipriani: Pipeline: helm deployment to CI namespace [integration/config] - 10https://gerrit.wikimedia.org/r/444647
[18:15:22] 10MediaWiki-Releasing: Move to a `git archive` like model for MediaWiki tarball creation - https://phabricator.wikimedia.org/T199467 (10Legoktm)
[18:25:57] 10Release-Engineering-Team (Next), 10Release Pipeline (Blubber): Blubber should support php/composer - https://phabricator.wikimedia.org/T186547 (10greg)
[18:26:09] SMalyshev: you are very fast
[18:26:30] however I did just earlier make sure the code was deployed to all wikis today
[18:26:42] since it is the evening I will be looking at this tomorrow or Monday
[18:26:51] (see the backread in this very channel!)
[18:31:29] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Next): zuul-merger /var/log/git-daemon/syslog.log is not log rotated - https://phabricator.wikimedia.org/T188107 (10greg)
[18:37:18] apergos: thanks!
[18:37:41] sure!
[19:04:16] !log deployment-prep: deleting new instance deployment-maps04 (initial puppet run failed) and creating deployment-maps05
[19:04:19] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[19:08:55] PROBLEM - Host deployment-maps04 is DOWN: CRITICAL - Host Unreachable (10.68.18.184)
[19:13:49] Hi greg-g
[19:14:01] Are you the one with access to the gcal for deploy slots?
[19:14:09] Because it needs updating ;)
[19:56:38] TOO MANY CONCURRENT CONNECTIONS
[19:56:39] You ("2a00:23c4:ad0a:7d01:2c7c:f6c9:25a8:19c0") have too many concurrent connections.
[19:56:42] hmm i just got that
[19:56:44] twentyafterfour ^^
[20:38:50] PROBLEM - SSH on integration-slave-docker-1017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:48:40] RECOVERY - SSH on integration-slave-docker-1017 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u4 (protocol 2.0)
[21:06:49] addshore: yeah, heh, will do. I'm afk for a while this afternoon.
[21:13:51] 10Phabricator, 10Release-Engineering-Team: Enable 2FA on wmfphab github account - https://phabricator.wikimedia.org/T198823 (10Krenair) @mmodell, doesn't {K13} need to be updated?
[21:14:53] 10Phabricator, 10Release-Engineering-Team: Enable 2FA on wmfphab github account - https://phabricator.wikimedia.org/T198823 (10mmodell) The password hasn't changed.
[21:18:26] 10Phabricator, 10Release-Engineering-Team: Enable 2FA on wmfphab github account - https://phabricator.wikimedia.org/T198823 (10Krenair) But you added MFA, doesn't a token now need to be used when Phab pushes using that username to Github over HTTPS?
[21:18:49] mdholloway, hi
[21:22:02] Krenair: hello
[21:22:14] mdholloway, I see there's a deployment-maps05 instance now
[21:22:52] mdholloway, have you been able to log into it?
[21:24:06] yes, and no, i haven't been able to log into it. i couldn't log into deployment-maps04, so ended up destroying it and starting afresh. same issue on -05 but i see what's going on now from the log. i emailed catrope about what to do about it.
[21:24:42] (since he set up deployment-maps03)
[21:26:10] did you try to add a class before you got logged on and puppet fixed?
[21:27:45] no, the only change i had made to -04 was adding the 'maps' security group. i haven't touched -05. it is inheriting quite a bit of config, though: https://tools.wmflabs.org/openstack-browser/server/deployment-maps05.deployment-prep.eqiad.wmflabs
[21:28:01] ah
[21:28:35] sounds like it inherited a class but not the required config for it
[21:29:57] i think that's right. it wants some credentials it's not finding, and i don't know how that was set up
[21:30:31] oh it's possible this stuff only comes from the deployment-puppetmaster which it won't be able to access yet
[21:33:00] yeah here we go
[21:33:17] local cherry-picks on the deployment-puppetmaster providing that data
[21:34:12] ah, i see
[21:34:42] I'll just transfer the -maps prefix stuff to -maps03 for now to get this new one bootstrapped
[21:41:49] we probably shouldn't use classes under prefixes due to our project puppetmaster :/
[21:44:41] PROBLEM - Host deployment-maps05 is DOWN: PING CRITICAL - Packet loss = 100%
[21:45:26] ^ I got rid of the bricked instance
[21:46:26] bah I forgot the horizon puppet thing messes with types
[21:47:23] PROBLEM - Puppet errors on deployment-maps03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[21:48:48] well now I really broke it :|
[21:49:53] HTTP 500s from the hiera editing thing
[21:50:36] uh oh
[21:51:37] looks like it didn't like my:
[21:51:39] profile::redis::master::clients:
[21:51:39] - %{::fqdn}
[21:54:22] due to lack of quotes around %{::fqdn}
[21:57:54] okay so -maps03 puppet is successfully running with the classes moved from -maps prefix to the -maps03 instance, and the profile::redis::master parameters moved into hiera (due to type weirdness)
[21:58:52] now I'm going to make deployment-maps04 (since the previous incarnation was inaccessible this is no big deal) and hopefully it will be accessible, I can get puppet working and then add the maps stuff
[21:59:15] mdholloway, uh, what flavour was your one?
[21:59:17] m1.large?
[21:59:30] yes, that's right
[21:59:39] thank you!
[22:00:17] i have had some trouble in the past with destroying and re-creating instances with identical names, but it seems sporadic
[22:02:16] mdholloway, I assume stretch rather than jessie?
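[Editor's note: the hiera fragment that made Horizon's editor return HTTP 500s fails because YAML reserves `%` as an indicator character, so a plain (unquoted) scalar cannot start with it; quoting the interpolation token is the fix Krenair describes. A minimal before/after sketch:]

```yaml
# Fails to parse: a plain scalar may not begin with '%'.
# profile::redis::master::clients:
#   - %{::fqdn}

# Works: the hiera interpolation token is quoted.
profile::redis::master::clients:
  - "%{::fqdn}"
```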
[22:03:12] yes, stretch
[22:04:17] we'll see about the re-creating thing
[22:04:24] should fail pretty loudly if anything in that area breaks
[22:05:31] we have had problems in the past with instances' DNS entries not getting cleaned up properly on deletion (mostly the reverse PTRs but I think there may have been cases of the A records too)
[22:07:07] alright LDAP isn't up on the new instance yet but I can get in as root
[22:07:17] and there we go, that works too
[22:07:24] RECOVERY - Puppet errors on deployment-maps03 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:07:28] so deployment-maps04 is running
[22:07:55] when running puppet of course it runs into problems with the new puppetmaster
[22:09:07] great! i'm in, which is a step in the right direction.
[22:09:44] so I'm running the usual stuff to get it working with our puppetmaster
[22:10:28] 10Phabricator, 10Release-Engineering-Team: Enable 2FA on wmfphab github account - https://phabricator.wikimedia.org/T198823 (10mmodell) I created an access token and replaced the stored password with the access token.
[22:11:15] the commands for which I dumped at the bottom of the description in https://phabricator.wikimedia.org/T195686 and probably documented somewhere hopefully
[22:11:58] puppet has completed successfully on the new host, so
[22:12:11] now we just copy the classes from -maps03
[22:15:23] it'll need the maps security group as well — not sure if it'll cause puppet trouble if the maps and cassandra classes without it, but it might
[22:15:51] if the maps and cassandra classes *are applied* without it, that is
[22:16:47] alright I've added that and copied the classes across
[22:16:51] let's see what puppet thinks
[22:17:06] oh I didn't copy profile::cassandra::single_instance::jmx_exporter_enabled: false
[22:25:02] mdholloway, so
[22:25:06] it did a ton of stuff
[22:25:10] and there were some problems
[22:25:13] Error: Could not find group maps-admins
[22:25:13] Error: /Stage[main]/Profile::Maps::Postgresql_common/File[/var/log/postgresql/postgresql-9.6-main.log]/group: change from adm to maps-admins failed: Could not find group maps-admins
[22:25:21] sounds like it expects the admin module to be in place
[22:25:44] E: Version '2.2.6-wmf5' for 'cassandra' was not found
[22:25:45] Notice: /Stage[main]/Cassandra/Package[cassandra-tools-wmf]: Dependency Package[cassandra] has failures: true
[22:25:48] probably stretch vs. jessie?
[22:27:59] PROBLEM - Puppet errors on deployment-maps04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[22:29:37] hmm, yeah, the cassandra package issue sounds like stretch vs. jessie
[22:29:38] some more failures that are probably due to that one
[22:29:51] greg-g: I've been meaning to ask you about an overlap between deployment windows: https://wikitech.wikimedia.org/wiki/Talk:Deployments
[22:30:12] root@deployment-maps04:/var/lib/puppet# apt-cache policy cassandra
[22:30:12] cassandra:
[22:30:12]   Installed: (none)
[22:30:12]   Candidate: 2.1.13
[22:30:13]   Version table:
[22:30:13]      2.1.13 1001
[22:30:15]         1001 http://apt.wikimedia.org/wikimedia stretch-wikimedia/main amd64 Packages
[22:30:48] 10Release-Engineering-Team (Kanban), 10Release Pipeline: Build Math extension container on Postmerge - https://phabricator.wikimedia.org/T196414 (10thcipriani)
[22:30:50] 10Release-Engineering-Team (Kanban), 10Release Pipeline: Build base image for math extension pipeline tests - https://phabricator.wikimedia.org/T196939 (10thcipriani) 05Open>03declined Declining this for now, will revisit in future quarters
[22:31:31] 10Release-Engineering-Team (Kanban), 10Release Pipeline: Build Math extension container on Postmerge - https://phabricator.wikimedia.org/T196414 (10thcipriani) 05Open>03declined Will revisit in future quarters
[22:31:33] 10Release-Engineering-Team (Kanban), 10Release Pipeline: Get MediaWiki running in Docker with Blubber - https://phabricator.wikimedia.org/T187105 (10thcipriani)
[22:31:35] jessie has that but it also has 2.2.6-wmf1 and 2.2.6-wmf3
[22:31:38] presumably for a reason
[22:31:44] so I think we should wait for ops to package it for stretch
[22:32:35] 10Release-Engineering-Team (Kanban), 10Release Pipeline: Build Math extension container on Postmerge - https://phabricator.wikimedia.org/T196414 (10thcipriani)
[22:32:37] 10Release-Engineering-Team (Kanban), 10Release Pipeline: Create integration/config pipeline job to create runnable MediaWiki + Math image - https://phabricator.wikimedia.org/T196944 (10thcipriani) 05Open>03declined Will revisit in future quarters
[22:34:24] Krenair: here's the output on maps-test2004 (running stretch):
[22:34:31] mholloway-shell@maps-test2004:~$ apt-cache policy cassandra
[22:34:31] cassandra:
[22:34:31]   Installed: 2.2.6-wmf5
[22:34:31]   Candidate: 2.2.6-wmf5
[22:34:31]   Version table:
[22:34:31]  *** 2.2.6-wmf5 1001
[22:34:31]         1001 http://apt.wikimedia.org/wikimedia stretch-wikimedia/component/cassandra22 amd64 Packages
[22:34:32]         100 /var/lib/dpkg/status
[22:34:32]      2.1.13 1001
[22:34:33]         1001 http://apt.wikimedia.org/wikimedia stretch-wikimedia/main amd64 Packages
[22:34:45] interesting
[22:37:13] modules/cassandra/manifests/init.pp has
[22:37:21] '2.2' => os_version('debian >= stretch') ? {
[22:37:21]     true  => hiera('cassandra::version', '2.2.6-wmf5'),
[22:37:21]     false => hiera('cassandra::version', '2.2.6-wmf3'), },
[22:38:13] I wonder what causes it to get the stretch-wikimedia/component/cassandra22
[22:38:30] oh theoretically the puppet thing below
[22:38:31] so why hasn't it
[22:38:46] root@deployment-maps04:/var/lib/puppet# cat /etc/apt/sources.list.d/wikimedia-cassandra22.list
[22:38:46] deb http://apt.wikimedia.org/wikimedia stretch-wikimedia component/cassandra22
[22:39:09] oh there we go
[22:39:14] just needed an apt-get update
[22:39:21] Candidate: 2.2.6-wmf5
[22:39:26] ah, nice
[22:39:33] so back to puppet
[22:40:49] progress
[22:40:51] not perfect yet
[22:41:35] Error: Could not find group maps-admins
[22:41:35] Error: /Stage[main]/Profile::Maps::Postgresql_common/File[/var/log/postgresql/postgresql-9.6-main.log]/group: change from adm to maps-admins failed: Could not find group maps-admins
[22:41:40] so why does it work on -maps03
[22:41:47] has someone got an unpuppetised group on there?
[22:42:39] yeah
[22:42:45] krenair@deployment-maps03:~$ getent group maps-admins
[22:42:45] maps-admins:x:1000:
[22:42:49] 10Release-Engineering-Team (Kanban), 10Operations, 10Release Pipeline: Helm test failing for CI namespace - https://phabricator.wikimedia.org/T199489 (10thcipriani)
[22:42:53] root@deployment-maps04:/var/lib/puppet# getent group maps-admins
[22:42:53] root@deployment-maps04:/var/lib/puppet#
[22:42:55] * mdholloway isn't entirely surprised
[22:42:59] think someone made it locally
[22:43:19] in prod this would be an admin module dependency but we don't use that in labs
[22:45:16] !log deployment-maps04 groupadd -g 1000 maps-admins
[22:45:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:45:41] 10Release-Engineering-Team (Kanban), 10Operations, 10Release Pipeline: Helm test failing for CI namespace - https://phabricator.wikimedia.org/T199489 (10thcipriani)
[22:46:13] more progress
[22:46:15] different problems
[22:46:39] Notice: /Stage[main]/Postgresql::Server/Exec[pgreload]/returns: pg_ctl: directory "/srv/postgresql/9.6/main" is not a database cluster directory
[22:46:39] Error: /Stage[main]/Postgresql::Server/Exec[pgreload]: Failed to call refresh: /usr/bin/pg_ctlcluster 9.6 main reload returned 1 instead of one of [0]
[22:46:39] Error: /Stage[main]/Postgresql::Server/Exec[pgreload]: /usr/bin/pg_ctlcluster 9.6 main reload returned 1 instead of one of [0]
[22:47:19] well it works now
[22:48:05] mdholloway, take a look
[22:48:47] looking...
[22:57:08] Krenair: well, services are failing to start up for their own reasons, but i can ssh in, and puppet runs cleanly
[22:57:29] thanks so much for your help!
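[Editor's note: the `groupadd -g 1000 maps-admins` fix above works but leaves the group unpuppetised, which is exactly how -maps03 drifted in the first place. If one wanted to puppetise the workaround, a minimal sketch might look like the following; the resource placement (e.g. in a labs-only profile) is an assumption, only the group name and gid come from the log.]

```puppet
# Hypothetical labs-only stand-in for the admin module's group,
# which deployment-prep does not use; gid matches the one created
# manually on deployment-maps03/04.
group { 'maps-admins':
    ensure => present,
    gid    => 1000,
}
```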
[22:57:59] RECOVERY - Puppet errors on deployment-maps04 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:58:38] np
[22:58:48] this stuff is a pain but I've been dealing with it for the past 3-4 years, so
[23:55:59] 10Beta-Cluster-Infrastructure, 10Patch-For-Review, 10Puppet: Set up puppet exported resources to collect ssh host keys for beta - https://phabricator.wikimedia.org/T72792 (10bd808)