[00:16:18] 10Phabricator, 10Patch-For-Review: Rate-limit is too harsh and affects human users - https://phabricator.wikimedia.org/T198974 (10mmodell) Note that the rate limiting is disabled only temporarily. We need to remove the config from puppet to make it permanent.
[00:17:09] 10Release-Engineering-Team (Watching / External), 10Core-Platform-Team, 10Wikidata: What happens if the Wikibase project specifies a version of a library outside of the range included in mediawiki-vendor? - https://phabricator.wikimedia.org/T179663 (10CCicalese_WMF)
[00:22:55] 10Release-Engineering-Team (Watching / External), 10Core-Platform-Team, 10Operations, 10WMF-JobQueue, and 3 others: Collect error logs from jobchron/jobrunner services in Logstash - https://phabricator.wikimedia.org/T172479 (10CCicalese_WMF)
[00:26:39] 10Phabricator, 10Wikibugs, 10Patch-For-Review: wikibugs hits Phabricator's rate limiting and hence is unreliable - https://phabricator.wikimedia.org/T198915 (10Aklapper) That should obviously be a separate feature request instead.
[00:53:36] 10Release-Engineering-Team (Watching / External), 10Core-Platform-Team: Audit all existing code to ensure that any extension currently or previously adding blobs to ExternalStore has been registering a reference in the text table (and fix up if wrong) - https://phabricator.wikimedia.org/T106388 (10CCicalese_WMF)
[01:02:51] 10Scap, 10Core-Platform-Team: "sql wikishared" doesn't work on mwmaint1001 - https://phabricator.wikimedia.org/T199316 (10CCicalese_WMF)
[03:02:52] I don't think our tarballs are reproducible
[03:58:31] legoktm: content wise, or are mtimes messing things up?
[03:58:58] if mtimes, I suppose we could do something like use the timestamp of the last REL commit or some such, for all files recursively.
[03:59:57] 10Release-Engineering-Team (Next), 10Scap, 10WMF-JobQueue, 10Wikimedia-Incident: Add jobrunner servers to Scap canary process - https://phabricator.wikimedia.org/T172480 (10Krinkle)
[04:00:23] 10Release-Engineering-Team (Next), 10Scap, 10Core-Platform-Team, 10WMF-JobQueue, 10Wikimedia-Incident: Add jobrunner servers to Scap canary process - https://phabricator.wikimedia.org/T172480 (10Krinkle)
[04:08:03] Krinkle: content wise it all looks good. the file ordering is what I noticed when using diffoscope (which is system/fs dependent aiui)
[04:08:26] I think a sorted() in some place will easily fix it
[04:12:04] 10Release-Engineering-Team, 10Scap: When scap deploy is aborted it should say so in the log - https://phabricator.wikimedia.org/T199388 (10Krinkle)
[04:15:08] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team: Get letsencrypt wildcard cert for *.beta.wmflabs.org domains - https://phabricator.wikimedia.org/T182927 (10Krinkle) p:05Triage>03High
[04:15:38] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team: Get letsencrypt wildcard cert for *.beta.wmflabs.org domains - https://phabricator.wikimedia.org/T182927 (10Krinkle) It seems Let's Encrypt now publicly supports wildcard domains. Are there known blockers to using those?
[04:15:52] legoktm: Ah, sounds good. Easy fix, or a task?
[04:16:05] yeah, I'm writing up some notes
[04:29:07] 10MediaWiki-Releasing, 10Security: Streamline MW security release process - https://phabricator.wikimedia.org/T196602 (10Legoktm) git-archive-all looks extremely promising from my testing so far. I was able to recreate a functionally identical tarball from the 1.31.0 git tag. I'll split my notes/suggestions w/...
[05:48:56] greg-g: are we not pausing deploys for Wikimania next week?
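[Editor's note: the "sorted() in some place" plus last-REL-commit-mtime idea discussed above could look roughly like the following sketch using Python's stdlib tarfile module. This is not the actual make-release code; REL_TIMESTAMP is a placeholder for the timestamp of the last commit on the release branch, and the function names are invented for illustration.]

```python
import os
import tarfile

# Assumed stand-in for the last REL branch commit date; in real tooling
# this would come from e.g. `git log -1 --format=%ct`.
REL_TIMESTAMP = 1531180800

def normalize(info):
    # Clamp the metadata that varies between build hosts so that the
    # archive bytes depend only on the tree contents.
    info.mtime = REL_TIMESTAMP
    info.uid = info.gid = 0
    info.uname = info.gname = "root"
    return info

def make_tarball(src_dir, out_path):
    """Build a tar archive with deterministic member order and metadata."""
    with tarfile.open(out_path, "w") as tar:
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()                  # deterministic recursion order
            for name in sorted(files):   # deterministic member order
                path = os.path.join(root, name)
                tar.add(path, arcname=path, recursive=False,
                        filter=normalize)
```

With the members sorted and mtimes/ownership clamped, two runs over the same tree produce byte-identical archives regardless of filesystem enumeration order (compression, if any, would need its own timestamp handling, e.g. gzip's header mtime).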
[06:27:52] 10Phabricator: Enable embedding of images from Wikimedia Commons - https://phabricator.wikimedia.org/T199407 (10Tbayer)
[06:29:31] 10Phabricator: Enable image hotlinking - https://phabricator.wikimedia.org/T186246 (10Tbayer)
[06:29:34] 10Phabricator: Enable embedding of images from Wikimedia Commons - https://phabricator.wikimedia.org/T199407 (10Tbayer)
[06:30:11] 10Phabricator (2017-06-01), 10RelEng-Archive-FY201718-Q1: Enable embedding of media from Wikimedia Commons - https://phabricator.wikimedia.org/T116515 (10Tbayer) This task was originally mainly about images, but then morphed into a task about videos and was closed as such. I have split the image part off into...
[06:30:24] 10Project-Admins, 10Developer-Advocacy (Jul-Sep 2018): Sort out scope/confusion between #Possible-Tech-Projects and #Outreach-Programs-Projects tags - https://phabricator.wikimedia.org/T198101 (10srishakatux) @Aklapper #possible-tech-projects is ready to be 🗡and #outreach-programs-projects ready for a name cha...
[06:32:41] 10Phabricator (2017-06-01), 10RelEng-Archive-FY201718-Q1: Enable embedding of media from Wikimedia Commons - https://phabricator.wikimedia.org/T116515 (10Tbayer) BTW, the example video above in T116515#3309596 doesn't work for me right now, in either Chromium ("Requests to the server have been blocked by an ex...
[07:18:36] 10Phabricator, 10Patch-For-Review: Rate-limit is too harsh and affects human users - https://phabricator.wikimedia.org/T198974 (10Pginer-WMF) >>! In T198974#4410522, @Josve05a wrote: > I got this right now as well. I was creating a task with exception crash code from AWB. I then pressed the save/create task bu...
[07:18:42] legoktm: no, all of SRE and RelEng will be around. We should be fine as long as nothing is deployed without being scheduled.
[07:39:50] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team: Get letsencrypt wildcard cert for *.beta.wmflabs.org domains - https://phabricator.wikimedia.org/T182927 (10Krenair) No I just need to puppetise my work to put them into use, see above
[08:36:47] PROBLEM - Puppet errors on deployment-cassandra3-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[08:48:17] PROBLEM - Puppet errors on deployment-cassandra3-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[08:56:43] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10releng-201718-q3, 10Epic, 10Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (10hashar)
[09:04:49] (03PS1) 10Hashar: Migrate SemanticDrilldown to Quibble [integration/config] - 10https://gerrit.wikimedia.org/r/445368 (https://phabricator.wikimedia.org/T183512)
[09:04:51] (03CR) 10Hashar: [C: 032] Migrate SemanticDrilldown to Quibble [integration/config] - 10https://gerrit.wikimedia.org/r/445368 (https://phabricator.wikimedia.org/T183512) (owner: 10Hashar)
[09:05:30] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10releng-201718-q3, 10Epic, 10Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (10hashar)
[09:06:09] (03Merged) 10jenkins-bot: Migrate SemanticDrilldown to Quibble [integration/config] - 10https://gerrit.wikimedia.org/r/445368 (https://phabricator.wikimedia.org/T183512) (owner: 10Hashar)
[09:23:25] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10releng-201718-q3, 10Epic, 10Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (10hashar)
[09:29:59] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10releng-201718-q3, 10Epic, 10Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (10hashar)
[09:34:34] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10releng-201718-q3, 10Epic, 10Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (10hashar)
[09:41:48] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10releng-201718-q3, 10Epic, 10Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (10hashar)
[09:45:49] (03PS1) 10Hashar: Archive some Semantic MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/445378 (https://phabricator.wikimedia.org/T199417)
[09:46:25] (03CR) 10Hashar: [C: 032] Archive some Semantic MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/445378 (https://phabricator.wikimedia.org/T199417) (owner: 10Hashar)
[09:47:29] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10releng-201718-q3, 10Epic, 10Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (10hashar)
[09:47:57] (03Merged) 10jenkins-bot: Archive some Semantic MediaWiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/445378 (https://phabricator.wikimedia.org/T199417) (owner: 10Hashar)
[10:15:42] (03PS1) 10Hashar: Inject tests/selenium/LocalSettings.php [integration/quibble] - 10https://gerrit.wikimedia.org/r/445380 (https://phabricator.wikimedia.org/T198201)
[10:16:31] (03CR) 10Hashar: "Gotta try it out with a few extensions that rely on that." [integration/quibble] - 10https://gerrit.wikimedia.org/r/445380 (https://phabricator.wikimedia.org/T198201) (owner: 10Hashar)
[10:18:44] hashar: is there a list of require-dev extensions that are loaded for wmf ci?
[10:18:49] * addshore needs to add one for Wikibase
[11:07:36] PROBLEM - Puppet errors on deployment-cache-text04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[11:14:31] 10Release-Engineering-Team (Kanban), 10MediaWiki-Core-Tests, 10MediaWiki-User-preferences, 10Patch-For-Review, and 2 others: Selenium "User should be able to change preferences" test flaky - https://phabricator.wikimedia.org/T198137 (10Pablo-WMDE) **Failing job** https://integration.wikimedia.org/ci/job/qu...
[11:28:11] 10Phabricator (2017-06-01), 10RelEng-Archive-FY201718-Q1: Enable embedding of media from Wikimedia Commons - https://phabricator.wikimedia.org/T116515 (10mmodell) maybe we need to adjust content security policy?
[13:34:00] 10Continuous-Integration-Config, 10Wikidata, 10User-Addshore: Install cache/integration-tests with Wikibase CI tests - https://phabricator.wikimedia.org/T199440 (10Addshore)
[13:50:43] RECOVERY - Puppet errors on deployment-elastic09 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:38:04] PROBLEM - App Server Main HTTP Response on deployment-mediawiki-09 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:38:14] 10Project-Admins, 10Pywikibot-core, 10Pywikibot-RfCs: Shall we rename Pywikibot-core Phabricator project to Pywikibot? - https://phabricator.wikimedia.org/T195893 (10Dvorapa) Also the PWBC should be changed maybe to PWB
[14:42:53] RECOVERY - App Server Main HTTP Response on deployment-mediawiki-09 is OK: HTTP OK: HTTP/1.1 200 OK - 47285 bytes in 1.697 second response time
[15:16:40] 10Release-Engineering-Team (Kanban), 10MediaWiki-Core-Tests, 10MediaWiki-User-preferences, 10Patch-For-Review, and 2 others: Selenium "User should be able to change preferences" test flaky - https://phabricator.wikimedia.org/T198137 (10hashar) @Pablo-WMDE for #1 and #2 that is `User should be able to chan...
[15:16:56] 10Release-Engineering-Team (Kanban), 10MediaWiki-Core-Tests, 10MediaWiki-User-preferences, 10Patch-For-Review, and 2 others: Selenium "User should be able to change preferences" test flaky - https://phabricator.wikimedia.org/T198137 (10hashar)
[16:01:04] 10Release-Engineering-Team (Watching / External), 10GitHub-Mirrors, 10Security-Team: Enforce 2FA for GitHub members - https://phabricator.wikimedia.org/T198810 (10mmodell)
[16:01:06] 10Phabricator, 10Release-Engineering-Team: Enable 2FA on wmfphab github account - https://phabricator.wikimedia.org/T198823 (10mmodell) 05Open>03Resolved a:03mmodell I've added 2factor to wmfphab on github
[16:07:32] what is the status of wmf.12 going out to group2 wikis today? I thought it was scheduled but I don't see an actual deploy in sal
[16:07:57] I'm asking because if it's there I can test something, otherwise i work on something else
[16:08:25] It's done, I think
[16:08:25] https://tools.wmflabs.org/versions/
[16:08:34] Yeah, it's everywhere
[16:08:49] oh good! thanks
[16:09:04] I searched around in sal but couldn't find any 'start deploy' that looked like it
[16:09:20] 2018-07-12 13:14 rebuilt and synchronized wikiversions files: all wikis to 1.32.0-wmf.12
[16:32:24] 10Gerrit, 10Release-Engineering-Team: Update gerrit to 2.15.3 - https://phabricator.wikimedia.org/T199460 (10Paladox)
[16:50:50] 10Gerrit: PolyGerrit unified diff view is too narrow - https://phabricator.wikimedia.org/T199464 (10dchan)
[16:54:08] 10Gerrit: PolyGerrit unified diff view is too narrow - https://phabricator.wikimedia.org/T199464 (10Paladox) Did you enable "Fit to screen" in the preference?
[17:19:04] 10Gerrit: PolyGerrit unified diff view is too narrow - https://phabricator.wikimedia.org/T199464 (10dchan) Thanks, that solves it for me - is there any way that can be the default (at least for Unified)?
[17:19:53] 10Gerrit: No keyboard shortcut for adding reviewers in PolyGerrit - https://phabricator.wikimedia.org/T199463 (10Paladox) filed upstream at https://bugs.chromium.org/p/gerrit/issues/detail?id=9422
[17:20:27] 10Gerrit, 10Upstream: No keyboard shortcut for adding reviewers in PolyGerrit - https://phabricator.wikimedia.org/T199463 (10Paladox)
[17:27:12] apergos: welcome to the new (sometimes) normal of early trains! :)
[17:29:39] 10Gerrit: PolyGerrit unified diff view is too narrow - https://phabricator.wikimedia.org/T199464 (10Paladox) Filed that part upstream at https://bugs.chromium.org/p/gerrit/issues/detail?id=9423
[17:34:22] I looked at the schedule and had my eye open for the deploy
[17:34:27] I just didn't see it go by :-)
[17:34:45] this means tomorrow I can run some tests for someone
[17:34:54] :)
[17:34:55] 10Beta-Cluster-Infrastructure, 10Analytics, 10Analytics-Kanban, 10User-Elukey: Disk usage on deployment-kafa-jumbo-* causing alerts - https://phabricator.wikimedia.org/T198262 (10Nuria) 05Open>03Resolved
[17:34:58] if they pass i will be able to make a customer happy
[17:46:39] (03PS9) 10Thcipriani: Perform helm deployment in service-pipeline [integration/config] - 10https://gerrit.wikimedia.org/r/425936 (https://phabricator.wikimedia.org/T188935) (owner: 10Dduvall)
[17:46:41] (03PS2) 10Thcipriani: Pipeline: helm deployment to CI namespace [integration/config] - 10https://gerrit.wikimedia.org/r/444647
[17:48:11] 10Gerrit, 10Release-Engineering-Team, 10Patch-For-Review, 10User-notice: Make PolyGerrit the default ui - https://phabricator.wikimedia.org/T196812 (10Framawiki) >>! In T196812#4401708, @Johan wrote: > OK, I simplified a bit and geared it a bit more towards the main audience of Tech News (non-developers),...
[18:07:06] (03PS3) 10Thcipriani: Pipeline: helm deployment to CI namespace [integration/config] - 10https://gerrit.wikimedia.org/r/444647
[18:15:22] 10MediaWiki-Releasing: Move to a `git archive` like model for MediaWiki tarball creation - https://phabricator.wikimedia.org/T199467 (10Legoktm)
[18:25:57] 10Release-Engineering-Team (Next), 10Release Pipeline (Blubber): Blubber should support php/composer - https://phabricator.wikimedia.org/T186547 (10greg)
[18:26:09] SMalyshev: you are very fast
[18:26:30] however I did just earlier make sure the code was deployed to all wikis today
[18:26:42] since it is the evening I will be looking at this tomorrow or Monday
[18:26:51] (see the backread in this very channel!)
[18:31:29] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Next): zuul-merger /var/log/git-daemon/syslog.log is not log rotated - https://phabricator.wikimedia.org/T188107 (10greg)
[18:37:18] apergos: thanks!
[18:37:41] sure!
[19:04:16] !log deployment-prep: deleting new instance deployment-maps04 (initial puppet run failed) and creating deployment-maps05
[19:04:19] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[19:08:55] PROBLEM - Host deployment-maps04 is DOWN: CRITICAL - Host Unreachable (10.68.18.184)
[19:13:49] Hi greg-g
[19:14:01] Are you the one with access to the gcal for deploy slots?
[19:14:09] Because it needs updating ;)
[19:56:38] TOO MANY CONCURRENT CONNECTIONS
[19:56:39] You ("2a00:23c4:ad0a:7d01:2c7c:f6c9:25a8:19c0") have too many concurrent connections.
[19:56:42] hmm i just got that
[19:56:44] twentyafterfour ^^
[20:38:50] PROBLEM - SSH on integration-slave-docker-1017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:48:40] RECOVERY - SSH on integration-slave-docker-1017 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u4 (protocol 2.0)
[21:06:49] addshore: yeah, heh, will do. I'm afk for a while this afternoon.
[21:13:51] 10Phabricator, 10Release-Engineering-Team: Enable 2FA on wmfphab github account - https://phabricator.wikimedia.org/T198823 (10Krenair) @mmodell, doesn't {K13} need to be updated?
[21:14:53] 10Phabricator, 10Release-Engineering-Team: Enable 2FA on wmfphab github account - https://phabricator.wikimedia.org/T198823 (10mmodell) The password hasn't changed.
[21:18:26] 10Phabricator, 10Release-Engineering-Team: Enable 2FA on wmfphab github account - https://phabricator.wikimedia.org/T198823 (10Krenair) But you added MFA, doesn't a token now need to be used when Phab pushes using that username to Github over HTTPS?
[21:18:49] mdholloway, hi
[21:22:02] Krenair: hello
[21:22:14] mdholloway, I see there's a deployment-maps05 instance now
[21:22:52] mdholloway, have you been able to log into it?
[21:24:06] yes, and no, i haven't been able to log into it. i couldn't log into deployment-maps04, so ended up destroying it and starting afresh. same issue on -05 but i see what's going on now from the log. i emailed catrope about what to do about it.
[21:24:42] (since he set up deployment-maps03)
[21:26:10] did you try to add a class before you got logged on and puppet fixed?
[21:27:45] no, the only change i had made to -04 was adding the 'maps' security group. i haven't touched -05. it is inheriting quite a bit of config, though: https://tools.wmflabs.org/openstack-browser/server/deployment-maps05.deployment-prep.eqiad.wmflabs
[21:28:01] ah
[21:28:35] sounds like it inherited a class but not the required config for it
[21:29:57] i think that's right. it wants some credentials it's not finding, and i don't know how that was set up
[21:30:31] oh it's possible this stuff only comes from the deployment-puppetmaster which it won't be able to access yet
[21:33:00] yeah here we go
[21:33:17] local cherry-picks on the deployment-puppetmaster providing that data
[21:34:12] ah, i see
[21:34:42] I'll just transfer the -maps prefix stuff to -maps03 for now to get this new one bootstrapped
[21:41:49] we probably shouldn't use classes under prefixes due to our project puppetmaster :/
[21:44:41] PROBLEM - Host deployment-maps05 is DOWN: PING CRITICAL - Packet loss = 100%
[21:45:26] ^ I got rid of the bricked instance
[21:46:26] bah I forgot the horizon puppet thing messes with types
[21:47:23] PROBLEM - Puppet errors on deployment-maps03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[21:48:48] well now I really broke it :|
[21:49:53] HTTP 500s from the hiera editing thing
[21:50:36] uh oh
[21:51:37] looks like it didn't like my:
[21:51:39] profile::redis::master::clients:
[21:51:39] - %{::fqdn}
[21:54:22] due to lack of quotes around %{::fqdn}
[21:57:54] okay so -maps03 puppet is successfully running with the classes moved from -maps prefix to the -maps03 instance, and the profile::redis::master parameters moved into hiera (due to type weirdness)
[21:58:52] now I'm going to make deployment-maps04 (since the previous incarnation was inaccessible this is no big deal) and hopefully it will be accessible, I can get puppet working and then add the maps stuff
[21:59:15] mdholloway, uh, what flavour was your one?
[21:59:17] m1.large?
[21:59:30] yes, that's right
[21:59:39] thank you!
[22:00:17] i have had some trouble in the past with destroying and re-creating instances with identical names, but it seems sporadic
[22:02:16] mdholloway, I assume stretch rather than jessie?
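[Editor's note: the hiera fragment that made Horizon's editor return HTTP 500s fails because YAML reserves `%` as an indicator character, so a plain (unquoted) scalar cannot start with it; quoting the interpolation token is the fix Krenair describes. A minimal before/after sketch:]

```yaml
# Fails to parse: a plain scalar may not begin with '%'.
# profile::redis::master::clients:
#   - %{::fqdn}

# Works: the hiera interpolation token is quoted.
profile::redis::master::clients:
  - "%{::fqdn}"
```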
[22:03:12] yes, stretch
[22:04:17] we'll see about the re-creating thing
[22:04:24] should fail pretty loudly if anything in that area breaks
[22:05:31] we have had problems in the past with instances' DNS entries not getting cleaned up properly on deletion (mostly the reverse PTRs but I think there may have been cases of the A records too)
[22:07:07] alright LDAP isn't up on the new instance yet but I can get in as root
[22:07:17] and there we go, that works too
[22:07:24] RECOVERY - Puppet errors on deployment-maps03 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:07:28] so deployment-maps04 is running
[22:07:55] when running puppet of course it runs into problems with the new puppetmaster
[22:09:07] great! i'm in, which is a step in the right direction.
[22:09:44] so I'm running the usual stuff to get it working with our puppetmaster
[22:10:28] 10Phabricator, 10Release-Engineering-Team: Enable 2FA on wmfphab github account - https://phabricator.wikimedia.org/T198823 (10mmodell) I created an access token and replaced the stored password with the access token.
[22:11:15] the commands for which I dumped at the bottom of the description in https://phabricator.wikimedia.org/T195686 and probably documented somewhere hopefully
[22:11:58] puppet has completed successfully on the new host, so
[22:12:11] now we just copy the classes from -maps03
[22:15:23] it'll need the maps security group as well — not sure if it'll cause puppet trouble if the maps and cassandra classes without it, but it might
[22:15:51] if the maps and cassandra classes *are applied* without it, that is
[22:16:47] alright I've added that and copied the classes across
[22:16:51] let's see what puppet thinks
[22:17:06] oh I didn't copy profile::cassandra::single_instance::jmx_exporter_enabled: false
[22:25:02] mdholloway, so
[22:25:06] it did a ton of stuff
[22:25:10] and there were some problems
[22:25:13] Error: Could not find group maps-admins
[22:25:13] Error: /Stage[main]/Profile::Maps::Postgresql_common/File[/var/log/postgresql/postgresql-9.6-main.log]/group: change from adm to maps-admins failed: Could not find group maps-admins
[22:25:21] sounds like it expects the admin module to be in place
[22:25:44] E: Version '2.2.6-wmf5' for 'cassandra' was not found
[22:25:45] Notice: /Stage[main]/Cassandra/Package[cassandra-tools-wmf]: Dependency Package[cassandra] has failures: true
[22:25:48] probably stretch vs. jessie?
[22:27:59] PROBLEM - Puppet errors on deployment-maps04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[22:29:37] hmm, yeah, the cassandra package issue sounds like stretch vs. jessie
[22:29:38] some more failures that are probably due to that one
[22:29:51] greg-g: I've been meaning to ask you about an overlap between deployment windows: https://wikitech.wikimedia.org/wiki/Talk:Deployments
[22:30:12] root@deployment-maps04:/var/lib/puppet# apt-cache policy cassandra
[22:30:12] cassandra:
[22:30:12]   Installed: (none)
[22:30:12]   Candidate: 2.1.13
[22:30:13]   Version table:
[22:30:13]      2.1.13 1001
[22:30:15]         1001 http://apt.wikimedia.org/wikimedia stretch-wikimedia/main amd64 Packages
[22:30:48] 10Release-Engineering-Team (Kanban), 10Release Pipeline: Build Math extension container on Postmerge - https://phabricator.wikimedia.org/T196414 (10thcipriani)
[22:30:50] 10Release-Engineering-Team (Kanban), 10Release Pipeline: Build base image for math extension pipeline tests - https://phabricator.wikimedia.org/T196939 (10thcipriani) 05Open>03declined Declining this for now, will revisit in future quarters
[22:31:31] 10Release-Engineering-Team (Kanban), 10Release Pipeline: Build Math extension container on Postmerge - https://phabricator.wikimedia.org/T196414 (10thcipriani) 05Open>03declined Will revisit in future quarters
[22:31:33] 10Release-Engineering-Team (Kanban), 10Release Pipeline: Get MediaWiki running in Docker with Blubber - https://phabricator.wikimedia.org/T187105 (10thcipriani)
[22:31:35] jessie has that but it also has 2.2.6-wmf1 and 2.2.6-wmf3
[22:31:38] presumably for a reason
[22:31:44] so I think we should wait for ops to package it for stretch
[22:32:35] 10Release-Engineering-Team (Kanban), 10Release Pipeline: Build Math extension container on Postmerge - https://phabricator.wikimedia.org/T196414 (10thcipriani)
[22:32:37] 10Release-Engineering-Team (Kanban), 10Release Pipeline: Create integration/config pipeline job to create runnable MediaWiki + Math image - https://phabricator.wikimedia.org/T196944 (10thcipriani) 05Open>03declined Will revisit in future quarters
[22:34:24] Krenair: here's the output on maps-test2004 (running stretch):
[22:34:31] mholloway-shell@maps-test2004:~$ apt-cache policy cassandra
[22:34:31] cassandra:
[22:34:31]   Installed: 2.2.6-wmf5
[22:34:31]   Candidate: 2.2.6-wmf5
[22:34:31]   Version table:
[22:34:31]  *** 2.2.6-wmf5 1001
[22:34:31]         1001 http://apt.wikimedia.org/wikimedia stretch-wikimedia/component/cassandra22 amd64 Packages
[22:34:32]         100 /var/lib/dpkg/status
[22:34:32]      2.1.13 1001
[22:34:33]         1001 http://apt.wikimedia.org/wikimedia stretch-wikimedia/main amd64 Packages
[22:34:45] interesting
[22:37:13] modules/cassandra/manifests/init.pp has
[22:37:21] '2.2' => os_version('debian >= stretch') ? {
[22:37:21]     true  => hiera('cassandra::version', '2.2.6-wmf5'),
[22:37:21]     false => hiera('cassandra::version', '2.2.6-wmf3'), },
[22:38:13] I wonder what causes it to get the stretch-wikimedia/component/cassandra22
[22:38:30] oh theoretically the puppet thing below
[22:38:31] so why hasn't it
[22:38:46] root@deployment-maps04:/var/lib/puppet# cat /etc/apt/sources.list.d/wikimedia-cassandra22.list
[22:38:46] deb http://apt.wikimedia.org/wikimedia stretch-wikimedia component/cassandra22
[22:39:09] oh there we go
[22:39:14] just needed an apt-get update
[22:39:21] Candidate: 2.2.6-wmf5
[22:39:26] ah, nice
[22:39:33] so back to puppet
[22:40:49] progress
[22:40:51] not perfect yet
[22:41:35] Error: Could not find group maps-admins
[22:41:35] Error: /Stage[main]/Profile::Maps::Postgresql_common/File[/var/log/postgresql/postgresql-9.6-main.log]/group: change from adm to maps-admins failed: Could not find group maps-admins
[22:41:40] so why does it work on -maps03
[22:41:47] has someone got an unpuppetised group on there?
[22:42:39] yeah
[22:42:45] krenair@deployment-maps03:~$ getent group maps-admins
[22:42:45] maps-admins:x:1000:
[22:42:49] 10Release-Engineering-Team (Kanban), 10Operations, 10Release Pipeline: Helm test failing for CI namespace - https://phabricator.wikimedia.org/T199489 (10thcipriani)
[22:42:53] root@deployment-maps04:/var/lib/puppet# getent group maps-admins
[22:42:53] root@deployment-maps04:/var/lib/puppet#
[22:42:55] * mdholloway isn't entirely surprised
[22:42:59] think someone made it locally
[22:43:19] in prod this would be an admin module dependency but we don't use that in labs
[22:45:16] !log deployment-maps04 groupadd -g 1000 maps-admins
[22:45:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:45:41] 10Release-Engineering-Team (Kanban), 10Operations, 10Release Pipeline: Helm test failing for CI namespace - https://phabricator.wikimedia.org/T199489 (10thcipriani)
[22:46:13] more progress
[22:46:15] different problems
[22:46:39] Notice: /Stage[main]/Postgresql::Server/Exec[pgreload]/returns: pg_ctl: directory "/srv/postgresql/9.6/main" is not a database cluster directory
[22:46:39] Error: /Stage[main]/Postgresql::Server/Exec[pgreload]: Failed to call refresh: /usr/bin/pg_ctlcluster 9.6 main reload returned 1 instead of one of [0]
[22:46:39] Error: /Stage[main]/Postgresql::Server/Exec[pgreload]: /usr/bin/pg_ctlcluster 9.6 main reload returned 1 instead of one of [0]
[22:47:19] well it works now
[22:48:05] mdholloway, take a look
[22:48:47] looking...
[22:57:08] Krenair: well, services are failing to start up for their own reasons, but i can ssh in, and puppet runs cleanly
[22:57:29] thanks so much for your help!
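[Editor's note: the `groupadd -g 1000 maps-admins` fix above works but leaves the group unpuppetised, which is exactly how -maps03 drifted in the first place. If one wanted to puppetise the workaround, a minimal sketch might look like the following; the resource placement (e.g. in a labs-only profile) is an assumption, only the group name and gid come from the log.]

```puppet
# Hypothetical labs-only stand-in for the admin module's group,
# which deployment-prep does not use; gid matches the one created
# manually on deployment-maps03/04.
group { 'maps-admins':
    ensure => present,
    gid    => 1000,
}
```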
[22:57:59] RECOVERY - Puppet errors on deployment-maps04 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:58:38] np
[22:58:48] this stuff is a pain but I've been dealing with it for the past 3-4 years, so
[23:55:59] 10Beta-Cluster-Infrastructure, 10Patch-For-Review, 10Puppet: Set up puppet exported resources to collect ssh host keys for beta - https://phabricator.wikimedia.org/T72792 (10bd808)