[00:41:22] Deployment-Systems, Scap3: Scap3 needs to be able to pull down deploy info - https://phabricator.wikimedia.org/T123013#1960847 (thcipriani) Open>Resolved
[01:35:16] !log integration-slave-jessie-1001:/mnt full; cleaned up 15G of files in /mnt/pbuilder/build
[01:35:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[02:42:17] Yippee, build fixed!
[02:42:18] Project browsertests-PageTriage-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #792: FIXED in 1 min 17 sec: https://integration.wikimedia.org/ci/job/browsertests-PageTriage-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/792/
[02:44:18] Project beta-scap-eqiad build #87556: FAILURE in 0.45 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/87556/
[03:00:17] Yippee, build fixed!
[03:00:18] Project beta-scap-eqiad build #87557: FIXED in 5 min 57 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/87557/
[03:18:20] PROBLEM - Host deployment-mediawiki01 is DOWN: CRITICAL - Host Unreachable (10.68.17.170)
[03:19:24] Project beta-scap-eqiad build #87559: FAILURE in 5 min 7 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/87559/
[03:20:46] RECOVERY - Host deployment-mediawiki01 is UP: PING OK - Packet loss = 0%, RTA = 1.10 ms
[03:29:11] Yippee, build fixed!
[03:29:12] Project beta-scap-eqiad build #87560: FIXED in 5 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/87560/
[04:21:26] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:26:17] RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 39550 bytes in 0.512 second response time
[04:42:26] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:57:17] RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 39533 bytes in 0.559 second response time
[05:13:26] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:31:15] RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 39532 bytes in 0.456 second response time
[09:52:21] (PS1) Hashar: Add rake as a dependency [selenium] - https://gerrit.wikimedia.org/r/266205
[09:53:54] (CR) Hashar: "I more or less have a sense at what is going on but my ruby knowledge is too low to properly understand the patch :-( I guess we can t" [selenium] - https://gerrit.wikimedia.org/r/265894 (https://phabricator.wikimedia.org/T105589) (owner: Dduvall)
[09:55:15] (CR) jenkins-bot: [V: -1] Add rake as a dependency [selenium] - https://gerrit.wikimedia.org/r/266205 (owner: Hashar)
[09:55:21] ...
[09:56:49] Notice: Undefined variable: date in /mnt/jenkins-workspace/workspace/mediawiki-selenium-integration/src/includes/WebResponse.php on line 144
[09:56:52] how helpful is that
[10:16:34] (CR) Hashar: "Test fails because of MediaWiki core change https://gerrit.wikimedia.org/r/#/c/265928/2/includes/WebResponse.php,cm which reference an uns" [selenium] - https://gerrit.wikimedia.org/r/266205 (owner: Hashar)
[10:23:51] (CR) Hashar: "Filled T124641" [selenium] - https://gerrit.wikimedia.org/r/266205 (owner: Hashar)
[10:58:47] Deployment-Systems, Scap3, WorkType-NewFunctionality: default lock file for scap3 should be repo-dependent - https://phabricator.wikimedia.org/T116208#1961418 (mmodell)
[11:00:20] Deployment-Systems, Scap3, WorkType-NewFunctionality: Remove apache dependency from scap3 deployment host - https://phabricator.wikimedia.org/T116630#1961420 (mmodell) I'm thinking we could start a SimpleHTTPServer rooted at /srv/deployment. @thcipriani: Is there any way to override the http port that...
[12:37:25] Continuous-Integration-Infrastructure, operations, Patch-For-Review, WorkType-NewFunctionality: Phase out operations-puppet-pep8 Jenkins job and tools/puppet_pep8.py - https://phabricator.wikimedia.org/T114887#1961592 (scfc) There is a non-voting job `operations-puppet-tox-pep8-jessie` which is cons...
[13:04:18] Continuous-Integration-Infrastructure, Patch-For-Review: Use PHPUnit 4.8 for PHP5.5 unit tests - https://phabricator.wikimedia.org/T124599#1961623 (hashar)
[13:16:25] Continuous-Integration-Infrastructure: database disk image is malformed in test mwext-qunit-composer - https://phabricator.wikimedia.org/T124611#1961632 (Paladox) @Hashar this seems to still be a problem please see https://integration.wikimedia.org/ci/job/mwext-qunit-composer/346/console
[13:16:45] Continuous-Integration-Infrastructure: database disk image is malformed in test mwext-qunit-composer - https://phabricator.wikimedia.org/T124611#1961634 (Paladox)
[13:25:03] hashar: there seems to be a fault in the qunit composer test
[13:25:39] https://phabricator.wikimedia.org/T124611
[14:21:32] !log integration-slave-trusty-1015.integration.eqiad.wmflabs is gone. I have failed the kernel upgrade / grub update
[14:21:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:22:43] PROBLEM - Host integration-slave-trusty-1015 is DOWN: CRITICAL - Host Unreachable (10.68.18.30)
[14:36:23] RECOVERY - Host integration-slave-trusty-1015 is UP: PING OK - Packet loss = 0%, RTA = 0.77 ms
[14:39:36] PROBLEM - Puppet failure on integration-slave-trusty-1018 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]
[14:40:28] PROBLEM - Puppet failure on integration-slave-trusty-1015 is CRITICAL: CRITICAL: 71.43% of data above the critical threshold [0.0]
[14:50:37] RECOVERY - Puppet failure on integration-slave-trusty-1015 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:54:35] RECOVERY - Puppet failure on integration-slave-trusty-1018 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:58:56] Project beta-scap-eqiad build #87628: FAILURE in 24 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/87628/
[15:01:27] PROBLEM - Parsoid on deployment-parsoid05 is CRITICAL: Connection refused
[15:05:38] PROBLEM - Puppet failure on integration-slave-trusty-1018 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[15:06:28] PROBLEM - Puppet failure on integration-slave-trusty-1015 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[15:06:32] RECOVERY - Parsoid on deployment-parsoid05 is OK: HTTP OK: HTTP/1.1 200 OK - 1514 bytes in 0.066 second response time
[15:16:40] Continuous-Integration-Config: debian-glue fails with "hostname: Name or service not known" - https://phabricator.wikimedia.org/T124660#1961796 (scfc) NEW
[15:21:27] RECOVERY - Puppet failure on integration-slave-trusty-1015 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:22:45] !log CI: fixing kernels not upgrading via: rm /boot/grub/menu.lst ; update-grub -y (i.e.: regenerate the Grub menu from scratch)
[15:22:48] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[15:24:43] Continuous-Integration-Config: debian-glue fails with "hostname: Name or service not known" - https://phabricator.wikimedia.org/T124660#1961818 (scfc) Open>Invalid a:scfc Parallel to this, there was an outage/glitch of the Labs DNS server, so I assume this was related to that.
[15:35:56] hashar: It seems that the composer qunit test is failing. Please see https://phabricator.wikimedia.org/T124611
[15:47:20] Yippee, build fixed!
[15:47:20] Project beta-scap-eqiad build #87631: FIXED in 32 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/87631/
[15:54:48] Release-Engineering-Team, User-greg: Write up idea from talk with Quim (meta "don't forget" task) - https://phabricator.wikimedia.org/T124125#1961893 (greg) Open>Resolved
[16:08:32] Looks like github replication for https://github.com/wikimedia/mediawiki is broken. Anyone know if there is already a ticket open for that?
[16:08:46] ryasmeen|Away: around?
[16:11:44] (PS2) Niedzielski: Update Android test target [integration/config] - https://gerrit.wikimedia.org/r/264209
[16:12:07] (CR) Niedzielski: "rebased" [integration/config] - https://gerrit.wikimedia.org/r/264209 (owner: Niedzielski)
[16:12:11] bd808: there was some talk on Friday about github replication in this channel between ostriches and twentyafterfour can't find a ticket now though
[16:12:57] kk. I guess I can make one and they can always close it as a dupe
[16:13:56] (PS2) Niedzielski: Preserve Android Spoon test output [integration/config] - https://gerrit.wikimedia.org/r/265009 (https://phabricator.wikimedia.org/T118100)
[16:14:54] (CR) Niedzielski: "rebased, deployed https://integration.wikimedia.org/ci/job/apps-android-wikipedia-periodic-test/310/console (there are unrelated failures " [integration/config] - https://gerrit.wikimedia.org/r/265009 (https://phabricator.wikimedia.org/T118100) (owner: Niedzielski)
[16:16:06] (CR) Hashar: [C: 2] Preserve Android Spoon test output [integration/config] - https://gerrit.wikimedia.org/r/265009 (https://phabricator.wikimedia.org/T118100) (owner: Niedzielski)
[16:16:16] niedzielski: :-}
[16:16:42] Release-Engineering-Team, Diffusion, Gerrit: mediawiki/core.git not replicating to github.com - https://phabricator.wikimedia.org/T124663#1961936 (bd808)
[16:20:07] Silly replication
[16:20:33] :)
[16:23:40] Release-Engineering-Team, Diffusion, Gerrit, GitHub-Mirrors: mediawiki/core.git not replicating to github.com - https://phabricator.wikimedia.org/T124663#1961967 (greg)
[16:32:19] Release-Engineering-Team, Diffusion, Gerrit, GitHub-Mirrors: mediawiki/core.git not replicating to github.com - https://phabricator.wikimedia.org/T124663#1962011 (demon) Yeah, I tried to have Phabricator take over replication but it ain't working yet. I'll re-allow Gerrit to take over since Github...
[16:33:52] Release-Engineering-Team, Diffusion, Gerrit, GitHub-Mirrors: mediawiki/core.git not replicating to github.com - https://phabricator.wikimedia.org/T124663#1962018 (demon) Open>Resolved a:demon
[16:34:03] meh
[16:52:22] Browser-Tests, VisualEditor: VE cursor test invokes ULS IME in Firefox on test2wiki - https://phabricator.wikimedia.org/T57972#1962073 (Jdforrester-WMF)
[16:55:06] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:04:56] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 39545 bytes in 0.669 second response time
[17:51:33] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:56:26] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 39540 bytes in 0.582 second response time
[17:58:53] Beta-Cluster-Infrastructure: Error on betacluster resetting my Preferences - JobQueueConnectionError from JobQueueRedis.php - https://phabricator.wikimedia.org/T124677#1962304 (Quiddity) NEW
[18:11:00] marxarelli: thcipriani ostriches twentyafterfour: redis in beta still seems fubar'd? ^^^ which is https://phabricator.wikimedia.org/T124677
[18:12:20] I saw that
[18:22:16] Continuous-Integration-Infrastructure, Analytics, Patch-For-Review: Add json linting test for schemas in mediawiki/event-schemas - https://phabricator.wikimedia.org/T124319#1962462 (Milimetric) p:Triage>Normal
[18:44:13] Beta-Cluster-Infrastructure: Error on betacluster resetting my Preferences - JobQueueConnectionError from JobQueueRedis.php - https://phabricator.wikimedia.org/T124677#1962566 (mmodell) ``` twentyafterfour@deployment-redis01:~$ sudo /etc/init.d/redis-server start Starting redis-server: *** FATAL CONFIG FILE...
[18:53:12] Beta-Cluster-Infrastructure: Error on betacluster resetting my Preferences - JobQueueConnectionError from JobQueueRedis.php - https://phabricator.wikimedia.org/T124677#1962608 (mmodell) ``` twentyafterfour@deployment-redis01:~$ redis-server --version Redis server v=2.8.4 sha=00000000:0 malloc=jemalloc-3.5.1...
[18:53:50] Beta-Cluster-Infrastructure: Error on betacluster resetting my Preferences - JobQueueConnectionError from JobQueueRedis.php - https://phabricator.wikimedia.org/T124677#1962613 (mmodell) This was broken by {rOPUP1c16154c6ee76433722d0f01462db57727d37b64}
[18:59:24] !log started redis-server on deployment-redis01 by commenting out latency-monitor-threshold from the redis.conf
[18:59:26] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[18:59:54] greg-g: problem identified, concern raised, redis started
[19:00:09] I'll leave the permanent fix to ori
[19:00:26] since I have no idea what he intended exactly or why it didn't work out on beta
[19:02:20] Beta-Cluster-Infrastructure: Error on betacluster resetting my Preferences - JobQueueConnectionError from JobQueueRedis.php - https://phabricator.wikimedia.org/T124677#1962666 (mmodell) a:ori I commented the line from the config file and started redis, I'm going to leave it to ori to decide what to do abo...
[19:13:29] twentyafterfour: thank you
[19:14:54] twentyafterfour: btw, I still do appreciate your use of audit/comments post merge in diffusion :)
[19:15:20] it's a really nice way to use phabricator, I'm just trying to lead by example ;)
[19:15:22] thanks
[19:16:33] it's really great when doing an analysis of a problem, I just run git blame, copy the commit hash into phabricator's search, then comment directly on the changes.
[19:20:10] * greg-g nods
[19:20:51] it's pretty sweet. I really can't wait until we're out of gerrit and people start using/seeing the awesomeness of intergrated pre/post merge code-review and tasks and all the beautiful things
[19:23:27] Beta-Cluster-Infrastructure: Can not create account at beta cluster: "Unable to connect to redis server deployment-redis01.eqiad.wmflabs." - https://phabricator.wikimedia.org/T124388#1962896 (greg) @MGChecker: please try again, we just restarted Redis in Beta Cluster again, see: T124677#1962666
[19:28:16] Browser-Tests, VisualEditor: VE cursor test invokes ULS IME in Firefox on test2wiki - https://phabricator.wikimedia.org/T57972#1962921 (Jdforrester-WMF) If anything, this problem may now be back as we've just (for wmf.12) re-enabled ULS for all users.
[19:29:11] Beta-Cluster-Infrastructure: Can not create account at beta cluster: "Unable to connect to redis server deployment-redis01.eqiad.wmflabs." - https://phabricator.wikimedia.org/T124388#1962923 (greg) >>! In T124388#1957339, @hashar wrote: > Well puppet patch from Dec 29th introduced the 'latency-monitor-thresh...
[19:29:27] Beta-Cluster-Infrastructure: Can not create account at beta cluster: "Unable to connect to redis server deployment-redis01.eqiad.wmflabs." - https://phabricator.wikimedia.org/T124388#1962928 (greg)
[19:29:29] Beta-Cluster-Infrastructure: Error on betacluster resetting my Preferences - JobQueueConnectionError from JobQueueRedis.php - https://phabricator.wikimedia.org/T124677#1962929 (greg)
[19:30:36] Beta-Cluster-Infrastructure: Redis latency monitor config change broke Redis in Beta Cluster - https://phabricator.wikimedia.org/T124677#1962934 (greg)
[19:30:58] Beta-Cluster-Infrastructure: Redis latency monitor config change broke Redis in Beta Cluster - https://phabricator.wikimedia.org/T124677#1962304 (greg) p:Triage>High Pri: Higher than normal, so High :)
[20:05:22] (PS1) Niedzielski: Use Android Spoon test results [integration/config] - https://gerrit.wikimedia.org/r/266302
[20:08:46] (PS1) Paladox: [WPtouch] Add jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/266306
[20:12:52] Beta-Cluster-Infrastructure: Redis latency monitor config change broke Redis in Beta Cluster - https://phabricator.wikimedia.org/T124677#1963190 (MGChecker) @greg No, it's the same error as before. Furthermore, Special:RecentChanges doesn't get updated.
[21:16:28] Beta-Cluster-Infrastructure, Wikidata: Wikibase langlink handling on beta severly broken - https://phabricator.wikimedia.org/T124715#1963519 (hoo) NEW
[21:18:20] Beta-Cluster-Infrastructure, Wikidata: [Task] Wikibase langlink handling on beta severely broken - https://phabricator.wikimedia.org/T124715#1963535 (hoo)
[21:28:29] Project browsertests-QuickSurveys-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #154: FAILURE in 12 min: https://integration.wikimedia.org/ci/job/browsertests-QuickSurveys-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/154/
[21:48:29] hm, so deployment-pdf01 says, "The last Puppet run was at Fri Jan 22 21:55:09 UTC 2016 (4298 minutes ago)" and isn't responding to git-deploy.
[21:48:34] anyone know how to poke it?
[21:49:00] it appears to be running the OCG service alright. just not listening to salt?
[21:52:47] cscott: I can try to get it to pull down a repo, which repo are you deploying?
[21:53:11] thcipriani: https://wikitech.wikimedia.org/wiki/OCG#Deploying_the_latest_version_of_OCG
[21:53:41] * thcipriani is looking
[21:54:49] thcipriani: git-deploy says "last-return: 72 mins ago" which is well before i started to git-deploy, don't know what's up with that.
[21:55:42] heh, locally on that box I'm getting "Function deploy-fetch is not available" Never seen that before
[21:56:17] it's always fun when computers do things you didn't know they could do.
[21:56:18] guessing it has to do with the puppet failure I'm seeing: Server hostname 'deployment-puppetmaster.eqiad.wmflabs' did not match server certificate
[21:56:57] which is understandable since we changed all host names to [whatever].deployment-prep.eqiad.wmflabs
[22:00:17] well, I'm getting a little further down the path: redis.exceptions.ResponseError: READONLY You can't write against a read only slave. so it looks like it's having trouble with the redis returner possibly
[22:00:56] (guessing this is probably a configuration problem due to the fact that puppet hasn't run in forever)
[22:01:04] did we change something on friday? because puppet says:
[22:01:04] deployment-pdf01 is a Puppet client of deployment-puppetmaster.eqiad.wmflabs (puppetclient)
[22:01:04] The last Puppet run was at Fri Jan 22 21:55:09 UTC 2016 (4298 minutes ago
[22:01:24] so it did run relatively-recently. just before the weekend at least.
[22:02:20] that's true. I know that there were some redis problems recently, but not on the saltmaster I don't think. Investigating.
[22:14:08] cscott: puppet working now, still getting a readonly thing from redis. Got to dig into the salt pillars :\
[22:16:31] don't gaze too long in that direction
[22:18:11] cscott: did your deploy to deployment-pdf02 work?
[22:18:32] thcipriani: no, neither host was responding to git-deploy.
[22:18:51] i was just trying to debug deployment-pdf01 first. but i bet whatever puppet magic you did for -pdf01 needs to be repeated for -pdf02.
[22:19:23] kk, yeah, giving me the same message on pdf02. Makes it seem like the culprit may actually be the redis server on deployment-bastion acting weird.
[22:21:54] well that would explain it: redis-cli config get slave-read-only certainly returns "yes"
[22:22:34] cscott: this may be a bigger can of worms :\ could you phile a ticket?
[22:25:16] sure, hang on.
[22:27:50] thcipriani: https://phabricator.wikimedia.org/T124720
[22:28:05] thcipriani: please add the appropriate 'projects' tags and cc's.
[22:28:11] cscott: will do, thanks!
[22:30:30] Beta-Cluster-Infrastructure: git-deploy broken on beta cluster - https://phabricator.wikimedia.org/T124720#1963827 (thcipriani)
[22:33:21] Beta-Cluster-Infrastructure: git-deploy broken on beta cluster - https://phabricator.wikimedia.org/T124720#1963847 (thcipriani) Seems that the redis instance on `deployment-bastion` has been turned into a readonly slave. This is the redis instance that trebuchet uses as a returner. Trying to run `deploy.fet...
[22:35:01] thcipriani: subbu reminds me that "earlier in the morning, joe wanted us to give him a heads up before parsoid deploys in case trebuchet is broken (see mail to ops list)"
[22:35:07] thcipriani: maybe related?
[22:35:50] cscott: I think that was related to using mira as the new deploy host for production, but we should double-check...
[22:44:47] well, it seems to think it's a redis slave of mira, so that seems wrong...
[22:48:14] thcipriani: is role::deployment::server an actual used-in-production role?
[22:48:18] Or some kind of leftover abandoned thing?
[22:49:02] andrewbogott: last I knew, role::deployment::server was used in production...this may have changed more recently though
[22:49:29] oh, you’re right, my mistake
[22:49:42] it's definitely used in beta
[22:51:35] I’m trying to figure out when WikitechPrivateLdapSettings.php was most recently puppetized and why it isn’t anymore :/
[22:51:48] Krenair: might instantly know the answer to that, I guess
[22:52:02] (CR) Hashar: "recheck" [selenium] - https://gerrit.wikimedia.org/r/266205 (owner: Hashar)
[22:52:39] andrewbogott, it's not puppetised?
[22:52:47] Krenair: seems not
[22:52:51] andrewbogott, oh, the puppetisation for that was broken actually I think
[22:52:58] yes!
[22:53:03] So I have just learned.
[22:53:06] Broken when/how?
[22:53:07] because it sets stuff in /srv/mediawiki
[22:53:08] Not always, I assume?
[22:53:10] instead of /srv/mediawiki-staging
[22:53:51] when syncing, it will get overwritten by the copy in the MW private repo, and then puppet will come along and restore it to the puppetised version
[22:54:40] Jenkins is doing something weird here -- https://integration.wikimedia.org/ci/job/mediawiki-extensions-hhvm/47440/console
[22:54:43] Krenair: it’s included by deployment::wikitech which is included by role::deployment::mediawiki
[22:54:51] which, as far as I can tell, is NOT installed on silver
[22:55:03] yes, it's not supposed to be done on silver
[22:55:20] oh, right, because it’s pushed out during deployment
[22:55:24] ok!
[22:55:26] we distribute mediawiki files from the deployment server, i.e. mira
[22:55:29] Now I understand!
[22:55:41] you should probably decide whether it should be *committed* in the private repo, or added to .gitignore in there and come from puppet
[22:55:48] Because that file is generated with realm switches. But the realm of the deployment server is different from the realm of labtestweb2001
[22:55:57] thcipriani, andrewbogott: i've got to go offline for a little while to pick up my son from daycare. i'll be back in ~30 minutes if you need help from the ocg side.
[22:56:03] um, what
[22:56:07] :)
[22:56:15] Oh, I see
[22:56:18] Unrelated to what you’re saying (which, I’m not really following what you’re saying at all)
[22:56:25] yeah that's problematic
[22:56:44] it's not an issue to put these passwords on all MW servers is it?
[22:56:54] cscott: kk, I'm down the rabbithole now :P
[22:57:05] thcipriani: thanks for your help btw
[22:57:12] Krenair: I don’t think so. We’re doing that now, right?
[22:59:04] yes
[22:59:07] Krenair: is there a better way to get those passwords pushed out? Ideally one that doesn’t use the indirection of scap?
[22:59:31] So we could theoretically push them with puppet and delete the file from the private repo
[22:59:48] I think I don’t know what you mean when you say ‘private repo'
[23:00:29] the normal private repo
[23:00:34] it sits on the deployment server in /srv/mediawiki-staging/private and gets copied to /srv/mediawiki/private on app servers
[23:00:39] on palladium you mean?
[23:00:45] no, that's the puppet private thing
[23:00:54] yes, that’s what I call ‘the private repo’
[23:00:56] hence my confision
[23:00:58] confusion
[23:01:04] don’t assume that I know how deployment works :)
[23:01:33] so you are suggesting puppetizing it directly on silver
[23:01:47] bd808: looks like that workspace is dirty (VisualEditor/lib/ve has local changes for some reason)
[23:01:52] and skipping the puppet->staging->app server right?
[23:02:03] marxarelli: cool. I guessed that was it
[23:02:16] okay, I just checked on mira
[23:02:39] bd808: i hard reset the submodule dir. let's try a recheck
[23:03:06] WikitechPrivateLdapSettings.php is present in the working directory but is listed in .gitignore and is not committed
[23:03:31] right, because it comes from puppet rather than from git. I think...
[23:03:42] If it was deleted from the deployment server, scap would delete the file if puppet tried to create it under /srv/mediawiki on silver/labtestweb2001
[23:05:02] sure
[23:05:06] I think I follow that
[23:05:53] I feel like I originally wrote this to have the file someplace outside the purview of scap, and then just pull it in with a /complete/path
[23:05:59] and that was vetoed for some reason
[23:06:13] there's 3 private passwords in there, can't we jus-
[23:06:22] oh, that was going to be my suggestion
[23:06:29] hm
[23:06:37] well, ok, I guess we should just do it and see who complains :)
[23:06:41] thoughts about where is a good place?
[23:07:07] Well. actually. The first password is public as it's part of labs/private.
[23:07:22] yeah, it’s only one private password really
[23:07:49] and the other two are the same thing configured for two different extensions
[23:07:55] right
[23:07:58] so it's really just one private password
[23:08:58] So we could just have some file somewhere on silver/labtestweb2001 that MW can read
[23:09:14] set up by puppet
[23:09:27] bd808: looks fixed
[23:10:02] Krenair: thoughts about where is a good place?
[23:10:59] marxarelli: thanks!
[23:11:41] bd808: yeah man. are you planning on merging those changes today or should we wait until tomorrow morning to rebase/merge?
[23:12:01] andrewbogott, well... the ldap admin stuff keeps it's credentials in /etc/ldap
[23:12:31] so /etc/ldap/WikitechPrivateLdapSettings.php ?
[23:12:31] in a file called .ldapscriptrc
[23:12:38] wouldn't need to be php
[23:12:46] we have one single password to store
[23:13:04] marxarelli: I was just going to queue them up today
[23:13:10] I don’t want to hard-code the other password in config even though it’s public
[23:13:21] because it should only be stored in one place (and changes according to realm, so should come from puppet)
[23:13:37] bd808: alrighty. i can merge them in the morning before the train starts rolling
[23:14:07] hmm
[23:14:12] this is probably something I should ask ori about
[23:14:20] Krenair: why not keep the file as it is, just move it elsewhere?
[23:14:34] And included it just as it is now, but with a different path? (and with a conditional around the include)
[23:16:41] I'm not sure I like the idea of executing PHP files from outside the deployment system
[23:16:58] it'd certainly work
[23:20:19] Project beta-update-databases-eqiad build #6046: FAILURE in 18 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/6046/
[23:21:40] andrewbogott, we can always change puppet to set up /srv/mediawiki-staging/private/WikitechPrivateLdapSettings.php on the deployment servers and check $wmfDatacenter == 'eqiad'
[23:21:49] Krenair: will a stray file in /srv/mediawiki be overwritten or deleted?
[23:21:51] or $wmfDatacenter == 'codfw'
[23:22:00] yes, but by the one set by puppet
[23:22:12] ?
[23:22:21] puppet will be writing the one in mediawiki-staging
[23:22:22] I mean, if puppet installs a file there… is it safe?
[23:22:25] scap will deploy it from there
[23:23:41] If puppet installs /srv/mediawiki/wikitechprivatesettings.php on silver
[23:23:46] will it be wiped out by scap?
[23:24:06] yes, but puppet won't be installing that, it'll be installing /srv/mediawiki-staging/private/WikitechPrivateLdapSettings.php on the deployment servers
[23:24:25] I am discussion possible future behavior
[23:24:28] I understand the current behavior
[23:24:46] you're proposing the current behaviour, I am discussing possible future behaviour..
[23:24:53] ?
[23:25:06] Current behavior is that puppet installs the files on the deployment host
[23:25:13] I don’t want that, I just want the file installed by puppet onto wikitech
[23:25:21] onto silver
[23:25:25] or labtestweb2001
[23:25:29] yes, on the deployment host, *in /srv/mediawiki*
[23:25:33] where it will get wiped out
[23:25:39] ‘deployment host’?
[23:25:41] it needs to be in /srv/mediawiki-staging to not get wiped
[23:25:43] silver is a deployment host?
[23:25:59] you're getting ahead of me, I'm responding to Current behavior is that puppet installs the files on the deployment host
[23:26:16] ok
[23:26:37] andrewbogott, we can always change puppet to set up /srv/mediawiki-staging/private/WikitechPrivateLdapSettings.php on the deployment servers and check $wmfDatacenter == 'eqiad'
[23:26:56] Or, we can have puppet only provide the passwords to the wikitech servers, and have PHP read it in
[23:26:58] how does that help?
[23:27:26] oh I guess if we pretend like codfw is only ever running labtest things
[23:27:29] It will no longer get wiped out by scap, we can still have different settings for eqiad (wikitech) vs. codfw (labtestwikitech)
[23:27:33] but right now there are three realms, eqiad, codfw, labtest
[23:28:19] um
[23:28:23] krenair@labtestweb2001:~$ cat /etc/wikimedia-cluster
[23:28:23] codfw
[23:29:13] And yet there are three realms, for puppet
[23:29:39] puppet realms are irrelevant if we did go with that option
[23:30:01] so that would mean that the test passwords get pushed out to every host in codfw?
[23:30:28] the test and non-test passwords would get pushed to every mw host
[23:31:11] difference is we'd be adding the test passwords
[23:31:46] ooooh
[23:31:51] the correct password would be chosen by mediawiki looking at $wmfDatacenter, determined by /etc/wikimedia-cluster
[23:32:00] I didn’t understand that you were suggesting that the contents of WikitechPrivateLdapSettings.php change
[23:32:04] which puppet sets to ::site
[23:32:06] everywhere
[23:32:33] so that would work as long as we never put a production wikitech in codfw
[23:32:52] I think so, yes
[23:33:04] we don't have to go with this option
[23:33:25] Having puppet just drop the passwords on silver is generally more appealing, it’s just that I don’t know how how to do that (apart from writing out a .php snippet)
[23:34:01] we could have puppet install the files only on the wikitech hosts, but we'd have to figure out a) where to put the files and b) whether it's acceptable to start including PHP files from outside of the deployment system
[23:34:26] if B is no, we can still go with a different format, like just having it read text files or whatever
[23:34:45] writing a .php file to /etc/ldap to be executed by MW seems really weird
[23:35:37] that was why I was trying to ask if I could just write it to /srv/mediawiki
[23:35:40] but it sounds like not
[23:35:51] on the other hand, if we did that, then puppet would be responsible for determining realm instead of MW. And we wouldn't be putting the passwords on all hosts.
[23:36:31] bd808, I am correct in remembering that scap deletes files not present on the deployment host, right?
[23:36:33] yeah, I definitely like that better
[23:36:40] since doing it via indirection is super confusing
[23:36:45] (to me at least)
[23:37:00] Krenair: yeah. it does a full rsync including deletes
[23:37:17] what we don't have is a "sync-rm" script for a single file
[23:37:26] yeah, I've run into that problem before
[23:37:27] bd808: if you were going to have puppet drop a supplemental .php config snippet onto silver, where would you do it?
[23:37:32] solution was to sync-dir the directory the file was in
[23:37:41] byt scap and sync-dir will delete files on the target that don't exist on the master
[23:37:42] which deletes the file
[23:37:47] *nod*
[23:38:19] andrewbogott: hmm.. can it not be on all the hosts and only used from silver?
[23:38:27] sure, whatever
[23:38:29] the private dir is for that sort of thing
[23:38:37] so that's the other option
[23:38:38] don't we actually already do that
[23:38:39] bd808: no, it has to be installed on to the actual wiki host by puppet
[23:38:44] not installed by scap
[23:38:46] ah
[23:38:48] bd808, we try to but get it wrong and scap overwrites puppet
[23:38:54] scap doesn’t know realm
[23:39:08] MW can figure out the site at runtime, which is all we currently need
[23:39:21] puppet installs the file to the deployment server's /srv/mediawiki, instead of mediawiki-staging
[23:39:48] andrewbogott: technically you can put it anywhere that www-data can read it and the we just have to tell CommonSettings where to find it
[23:40:06] bd808: I know, but 'if you were going to have puppet drop a supplemental .php config snippet onto silver, where would you do it?'
[23:40:19] We know that we can technically do that
[23:40:27] we could have puppet install the files only on the wikitech hosts, but we'd have to figure out a) where to put the files and b) whether it's acceptable to start including PHP files from outside of the deployment system
[23:40:28] if B is no, we can still go with a different format, like just having it read text files or whatever
[23:40:29] probably somewhere like /etc/mediawiki/foo.php
[23:40:29] <Krenair> writing a .php file to /etc/ldap to be executed by MW seems really weird
[23:41:08] we're going to create an /etc/mediawiki/ for this?
[23:41:15] why not?
[23:41:35] is /etc somehow sacred?
[23:41:44] it's the dir for host local configuration
[23:41:54] seems weird to make an /etc/mediawiki/ for what is essentially an edge case
[23:42:09] but at least it's not /etc/ldap/, I suppose
[23:42:33] I don't have a real objection, just seems a little odd
[23:42:42] whatever
[23:42:54] andrewbogott, if you think that's acceptable, go for it
[23:43:12] call it /etc/wikimedia if that makes you happier or /etc/wikitech
[23:43:44] yep, patch in progress
[23:43:56] I need to go, though, will add both of you as reviewers when I have it ready
[23:44:51] ok
[23:52:37] Beta-Cluster-Infrastructure: git-deploy broken on beta cluster - https://phabricator.wikimedia.org/T124720#1964182 (thcipriani) Open>Resolved a:thcipriani Looks like a new hiera variable `deployment_server` was introduced into the `deployment::redis` class that wasn't set in beta. If the `$::fqdn` d...
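
The scap behaviour described in the discussion above ("a full rsync including deletes") is why a file that puppet writes only under /srv/mediawiki on a target host does not survive a sync, while a file placed under /srv/mediawiki-staging does. A minimal Python sketch of that mirror-with-delete behaviour, assuming plain files and directories only (no symlinks): this is not scap's code, and `sync_with_delete`, `demo`, and the two file names are hypothetical, chosen only for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def sync_with_delete(master: Path, target: Path) -> None:
    """Make `target` mirror `master`, deleting anything `master` lacks.

    Hypothetical helper: it only imitates the effect of an rsync run
    with deletes enabled, and ignores symlinks and permissions.
    """
    target.mkdir(parents=True, exist_ok=True)
    master_names = {p.name for p in master.iterdir()}

    # Delete pass: entries that exist only on the target are removed.
    for entry in list(target.iterdir()):
        if entry.name not in master_names:
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()

    # Copy pass: everything on the master is (re)written on the target.
    for entry in master.iterdir():
        dest = target / entry.name
        if entry.is_dir():
            if dest.exists() and not dest.is_dir():
                dest.unlink()
            sync_with_delete(entry, dest)
        else:
            if dest.is_dir():
                shutil.rmtree(dest)
            shutil.copy2(entry, dest)


def demo() -> list:
    """Return the file names left in the target's private/ dir after a sync."""
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp) / "mediawiki-staging"  # stands in for the deployment host
        target = Path(tmp) / "mediawiki"           # stands in for an app server
        (staging / "private").mkdir(parents=True)
        (target / "private").mkdir(parents=True)
        # One file exists only in staging, one only on the target.
        (staging / "private" / "FromStaging.php").write_text("<?php // synced\n")
        (target / "private" / "OnlyOnTarget.php").write_text("<?php // local\n")
        sync_with_delete(staging, target)
        return sorted(p.name for p in (target / "private").iterdir())


if __name__ == "__main__":
    print(demo())
```

In this sketch the file that exists only on the target is removed and the staged file is copied over, which matches the two options weighed in the channel: install the file into staging so the sync carries it, or keep it outside the synced tree altogether (for example under /etc).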