[03:16:19] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #985: FAILURE in 34 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/985/
[08:32:56] PROBLEM - Host cache-rsync is DOWN: CRITICAL - Host Unreachable (10.68.23.165)
[08:32:56] PROBLEM - Puppet failure on deployment-eventlogging04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[08:34:22] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[08:34:32] Yippee, build fixed!
[08:34:32] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #885: FIXED in 24 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/885/
[08:35:15] RECOVERY - Puppet failure on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:35:45] PROBLEM - Puppet failure on integration-slave-trusty-1002 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[08:36:27] PROBLEM - SSH on integration-make-wmf-branch is CRITICAL: Connection refused
[08:36:27] RECOVERY - Free space - all mounts on deployment-bastion is OK: OK: All targets OK
[08:43:37] good morning
[08:52:26] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<42.86%)
[08:55:50] RECOVERY - Puppet failure on integration-slave-trusty-1002 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:58:32] yeah here
[09:58:33] zeljkof: : -D
[10:07:36] Yippee, build fixed!
[10:07:37] Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #821: FIXED in 29 sec: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/821/
[10:07:38] Yippee, build fixed!
[10:07:38] Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #825: FIXED in 34 sec: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/825/
[10:07:59] zeljkof: math is green ^^^^ kudos
[10:08:24] hashar: ( •_•) ( •_•)>⌐■-■ (⌐■_■)
[10:09:08] Browser-Tests, Math, Patch-For-Review: Math Selenium test fails with unable to locate element, using {:id=>"wpTextbox1", :tag_name=>"textarea"} (Watir::Exception::UnknownObjectException) - https://phabricator.wikimedia.org/T126463#2050696 (zeljkofilipin) Open>Resolved
[10:09:10] Browser-Tests-Infrastructure, Release-Engineering-Epics, Epic, Tracking: Fix or delete failing browser tests Jenkins jobs - https://phabricator.wikimedia.org/T94150#2050697 (zeljkofilipin)
[10:10:27] Browser-Tests, Math, Patch-For-Review: Math Selenium test fails with unable to locate element, using {:id=>"wpTextbox1", :tag_name=>"textarea"} (Watir::Exception::UnknownObjectException) - https://phabricator.wikimedia.org/T126463#2050703 (hashar) The single test is passing now: Project browsertests-...
[10:26:02] !log deployment-prep upgrading elastic-search to 1.7.5 on deployment-elastic0[5-8]
[10:26:05] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[10:51:02] RECOVERY - salt-minion processes on scandium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[10:56:31] Release-Engineering-Team, Patch-For-Review, WMF-deploy-2016-02-16_(1.27.0-wmf.14): MW 1.27.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T125597#1992037 (ori) Redis ops/sec should go back to their previous level -- see T126700#2050735
[11:36:25] RECOVERY - Puppet failure on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[11:37:01] hi, is there a way to see why https://gerrit.wikimedia.org/r/#/c/271032/ has not been merged yet?
[12:35:20] hashar: core patch (skins) updated https://gerrit.wikimedia.org/r/#/c/270929/
[12:45:11] hashar: and visual editor commit has +1 from James :) https://gerrit.wikimedia.org/r/#/c/270724/
[12:47:08] zeljkof: core is +2 great
[12:47:23] zeljkof: VE I have honestly no clue
[12:48:09] at least there is a timeout on https://gerrit.wikimedia.org/r/#/c/270724/9/modules/ve-mw/tests/browser/features/support/pages/visual_editor_page.rb,cm :D
[12:48:19] zeljkof: you would want to poke dan about
[12:48:21] it
[12:48:32] hashar: will do
[12:48:37] PROBLEM - Host deployment-mediawiki02 is DOWN: PING CRITICAL - Packet loss = 100%
[12:48:49] RECOVERY - Host deployment-mediawiki02 is UP: PING OK - Packet loss = 0%, RTA = 0.56 ms
[12:51:38] Project beta-scap-eqiad build #90651: FAILURE in 6 min 55 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/90651/
[12:58:08] deployment-mediawiki01.deployment-prep.eqiad.wmflabs port 22: No route to host ;,,,,,
[13:00:57] Yippee, build fixed!
[13:00:58] Project beta-scap-eqiad build #90652: FIXED in 6 min 16 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/90652/
[13:02:33] "rebuild" is magic
[13:12:50] Browser-Tests-Infrastructure, Release-Engineering-Epics, Epic, MW-1.27-release-notes, and 3 others: Fix or delete failing browser tests Jenkins jobs - https://phabricator.wikimedia.org/T94150#2050886 (zeljkofilipin)
[13:21:38] Yippee, build fixed!
[13:21:38] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #779: FIXED in 1 min 15 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/779/
[13:22:03] Yippee, build fixed!
[13:22:04] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #796: FIXED in 1 min 38 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/796/
[13:26:54] yeahhhhh
[13:26:57] echo fixed zeljkof !
[13:27:36] hashar: yes, just fixed it
[13:27:42] https://gerrit.wikimedia.org/r/#/c/271533/
[13:28:06] "notifications.feature It is failing because 'Selenium user' on beta has Flow enabled on it's user talk. It shouldn't."
[13:28:11] so I have just disabled it
[13:28:26] which in turn might well break Flow tests hehe
[13:30:40] argh
[13:30:46] but they are broken anyway :p
[13:31:31] Browser-Tests-Infrastructure, Release-Engineering-Epics, Epic, MW-1.27-release-notes, and 3 others: Fix or delete failing browser tests Jenkins jobs - https://phabricator.wikimedia.org/T94150#2050898 (zeljkofilipin)
[13:31:45] we really need isolation :(
[13:32:48] hashar: ok, now I was also kicked out from our team channel :)
[13:47:52] zeljkof: I guess greg-g screwed up some access rule.
I am sure he will get it fixed whenever he joins
[14:02:13] Continuous-Integration-Infrastructure, Mathoid, Patch-For-Review: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#2050952 (hashar) The CI part itself seems to be done. I have no idea why `mediawiki/services/mathoid/deploy` fails tests though :(
[14:02:32] Continuous-Integration-Infrastructure, Mathoid: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#1956210 (hashar)
[14:02:55] Continuous-Integration-Config, Graphoid, Services: Enable jenkins test & submit for graphoid repo - https://phabricator.wikimedia.org/T106668#2050955 (hashar)
[14:03:24] Continuous-Integration-Config, Mathoid: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#1956210 (hashar)
[14:31:34] RECOVERY - Free space - all mounts on deployment-bastion is OK: OK: All targets OK
[14:36:01] Project browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce build #425: FAILURE in 8 min 1 sec: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce/425/
[14:53:42] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #798: FAILURE in 1 min 22 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/798/
[14:54:34] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #780: FAILURE in 2 min 16 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/780/
[14:54:56] Release-Engineering-Team: MW 1.27.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T127086#2031747 (JanZerebecki) .15 is T127565, this is linked from .14 on https://wikitech.wikimedia.org/wiki/Deployments#Tuesday.2C.C2.A0February.C2.A023 . Should this be a duplicate of T125597 which is last we...
[15:00:55] Scap3, scap, Analytics, Operations, Services: Deploy AQS with scap3 - https://phabricator.wikimedia.org/T114999#2051034 (Ottomata)
[15:09:56] Beta-Cluster-Infrastructure, MediaWiki-extensions-ORES, Revision-Scoring-As-A-Service, Patch-For-Review: Deploy ORES extension to beta cluster - https://phabricator.wikimedia.org/T127661#2051073 (hashar) Hello! So my mail to wikitech-l was a bit too short I lacked time to expose when I am going to...
[15:15:24] Beta-Cluster-Infrastructure, MediaWiki-extensions-ORES, Revision-Scoring-As-A-Service, Patch-For-Review: Deploy ORES extension to beta cluster - https://phabricator.wikimedia.org/T127661#2051080 (Ladsgroup) Oh @hashar: This task is about deploying ORES extension into prod not the ORES service itse...
[15:20:43] Deployment-Systems, Release-Engineering-Team, Labs, Labs-Infrastructure: integration-make-wmf-branch instance stall on Failed to start LSB: NFS support files common to client and server. - https://phabricator.wikimedia.org/T127705#2051101 (hashar)
[15:24:08] Beta-Cluster-Infrastructure, MediaWiki-extensions-ORES, Revision-Scoring-As-A-Service, Patch-For-Review: Deploy ORES extension to beta cluster - https://phabricator.wikimedia.org/T127661#2051120 (hashar) Yup I did the comparison with the production tasks on purpose. If we want to setup ORES on bet...
[15:30:17] Browser-Tests, Collaboration-Team-Backlog, Flow, Patch-For-Review: Fix or delete failing Flow browsertests Jenkins jobs - https://phabricator.wikimedia.org/T94153#2051153 (zeljkofilipin) a:zeljkofilipin
[15:32:58] Browser-Tests-Infrastructure, Release-Engineering-Epics, Epic, Tracking: Disable scenarios that fail at en.wikipedia.beta.wmflabs.org from running daily - https://phabricator.wikimedia.org/T94150#2051190 (zeljkofilipin)
[15:35:30] Browser-Tests-Infrastructure, Release-Engineering-Epics, Epic, Tracking: Disable scenarios that fail at en.wikipedia.beta.wmflabs.org from running daily - https://phabricator.wikimedia.org/T94150#2051193 (zeljkofilipin)
[15:39:04] Browser-Tests-Infrastructure, Reading-Web, WMF-deploy-2016-02-09_(1.27.0-wmf.13), WMF-deploy-2016-02-23_(1.27.0-wmf.15): Disable MobileFrontend scenarios that fail at en.wikipedia.beta.wmflabs.org from running daily - https://phabricator.wikimedia.org/T94156#2051213 (zeljkofilipin)
[15:39:16] Browser-Tests, Collaboration-Team-Backlog, Flow, Patch-For-Review: Disable Flow scenarios that fail at en.wikipedia.beta.wmflabs.org from running daily - https://phabricator.wikimedia.org/T94153#2051214 (zeljkofilipin)
[15:40:17] Browser-Tests-Infrastructure, Reading-Web, WMF-deploy-2016-02-09_(1.27.0-wmf.13), WMF-deploy-2016-02-23_(1.27.0-wmf.15): Disable MobileFrontend scenarios that fail at en.wikipedia.beta.wmflabs.org from running daily - https://phabricator.wikimedia.org/T94156#1156443 (zeljkofilipin) a:phuedx>ze...
[15:40:55] Release-Engineering-Team: MW 1.27.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T127086#2051221 (demon)
[15:41:48] Release-Engineering-Team: MW 1.27.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T127086#2031747 (demon)
[15:41:50] Release-Engineering-Team: MW 1.27-wmf.15 blockers - https://phabricator.wikimedia.org/T127565#2051225 (demon)
[15:42:35] twentyafterfour: hey, can you /join our team channel?
[15:48:53] goood rmonniiing
[15:48:57] thcipriani: :D
[15:48:59] yt?
[15:49:05] milimetric: and I are trying to do the aqs scap deploy
[15:49:09] getting public key problems
[15:49:15] i've restarted keyholder agent and proxy on tin
[15:49:24] but you're still seeing a refusal to sign?
[15:49:57] what is the response when you try to do: SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l [user] [remote-host]
[15:50:16] 15:46:28 ['/usr/bin/deploy-local', '-v', '--repo', 'analytics/aqs/deploy', '-g', 'default', 'fetch'] on aqs1001.eqiad.wmnet returned [255]: Permission denied (publickey).
[15:50:20] sorry ok..
[15:50:36] Permission denied (publickey).
[15:50:40] SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service aqs1001.eqiad.wmnet
[15:50:51] doesn't say anything about agent refusing to sign?
[15:50:59] just that
[15:51:01] Permission denied (publickey).
[15:51:15] PROBLEM - Keyholder SSH agent on tin is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it.
[15:51:16] must mean that the key is either actually getting rejected or is not in the ssh-agent
[15:51:24] a minute or so ago in -operations
[15:51:27] good morning folks :)
[15:51:29] arm!?
[15:51:38] hashar: good morning :)
[15:52:05] you can see what keys are there with: SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh-add -l
[15:52:38] right...
[15:52:41] if the key isn't listed, you'll want to do: keyholder arm && sudo service keyholder-proxy restart
[15:52:42] i have to arm this thing after restarting it?
[15:52:47] ja, and I have to get pws?
[15:53:03] yeah
[15:53:03] The agent has no identities.
[15:53:12] yeah, when you restart keyholder agent it'll need to be rearmed, the proxy just reloads permissions
[15:53:24] Browser-Tests-Infrastructure, Reading-Web, WMF-deploy-2016-02-09_(1.27.0-wmf.13), WMF-deploy-2016-02-23_(1.27.0-wmf.15): Disable MobileFrontend scenarios that fail at en.wikipedia.beta.wmflabs.org from running daily - https://phabricator.wikimedia.org/T94156#2051289 (zeljkofilipin) @Jdlrobson: the...
[15:53:31] gotta remember where the key pws are...
[15:53:43] https://wikitech.wikimedia.org/wiki/Keyholder
[15:53:44] think i know
[15:53:51] ^ ottomata that page has their locations
[15:55:38] ok, armed.
[15:55:40] trying again
[15:55:47] Beta-Cluster-Infrastructure, MediaWiki-extensions-ORES, Revision-Scoring-As-A-Service, Patch-For-Review: Deploy ORES extension to beta cluster - https://phabricator.wikimedia.org/T127661#2051295 (Ladsgroup) OK, here is summary of my discussion with @hashar: * there is two products that are moving...
[15:56:01] might have to restart the keyholder-proxy service, too
[15:56:04] ja ok
[15:56:21] ok, so previously, i should have just restarted the keyholder-proxy, not the agent?
[15:56:29] after puppet set up the new key?
[15:56:43] hmm still denied.
[15:56:49] 15:56:35 ['/usr/bin/deploy-local', '-v', '--repo', 'analytics/aqs/deploy', '-g', 'default', 'fetch'] on aqs1001.eqiad.wmnet returned [255]: Agent admitted failure to sign using the key.
[15:56:49] Permission denied (publickey).
[15:56:49] nah, you'd have to restart the agent to get the new key in there
[15:57:01] hmmm
[15:57:08] hmm, not but, hmmm
[15:57:14] there isn't a new private key in this case
[15:57:20] it should use the deploy-service one
[15:59:22] so ja, hm, thcipriani this time it is failing to sign
[15:59:44] SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service aqs1001.eqiad.wmnet
[15:59:44] Agent admitted failure to sign using the key.
[15:59:44] Permission denied (publickey).
[15:59:59] ah, so, it's something to do with the permissions in keyholder-auth.d/deploy-service.yaml
[16:00:15] on tin?
[16:00:28] yeah, it's the file that defines groups able to use which key
[16:00:43] you might have to add yourself to the deploy-service group?
[16:01:34] ah!
[16:01:35] hm
[16:01:45] hMMmM
[16:10:55] Beta-Cluster-Infrastructure, MediaWiki-extensions-ORES, Revision-Scoring-As-A-Service, Patch-For-Review: Deploy ORES extension to beta cluster - https://phabricator.wikimedia.org/T127661#2051363 (hashar) @Ladsgroup @Halfak further clarified on IRC. There is an ORES service on labs reachable at h...
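The three keyholder symptoms seen so far can be summarised as a small lookup. This is only a sketch of the reasoning in the conversation above, not part of keyholder or scap; the function name `diagnose_keyholder` and the wording of the returned hints are made up for illustration.

```python
def diagnose_keyholder(output):
    """Map ssh / ssh-add output to the likely cause discussed in the log.

    Hypothetical helper: the message strings come from the log above,
    the function itself is not part of keyholder or scap.
    """
    if "The agent has no identities" in output:
        # The agent was restarted and has not been re-armed yet.
        return "not armed: run `keyholder arm`, then restart keyholder-proxy"
    if "Agent admitted failure to sign" in output:
        # Checked before the generic denial, because ssh prints both lines.
        return "not permitted: check group membership and keyholder-auth.d"
    if "Permission denied (publickey)" in output:
        return "key rejected by the remote host or missing from the agent"
    return "unknown"
```

The sign-failure check comes before the plain `Permission denied (publickey)` check because the failing ssh run at 15:59:44 printed both lines.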
[16:26:40] greg-g: sorry for the late notice / screw up :(
[16:26:49] greg-g: will checkin later in the evening
[16:26:56] hashar: it's all right :)
[16:28:15] greg-g: both hashar and me are kicked out of -team channel
[16:28:28] muther
[16:28:32] kk
[16:28:43] * zeljkof will be back in 30 minutes
[16:28:58] I set it to the same mode lines (channel settings) as the -staff channel, hoping that'd be fine
[16:29:05] * greg-g looks some more
[16:30:31] hasharAway: zeljkof now try (when you're back), I set it back to the previous mlock we had
[16:34:56] Scap3, scap, Operations, Services: Deploy AQS with scap3 - https://phabricator.wikimedia.org/T114999#2051520 (greg)
[16:35:08] Beta-Cluster-Infrastructure, Scap3, scap, Analytics, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#2051522 (greg)
[16:35:10] Scap3, releng-201516-q2, releng-201516-q3, scap: [keyresult] Migrate all Service team owned services and MW deploys to scap3 - https://phabricator.wikimedia.org/T109926#2051524 (greg)
[16:38:53] PROBLEM - Free space - all mounts on deployment-fluorine is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine.diskspace._srv.byte_percentfree (<33.33%)
[16:43:59] ottomata2: did you get the issue with keyholder solved?
[16:44:08] thcipriani: sorry am in meetings now :/
[16:44:09] yes i think so
[16:44:13] it is group membership
[16:44:16] just made https://phabricator.wikimedia.org/T127720
[16:44:40] which, makes me think maybe we shouldn't use deploy-service group for aqs deploy, not sure. buuuut, dunno, because we want the services folks to be able to deploy aqs too
[16:44:42] i dunno
[16:46:03] yeah, it's definitely a permission that runs both ways.
If they use deploy-service to deploy services, adding you to the group would definitely give you those permissions :\
[16:46:19] (CR) Ladsgroup: "recheck" [integration/config] - https://gerrit.wikimedia.org/r/271339 (owner: Ladsgroup)
[16:46:59] !log deployment-prep upgrading deployment-logstash2 to elasticsearch 1.7.5
[16:47:02] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:48:31] Scap3, scap, Operations, Services: Deploy AQS with scap3 - https://phabricator.wikimedia.org/T114999#2051619 (greg) I reopened this as it is a tracking task for, uh, deploying aqs with scap3 :) The one that it was merged with was not about that.
[17:06:43] hmm, actually, thcipriani, we do have an aqs-admins group
[17:06:47] maybe we should just reuse that?
[17:07:23] ottomata2: that sounds reasonable.
[17:07:39] hmmmmm
[17:07:44] would there have to be a new deployment user then?
[17:10:51] you could do it one of two ways: either allow that group access to the deploy-service key and use the deploy-service user remotely OR new deployment-user, new key, new group.
[17:16:52] hmm, i like the former
[17:17:50] thcipriani: how would we do that? access to read the key is likely granted via group-read, ja?
[17:17:57] and we can't change the group ownership of that key
[17:20:33] ottomata2: so you'd add a yaml file inside /etc/keyholder-auth.d/[blah].yml that has the contents: : [key-public-fingerprint]
[17:20:49] there's support for that in the keyholder::agent define, but I don't think anyone has used that yet.
[17:21:22] ah interesting
[17:21:35] right ok, because it's access to the key in keyholder, not just the file somewhere
[17:21:36] hmmm
[17:22:21] ok thcipriani i'm going to try that after the ops meeting and lunch
[17:22:41] okie doke, sounds good.
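The "first way" described above (let a second group use the existing deploy-service key) comes down to a group-to-fingerprint lookup. The sketch below only illustrates that idea under the assumption, taken from the conversation, that each file in /etc/keyholder-auth.d maps a group name to a list of key fingerprints. The function `allowed_keys`, the dict layout and the fingerprint string are hypothetical, and this is not how keyholder-proxy itself is implemented.

```python
def allowed_keys(auth, user_groups):
    """Return the key fingerprints usable by someone in user_groups.

    auth: dict mapping a group name to a list of key fingerprints
          (assumed shape of the merged keyholder-auth.d files).
    user_groups: iterable of group names the user belongs to.
    """
    groups = set(user_groups)
    allowed = set()
    for group, fingerprints in auth.items():
        if group in groups:
            allowed.update(fingerprints)
    return allowed


# Hypothetical fingerprint, for illustration only.
DEPLOY_SERVICE_KEY = "aa:bb:cc:dd"

auth = {
    "deploy-service": [DEPLOY_SERVICE_KEY],
    "aqs-admins": [DEPLOY_SERVICE_KEY],  # the extra entry discussed above
}
```

With the extra `aqs-admins` entry, a member of that group can sign with the deploy-service key without being added to the deploy-service group.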
[17:34:32] RECOVERY - Puppet failure on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:39:44] hasharAway: hey, tell me when you're around
[17:39:54] please check #mediawiki messages :D
[17:42:45] (CR) Paladox: "@Krinkle and @Legoktm and @Hashar will this work on any test that uses composer with php 53 since the packages being upgraded drop php 5.3" [integration/composer] - https://gerrit.wikimedia.org/r/270548 (https://phabricator.wikimedia.org/T125343) (owner: Paladox)
[17:46:31] !log ssh integration-slave-trusty-1017.eqiad.wmflabs 'sudo -u jenkins-deploy rm -rf /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm/src/.git/config.lock
[17:46:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[17:59:18] greg-g: So… are we going to bin wmf.14 and cut and deploy wmf.15?
[17:59:38] no
[17:59:50] wmf.14 this week, wmf.15 (with two weeks of stuff) next week
[18:00:17] reasoning: After a severe regression it makes sense to keep the first deploy after as small as reasonable, then catch up after we see no further regression
[18:00:22] James_F: ^
[18:00:25] OK.
[18:00:45] * greg-g comments on some tasks
[18:01:03] I'm a bit worried about dead code being deployed after not being tested for a week, but eh.
[18:02:45] Release-Engineering-Team, Patch-For-Review, WMF-deploy-2016-02-16_(1.27.0-wmf.14): MW 1.27.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T125597#2052051 (greg) Regarding the deploy of wmf.14 (which was cut last week): our plan is to deploy wmf.14 this week, and cut/deploy wmf.15 next...
[18:02:59] Release-Engineering-Team, Patch-For-Review, WMF-deploy-2016-02-16_(1.27.0-wmf.14): MW 1.27.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T125597#2052053 (greg) a:hashar>demon
[18:18:22] ostriches: Hi :) Do you strongly prefer doubling wmf.14 on the Roadmap?
https://www.mediawiki.org/w/index.php?title=MediaWiki_1.27%2FRoadmap&type=revision&diff=2056926&oldid=2056634
[18:18:42] It could be confusing, because other issues are handled in the same row, even if they are deployed the next week :)
[18:19:14] Not really, I just was making edits to reflect reality at the time. If there's a better way to format the same info {{goforit}} :)
[18:19:39] ostriches: great, thanks for the info :)
[18:22:15] ostriches: https://www.mediawiki.org/w/index.php?title=MediaWiki_1.27%2FRoadmap&type=revision&diff=2059058&oldid=2056926
[18:22:24] hope that's fine for you :)
[18:22:46] lgtm. although I think we might hold for another week too.
[18:24:11] hmm, if this really happens, I'll re-visit this page and maybe _then_ a new row is needed :P
[18:25:16] :)
[18:36:09] Scap3, scap, WorkType-NewFunctionality: [Spike] Benchmark built-in HTTP server options for scap3 fanout - https://phabricator.wikimedia.org/T127733#2052244 (dduvall)
[18:39:00] Scap3, scap, WorkType-NewFunctionality: [Spike] Benchmark built-in HTTP server options for scap3 fanout - https://phabricator.wikimedia.org/T127733#2052269 (thcipriani)
[18:48:07] thcipriani: does this look correct?
[18:48:07] https://gerrit.wikimedia.org/r/#/c/272516/
[18:49:14] ottomata: that should work
[18:49:51] RECOVERY - Puppet failure on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0]
[18:49:53] is the aqs-admin group not already on the deployment servers?
[18:50:01] no it's not
[18:50:49] gotcha, then yeah, I think that patch looks right.
[18:52:23] thcipriani: ok, so after puppet runs, i just need to restart the proxy?
[18:52:25] or the admin too?
[18:52:28] sorry
[18:52:28] agent
[18:52:40] you should only need to restart the proxy in this case I believe
[18:53:00] since the key isn't changing, just the access perms.
[18:53:16] k
[18:55:27] (PS4) Paladox: Update composer to dev-master [integration/composer] - https://gerrit.wikimedia.org/r/270548 (https://phabricator.wikimedia.org/T125343)
[18:57:09] (PS5) Paladox: Update composer to dev-master [integration/composer] - https://gerrit.wikimedia.org/r/270548 (https://phabricator.wikimedia.org/T125343)
[19:00:43] hey thcipriani, I'm getting this when trying to deploy with scap:
[19:00:46] "Sorry, user milimetric is not allowed to execute '/bin/mkdir -p /srv/deployment/analytics/aqs/deploy' as milimetric on tin.eqiad.wmnet"
[19:00:51] (I can paste more)
[19:01:34] milimetric: hmm lemme check a couple of things there.
[19:01:41] k, thx
[19:02:24] Scap3, scap, WorkType-NewFunctionality: [Spike] Benchmark built-in HTTP server options for scap3 fanout - https://phabricator.wikimedia.org/T127733#2052377 (mmodell)
[19:02:59] I'm just looking at the output of: cd /srv/deployment/analytics/aqs/deploy && deploy-log on tin
[19:04:19] Scap3, scap, WorkType-NewFunctionality: [Spike] Benchmark built-in HTTP server options for scap3 fanout - https://phabricator.wikimedia.org/T127733#2052244 (mmodell)
[19:04:38] thcipriani: i'm looking at deploy-log too but i don't see any new output
[19:07:32] so, from what I can see, scap is running: mkdir -p /srv/deployment/analytics/aqs/deploy as your user. I'm not sure why that would cause an error...
[19:07:55] hm me neither
[19:08:58] especially since I can run that fine :)
[19:09:06] ah, blerg. I think it's because of sudo_check_call. So the actual command is: sudo -u [you] -n -- mkdir -p [dir]
[19:09:15] ooh
[19:09:28] uhh
[19:09:30] ha
[19:09:36] and you aren't allowed to run that as you...because reasons...?
[19:09:48] thcipriani: what is it trying to mkdir -p anyway?
[19:09:50] on tin?
[19:09:58] why?
[19:09:59] the directory that already exists :) [19:10:00] i mean [19:10:06] I can't elevate to myself to do something that's already done [19:10:14] haha, well, mkdir -p is forgiving [19:10:25] but, sudo is not :) [19:10:29] yep [19:10:31] but, why does scap want to mkdir -p that? [19:10:50] thcipriani: is that command running on tin or on the target? [19:11:13] ottomata: it's running on tin. It's part of the context object setup for some reason. [19:11:33] context object setup? I thought scap wasn't doing any bootstrapping (yet)? [19:11:41] https://github.com/wikimedia/scap/blob/master/scap/context.py#L41-L44 [19:12:12] it's just part of what scap does to get its lay of the land on tin: the root directory is here, the scap directory is here, etc. [19:12:40] this happens on both sides on the deploy host and on the target, which is why it tries to create the directory if it doesn't exist. [19:12:52] in this instance it's failing on the deployment host [19:13:32] it could just check if the current user == [you] [19:14:15] yeah, we have a ticket for that. Needs a priority bump evidently. 
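The fix floated above ("it could just check if the current user == [you]") can be sketched as follows. This is a hypothetical illustration, not scap's actual `sudo_check_call`; the names `build_command` and `ensure_dir` are invented here, and only the `sudo -u [user] -n -- ...` command shape is taken from the log.

```python
import getpass
import os
import subprocess


def build_command(cmd, user, current_user=None):
    """Return the argv to run, adding sudo only when switching users."""
    if current_user is None:
        current_user = getpass.getuser()
    if user == current_user:
        # Already the right user: no sudo rule is needed at all.
        return list(cmd)
    # -n makes sudo fail instead of prompting for a password.
    return ["sudo", "-u", user, "-n", "--"] + list(cmd)


def ensure_dir(path, user):
    """Create path as `user`, skipping the call when it already exists."""
    if os.path.isdir(path):
        return
    subprocess.check_call(build_command(["mkdir", "-p", path], user))
```

Under these assumptions, running as milimetric for milimetric yields a plain `mkdir -p`, so a missing "sudo to yourself" rule no longer matters, and a directory that already exists is never touched.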
[19:15:48] ottomata: I guess until that's fixed you're the one that has to do the deploy since you have sudo
[19:16:11] and/or check if the dir exists before trying to create it
[19:16:31] ok milimetric going to ry
[19:16:32] try
[19:21:37] hmm, ok thcipriani still not working, but other reasons
[19:21:43] ['/usr/bin/deploy-local', '-v', '--repo', 'analytics/aqs/deploy', '-g', 'default', 'promote'] on aqs1001.eqiad.wmnet returned [70]: 19:19:32 INFO - Starting new HTTP connection (1): tin.eqiad.wmnet
[19:21:55] and running that on aqs1001 manually:
[19:21:55] ["deploy-local", "CalledProcessError", {"cmd": "ln -sf 'revs/ccfb3fd8feda1552e552c282614a0a124369443a' 'current'", "output": null, "returncode": 1}]
[19:22:33] hmm, so deploy-local can't link that directory
[19:23:27] i guess, but if I try to run that command as the deploy-service user
[19:23:37] it creates the symlink inside of current/ :p
[19:23:53] [@aqs1001:/srv/deployment/analytics/aqs/deploy-cache] 130 $ pwd
[19:23:53] /srv/deployment/analytics/aqs/deploy-cache
[19:23:53] [@aqs1001:/srv/deployment/analytics/aqs/deploy-cache] $ sudo -u deploy-service ln -sfv revs/ccfb3fd8feda1552e552c282614a0a124369443a current
[19:23:53] ‘current/ccfb3fd8feda1552e552c282614a0a124369443a’ -> ‘revs/ccfb3fd8feda1552e552c282614a0a124369443a’
[19:24:07] [@aqs1001:/srv/deployment/analytics/aqs/deploy-cache] $ ls -l current/ccfb3fd8feda1552e552c282614a0a124369443a
[19:24:07] lrwxrwxrwx 1 deploy-service deploy-service 45 Feb 22 19:23 current/ccfb3fd8feda1552e552c282614a0a124369443a -> revs/ccfb3fd8feda1552e552c282614a0a124369443a
[19:24:08] :p
[19:24:39] instead of overwriting the current symlink
[19:25:41] so current is a directory and it's creating a link inside of it?
[19:26:31] legoktm: Hi do you know why in integration/composer some of the files are submodules since it uses composer not submodules.
Just wondering since when i remove vendor/composer and do a composer update and then git add -A --all it doesn't add the files to git instead it says deleted and then says when i try to rm it again it is a submodule.
[19:26:48] hashar: ^^ too please.
[19:30:43] thcipriani: current is symlink to the last revision
[19:30:51] and it is creating the symlink inside of that
[19:31:04] it's just putting the new symlink inside current/ instead of replacing it
[19:31:04] but.
[19:31:09] that is when i run ln -sf manually
[19:31:13] NOT when deploy-local runs
[19:31:18] deploy-local fails with retval 1
[19:31:25] and does not create any new symlink
[19:31:47] so my manual ln -sf may not be a problem
[19:31:48] (PS6) Paladox: Update composer to dev-master [integration/composer] - https://gerrit.wikimedia.org/r/270548 (https://phabricator.wikimedia.org/T125343)
[19:32:00] perhaps if deploy-local worked properly, it would do the right thing with the current symlink
[19:33:01] ottomata: can you paste the output of: deploy-local -v --repo 'analytics/aqs/deploy' -D 'log_json:False' on aqs1001?
[19:35:03] paladox: integration/composer does not have submodules
[19:35:22] hashar: Oh on my git it says vendor/composer/composer is.
[19:35:36] thcipriani: do I run that as me or as deploy-service user?
[19:35:43] as deploy-service i think
[19:35:46] paladox: but maybe when running composer install that happen to download git repositories somehow and maybe doing a git add would add them magically as submodules to the repo
[19:35:52] ottomata: yeah, as deploy-service
[19:36:09] hashar: Maybe yes.
[19:36:37] thcipriani: https://gist.github.com/
[19:36:48] paladox: maybe composer install --prefer-dist
[19:36:53] oops
[19:36:55] https://gist.github.com/ottomata/aa37fa384cc4469afbb9
[19:37:02] * thcipriani looks
[19:37:12] hashar: Ok, But if i do composer update would that make any difference
[19:37:21] thcipriani: ottomata are you migrating aqs to use scap3 ?
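The manual `ln -sf` result above is ordinary `ln` behaviour: when the link name is itself a symlink to a directory, `ln` follows it and creates the new link inside that directory unless `-n` (no-dereference) is given. It does not explain why deploy-local itself returned 1. A sketch of replacing the `current` link without following it is shown below; `replace_symlink` and the temporary-name scheme are illustrative assumptions, not what deploy-local does.

```python
import os


def replace_symlink(target, link_path):
    """Point link_path at target, replacing an existing symlink in place.

    Hypothetical helper: creates the new link under a temporary name and
    renames it over link_path, so an existing link is never dereferenced.
    """
    tmp_path = "{}.tmp-{}".format(link_path, os.getpid())
    os.symlink(target, tmp_path)     # new link under a temporary name
    os.replace(tmp_path, link_path)  # rename over the old link itself
```

Because the rename acts on the link itself rather than on what it points to, no stray `current/<rev>` entry is created inside the previous revision directory.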
[19:37:29] paladox: no idea :D
[19:37:35] hashar: Oh ok.
[19:37:38] hashar: trying :\
[19:37:56] ottomata: it looks like it may be missing sudoer permissions for deploy-service?
[19:38:22] hashar: Its strange when i run git rm vendor -r it removes all files except from vendor/composer
[19:38:42] Well removes the files in that folder but keeps that folder.
[19:38:54] paladox: guess because it is a git repo and git rm would consider it is outside of its scope
[19:39:15] hashar: Oh do you know how to remove it please.
[19:39:26] hashar: yes
[19:39:32] paladox: man git-rm ? :D
[19:39:32] hm
[19:39:37] ottomata: those should be setup as part of scap::target which is called from service::deploy::scap3
[19:39:39] paladox: maybe pass it --force
[19:39:42] hashar: Thanks
[19:39:45] er service::deploy::scap
[19:39:55] ottomata: thcipriani: awesome!! good luck in figuring out sudo rules
[19:40:00] thcipriani:
[19:40:00] [@aqs1001:/home/otto] $ sudo cat /etc/sudoers.d/scap_deploy-service
[19:40:00] # This file is managed by Puppet!
[19:40:01] deploy-service ALL=(deploy-service) NOPASSWD: ALL
[19:40:01] deploy-service ALL=(root) NOPASSWD: /usr/sbin/service analytics/aqs/deploy *
[19:40:47] ja, thcipriani that define is called via service::node
[19:40:50] which aqs uses
[19:41:05] service::node { 'aqs':
[19:41:05] ...
[19:41:05] deployment => 'scap3',
[19:42:15] yarp. I don't understand why it's saying: 19:36:25 sudo: a password is required in the gist. It's doing: sudo -u deploy-service -n -- mkdir -p /srv/deployment/analytics/aqs/deploy-cache
[19:42:41] (again running sudo as itself, but that should be allowed by the sudoer rules)
[19:43:34] j
[19:43:34] a
[19:44:00] thcipriani: when I run that exact command
[19:44:08] no pw prompt needed
[19:44:10] hm
[19:44:26] are we sure the deploy-service user is the one sudoing?
[19:44:28] i guess, yes, right?
[19:44:31] who else could ssh from tin?
[19:45:04] well, yes, because deploy-local does it [19:45:04] hmmm [19:47:06] marxarelli: could you think of any weirdness with utils.get_username() and ssh_user that would mean that the ssh_user wasn't performing an action on the target? [19:47:19] thcipriani: when I look at deploy-log during deploy from tin [19:47:28] i get a different error, but for the same reason: pw prompt [19:47:36] what's the error there? [19:47:41] sudo: no tty present and no askpass program specified [19:47:43] ... [19:47:48] 19:46:50 [aqs1001.eqiad.wmnet] deploy-local failed: {u'cmd': u'sudo /usr/sbin/service aqs restart', u'output': None, u'returncode': 1} [19:47:48] 19:46:50 [tin] [u'/usr/bin/deploy-local', u'-v', u'--repo', u'analytics/aqs/deploy', u'-g', u'default', u'promote'] on aqs1001.eqiad.wmnet returned [70]: 19:46:50 INFO - Starting new HTTP connection (1): tin.eqiad.wmnet [19:47:49] pasting. [19:48:04] https://gist.github.com/ottomata/078e62d20edfe77da618 [19:48:09] * thcipriani looks [19:48:37] whatever it is, probably the same problem [19:49:02] hashar: Could you review https://gerrit.wikimedia.org/r/#/c/270734/ please. [19:49:29] It's to do with changing the BlueSpiceExtensions extension to use the npm template, since I've added the npm tests to the extension [19:49:36] ottomata: it may not be the same problem. [19:49:38] thcipriani: i can't.
it just uses `os.getuid` [19:49:42] I'm also changing mw-checks to mw-checks-tests [19:50:07] (PS2) Paladox: Migrate test mediawiki-vagrant-puppet-doc-publish to UbuntuTrusty [integration/config] - https://gerrit.wikimedia.org/r/270658 [19:50:08] ottomata: seems like the error in the deploy-log you pasted was running: sudo /usr/sbin/service aqs restart [19:50:13] Continuous-Integration-Infrastructure, scap, Packaging: Develop a CI Testing and Release pipeline for the SCAP package (and potentially other debian packages that we maintain) - https://phabricator.wikimedia.org/T127741#2052522 (mmodell) [19:50:22] (which should come after the linking of the deploy-cache/current directory) [19:50:26] thcipriani: and then `pwd.getpwuid`. so, maybe if the latter returned multiple entries? [19:50:50] thcipriani: i added a log in sudo_check_call [19:50:52] 19:50:40 [aqs1001.eqiad.wmnet] sudo_check_call sudoing as deploy-service [19:51:01] marxarelli: I wonder if running: sudo -u deploy-service -- deploy-local -v ...etc would screw it up.
[19:51:11] Continuous-Integration-Infrastructure, scap, Packaging: Develop a CI Testing and Release pipeline for the SCAP package (and potentially other debian packages that we maintain) - https://phabricator.wikimedia.org/T127741#2052534 (mmodell) a:mmodell [19:51:22] thcipriani: i just created https://phabricator.wikimedia.org/D134 [19:51:28] Continuous-Integration-Infrastructure, scap, Packaging: Develop a CI Testing and Release pipeline for the SCAP package (and potentially other debian packages that we maintain) - https://phabricator.wikimedia.org/T127741#2052522 (mmodell) p:Triage>Normal [19:51:57] thcipriani: to avoid that particular kind of self-referential sudo madness [19:52:33] can't say what the exact underlying problem is in this case though, assuming sudoers allows deploy-service to sudo as itself [19:52:49] (PS4) Hashar: [BlueSpiceExtensions] Add npm test, Also add mw-check-test which replaced mw-checks [integration/config] - https://gerrit.wikimedia.org/r/270734 (owner: Paladox) [19:52:50] marxarelli: nice, probably would be nice to get that into a package as a bug-fix [19:52:56] thcipriani: hmmmm yes, and it should do that [19:53:01] with just sudo to root for restart [19:53:25] OH [19:53:25] deploy-service ALL=(root) NOPASSWD: /usr/sbin/service analytics/aqs/deploy * [19:53:27] that is not right [19:53:31] (PS5) Hashar: [BlueSpiceExtensions] Add npm test [integration/config] - https://gerrit.wikimedia.org/r/270734 (owner: Paladox) [19:53:33] analytics/aqs/deploy is not a service name. [19:53:35] looking [19:53:46] (CR) Hashar: [C: 2] "I have rephrased slightly the commit message" [integration/config] - https://gerrit.wikimedia.org/r/270734 (owner: Paladox) [19:53:48] paladox: landing it [19:54:29] there is other weirdness, but i think i have a puppet patch to fix this... [19:54:33] (CR) Paladox: "Thanks. And sorry I linked to the wrong place. I thought I linked to the right patch but I didn't, sorry."
[integration/config] - https://gerrit.wikimedia.org/r/270734 (owner: Paladox) [19:54:41] (Merged) jenkins-bot: [BlueSpiceExtensions] Add npm test [integration/config] - https://gerrit.wikimedia.org/r/270734 (owner: Paladox) [19:55:38] hashar: Could you review https://gerrit.wikimedia.org/r/#/c/270712/ if you have time please? It's to do with cleaning the skins directory on extension tests, since it causes problems for other extensions because skins don't follow the same guidelines that extensions have to. [19:55:50] paladox: deployed! can you check changes on BlueSpiceExtensions still work fine with npm ? :) [19:56:17] paladox: yeah I have seen your skins related changes. Haven't had time to properly review/test that one though [19:56:17]