[03:16:19] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #985: FAILURE in 34 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/985/
[08:32:56] PROBLEM - Host cache-rsync is DOWN: CRITICAL - Host Unreachable (10.68.23.165)
[08:32:56] PROBLEM - Puppet failure on deployment-eventlogging04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[08:34:22] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[08:34:32] Yippee, build fixed!
[08:34:32] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #885: FIXED in 24 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/885/
[08:35:15] RECOVERY - Puppet failure on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:35:45] PROBLEM - Puppet failure on integration-slave-trusty-1002 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[08:36:27] PROBLEM - SSH on integration-make-wmf-branch is CRITICAL: Connection refused
[08:36:27] RECOVERY - Free space - all mounts on deployment-bastion is OK: OK: All targets OK
[08:43:37] good morning
[08:52:26] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<42.86%)
[08:55:50] RECOVERY - Puppet failure on integration-slave-trusty-1002 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:58:32] yeah here
[09:58:33] zeljkof: : -D
[10:07:36] Yippee, build fixed!
[10:07:37] Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #821: FIXED in 29 sec: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/821/
[10:07:38] Yippee, build fixed!
[10:07:38] Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #825: FIXED in 34 sec: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/825/
[10:07:59] zeljkof: math is green ^^^^ kudos
[10:08:24] hashar: ( •_•) ( •_•)>⌐■-■ (⌐■_■)
[10:09:08] Browser-Tests, Math, Patch-For-Review: Math Selenium test fails with unable to locate element, using {:id=>"wpTextbox1", :tag_name=>"textarea"} (Watir::Exception::UnknownObjectException) - https://phabricator.wikimedia.org/T126463#2050696 (zeljkofilipin) Open>Resolved
[10:09:10] Browser-Tests-Infrastructure, Release-Engineering-Epics, Epic, Tracking: Fix or delete failing browser tests Jenkins jobs - https://phabricator.wikimedia.org/T94150#2050697 (zeljkofilipin)
[10:10:27] Browser-Tests, Math, Patch-For-Review: Math Selenium test fails with unable to locate element, using {:id=>"wpTextbox1", :tag_name=>"textarea"} (Watir::Exception::UnknownObjectException) - https://phabricator.wikimedia.org/T126463#2050703 (hashar) The single test is passing now: Project browsertests-...
[10:26:02] !log deployment-prep upgrading elastic-search to 1.7.5 on deployment-elastic0[5-8]
[10:26:05] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[10:51:02] RECOVERY - salt-minion processes on scandium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[10:56:31] Release-Engineering-Team, Patch-For-Review, WMF-deploy-2016-02-16_(1.27.0-wmf.14): MW 1.27.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T125597#1992037 (ori) Redis ops/sec should go back to their previous level -- see T126700#2050735
[11:36:25] RECOVERY - Puppet failure on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[11:37:01] hi, is there a way to see why https://gerrit.wikimedia.org/r/#/c/271032/ has not been merged yet?
[12:35:20] hashar: core patch (skins) updated https://gerrit.wikimedia.org/r/#/c/270929/
[12:45:11] hashar: and visual editor commit has +1 from James :) https://gerrit.wikimedia.org/r/#/c/270724/
[12:47:08] zeljkof: core is +2 great
[12:47:23] zeljkof: VE I have honestly no clue
[12:48:09] at least there is a timeout on https://gerrit.wikimedia.org/r/#/c/270724/9/modules/ve-mw/tests/browser/features/support/pages/visual_editor_page.rb,cm :D
[12:48:19] zeljkof: you would want to poke dan about
[12:48:21] it
[12:48:32] hashar: will do
[12:48:37] PROBLEM - Host deployment-mediawiki02 is DOWN: PING CRITICAL - Packet loss = 100%
[12:48:49] RECOVERY - Host deployment-mediawiki02 is UP: PING OK - Packet loss = 0%, RTA = 0.56 ms
[12:51:38] Project beta-scap-eqiad build #90651: FAILURE in 6 min 55 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/90651/
[12:58:08] deployment-mediawiki01.deployment-prep.eqiad.wmflabs port 22: No route to host ;,,,,,
[13:00:57] Yippee, build fixed!
[13:00:58] Project beta-scap-eqiad build #90652: FIXED in 6 min 16 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/90652/
[13:02:33] "rebuild" is magic
[13:12:50] Browser-Tests-Infrastructure, Release-Engineering-Epics, Epic, MW-1.27-release-notes, and 3 others: Fix or delete failing browser tests Jenkins jobs - https://phabricator.wikimedia.org/T94150#2050886 (zeljkofilipin)
[13:21:38] Yippee, build fixed!
[13:21:38] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #779: FIXED in 1 min 15 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/779/
[13:22:03] Yippee, build fixed!
[13:22:04] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #796: FIXED in 1 min 38 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/796/
[13:26:54] yeahhhhh
[13:26:57] echo fixed zeljkof !
[13:27:36] hashar: yes, just fixed it
[13:27:42] https://gerrit.wikimedia.org/r/#/c/271533/
[13:28:06] "notifications.feature It is failing because 'Selenium user' on beta has Flow enabled on it's user talk. It shouldn't."
[13:28:11] so I have just disabled it
[13:28:26] which in turn might well break Flow tests hehe
[13:30:40] argh
[13:30:46] but they are broken anyway :p
[13:31:31] Browser-Tests-Infrastructure, Release-Engineering-Epics, Epic, MW-1.27-release-notes, and 3 others: Fix or delete failing browser tests Jenkins jobs - https://phabricator.wikimedia.org/T94150#2050898 (zeljkofilipin)
[13:31:45] we really need isolation :(
[13:32:48] hashar: ok, now I was also kicked out from our team channel :)
[13:47:52] zeljkof: I guess greg-g screwed up some access rule .
I am sure he will get it fixed whenever he joins
[14:02:13] Continuous-Integration-Infrastructure, Mathoid, Patch-For-Review: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#2050952 (hashar) The CI part itself seems to be done. I have no idea why `mediawiki/services/mathoid/deploy` fails tests though :(
[14:02:32] Continuous-Integration-Infrastructure, Mathoid: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#1956210 (hashar)
[14:02:55] Continuous-Integration-Config, Graphoid, Services: Enable jenkins test & submit for graphoid repo - https://phabricator.wikimedia.org/T106668#2050955 (hashar)
[14:03:24] Continuous-Integration-Config, Mathoid: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#1956210 (hashar)
[14:31:34] RECOVERY - Free space - all mounts on deployment-bastion is OK: OK: All targets OK
[14:36:01] Project browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce build #425: FAILURE in 8 min 1 sec: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce/425/
[14:53:42] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #798: FAILURE in 1 min 22 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/798/
[14:54:34] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #780: FAILURE in 2 min 16 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/780/
[14:54:56] Release-Engineering-Team: MW 1.27.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T127086#2031747 (JanZerebecki) .15 is T127565, this is linked from .14 on https://wikitech.wikimedia.org/wiki/Deployments#Tuesday.2C.C2.A0February.C2.A023 . Should this be a duplicate of T125597 which is last we...
[15:00:55] Scap3, scap, Analytics, Operations, Services: Deploy AQS with scap3 - https://phabricator.wikimedia.org/T114999#2051034 (Ottomata)
[15:09:56] Beta-Cluster-Infrastructure, MediaWiki-extensions-ORES, Revision-Scoring-As-A-Service, Patch-For-Review: Deploy ORES extension to beta cluster - https://phabricator.wikimedia.org/T127661#2051073 (hashar) Hello! So my mail to wikitech-l was a bit too short I lacked time to expose when I am going to...
[15:15:24] Beta-Cluster-Infrastructure, MediaWiki-extensions-ORES, Revision-Scoring-As-A-Service, Patch-For-Review: Deploy ORES extension to beta cluster - https://phabricator.wikimedia.org/T127661#2051080 (Ladsgroup) Oh @hashar: This task is about deploying ORES extension into prod not the ORES service itse...
[15:20:43] Deployment-Systems, Release-Engineering-Team, Labs, Labs-Infrastructure: integration-make-wmf-branch instance stall on Failed to start LSB: NFS support files common to client and server. - https://phabricator.wikimedia.org/T127705#2051101 (hashar)
[15:24:08] Beta-Cluster-Infrastructure, MediaWiki-extensions-ORES, Revision-Scoring-As-A-Service, Patch-For-Review: Deploy ORES extension to beta cluster - https://phabricator.wikimedia.org/T127661#2051120 (hashar) Yup I did the comparison with the production tasks on purpose. If we want to setup ORES on bet...
[15:30:17] Browser-Tests, Collaboration-Team-Backlog, Flow, Patch-For-Review: Fix or delete failing Flow browsertests Jenkins jobs - https://phabricator.wikimedia.org/T94153#2051153 (zeljkofilipin) a:zeljkofilipin
[15:32:58] Browser-Tests-Infrastructure, Release-Engineering-Epics, Epic, Tracking: Disable scenarios that fail at en.wikipedia.beta.wmflabs.org from running daily - https://phabricator.wikimedia.org/T94150#2051190 (zeljkofilipin)
[15:35:30] Browser-Tests-Infrastructure, Release-Engineering-Epics, Epic, Tracking: Disable scenarios that fail at en.wikipedia.beta.wmflabs.org from running daily - https://phabricator.wikimedia.org/T94150#2051193 (zeljkofilipin)
[15:39:04] Browser-Tests-Infrastructure, Reading-Web, WMF-deploy-2016-02-09_(1.27.0-wmf.13), WMF-deploy-2016-02-23_(1.27.0-wmf.15): Disable MobileFrontend scenarios that fail at en.wikipedia.beta.wmflabs.org from running daily - https://phabricator.wikimedia.org/T94156#2051213 (zeljkofilipin)
[15:39:16] Browser-Tests, Collaboration-Team-Backlog, Flow, Patch-For-Review: Disable Flow scenarios that fail at en.wikipedia.beta.wmflabs.org from running daily - https://phabricator.wikimedia.org/T94153#2051214 (zeljkofilipin)
[15:40:17] Browser-Tests-Infrastructure, Reading-Web, WMF-deploy-2016-02-09_(1.27.0-wmf.13), WMF-deploy-2016-02-23_(1.27.0-wmf.15): Disable MobileFrontend scenarios that fail at en.wikipedia.beta.wmflabs.org from running daily - https://phabricator.wikimedia.org/T94156#1156443 (zeljkofilipin) a:phuedx>ze...
[15:40:55] Release-Engineering-Team: MW 1.27.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T127086#2051221 (demon)
[15:41:48] Release-Engineering-Team: MW 1.27.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T127086#2031747 (demon)
[15:41:50] Release-Engineering-Team: MW 1.27-wmf.15 blockers - https://phabricator.wikimedia.org/T127565#2051225 (demon)
[15:42:35] twentyafterfour: hey, can you /join our team channel?
[15:48:53] goood rmonniiing
[15:48:57] thcipriani: :D
[15:48:59] yt?
[15:49:05] milimetric: and I are trying to do the aqs scap deploy
[15:49:09] getting public key problems
[15:49:15] i've restarted keyholder agent and proxy on tin
[15:49:24] but you're still seeing a refusal to sign?
[15:49:57] what is the response when you try to do: SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l [user] [remote-host]
[15:50:16] 15:46:28 ['/usr/bin/deploy-local', '-v', '--repo', 'analytics/aqs/deploy', '-g', 'default', 'fetch'] on aqs1001.eqiad.wmnet returned [255]: Permission denied (publickey).
[15:50:20] sorry ok..
[15:50:36] Permission denied (publickey).
[15:50:40] SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service aqs1001.eqiad.wmnet
[15:50:51] doesn't say anything about agent refusing to sign?
[15:50:59] just that
[15:51:01] Permission denied (publickey).
[15:51:15] PROBLEM - Keyholder SSH agent on tin is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it.
[15:51:16] must mean that the key is either actually getting rejected or is not in the ssh-agent
[15:51:24] a minute or so ago in -operations
[15:51:27] good morning folks :)
[15:51:29] arm!?
[15:51:38] hashar: good morning :)
[15:52:05] you can see what keys are there with: SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh-add -l
[15:52:38] right...
[15:52:41] if the key isn't listed, you'll want to do: keyholder arm && sudo service keyholder-proxy restart
[15:52:42] i have to arm this thing after restarting it?
[15:52:47] ja, and I have to get pws?
[15:53:03] yeah
[15:53:03] The agent has no identities.
[15:53:12] yeah, when you restart keyholder agent it'll need to be rearmed, the proxy just reloads permissions
[15:53:24] Browser-Tests-Infrastructure, Reading-Web, WMF-deploy-2016-02-09_(1.27.0-wmf.13), WMF-deploy-2016-02-23_(1.27.0-wmf.15): Disable MobileFrontend scenarios that fail at en.wikipedia.beta.wmflabs.org from running daily - https://phabricator.wikimedia.org/T94156#2051289 (zeljkofilipin) @Jdlrobson: the...
[15:53:31] gotta remember where the key pws are...
[15:53:43] https://wikitech.wikimedia.org/wiki/Keyholder
[15:53:44] thin i know
[15:53:51] ^ ottomata that page has their locations
[15:55:38] ok, armed.
[15:55:40] trying again
[15:55:47] Beta-Cluster-Infrastructure, MediaWiki-extensions-ORES, Revision-Scoring-As-A-Service, Patch-For-Review: Deploy ORES extension to beta cluster - https://phabricator.wikimedia.org/T127661#2051295 (Ladsgroup) OK, here is summary of my discussion with @hashar: * there is two products that are moving...
[15:56:01] might have to restart the keyholder-proxy service, too
[15:56:04] ja ok
[15:56:21] ok, so previously, i should have just restarted the keyholder-proxy, not the agent?
[15:56:29] after puppet set up the new key?
[15:56:43] hmm still deneind.
[15:56:49] 15:56:35 ['/usr/bin/deploy-local', '-v', '--repo', 'analytics/aqs/deploy', '-g', 'default', 'fetch'] on aqs1001.eqiad.wmnet returned [255]: Agent admitted failure to sign using the key.
[15:56:49] Permission denied (publickey).
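The `ssh-add -l` check used above can be scripted. A minimal sketch in Python, assuming only the socket path quoted in the log and the exit codes documented for OpenSSH's `ssh-add` (0 on success, 1 if the command fails, 2 if the agent cannot be contacted); the state labels `armed`, `unarmed` and `unreachable` are illustrative names, not keyholder's own output:

```python
import os
import subprocess

# Socket path as quoted in the log; it is specific to that deployment host.
KEYHOLDER_PROXY_SOCK = "/run/keyholder/proxy.sock"


def classify_ssh_add_listing(returncode, stdout):
    """Map the result of `ssh-add -l` to a coarse agent state.

    ssh-add exits 0 on success, 1 if the command fails (e.g. it prints
    "The agent has no identities."), and 2 if it cannot contact the agent.
    """
    if returncode == 0:
        return "armed"        # at least one key is listed
    if returncode == 1 and "no identities" in stdout:
        return "unarmed"      # agent reachable but empty: needs re-arming
    if returncode == 2:
        return "unreachable"  # socket missing or agent not running
    return "error"            # any other failure


def check_keyholder(sock=KEYHOLDER_PROXY_SOCK):
    """Run `ssh-add -l` with SSH_AUTH_SOCK pointed at the given socket."""
    env = dict(os.environ, SSH_AUTH_SOCK=sock)
    result = subprocess.run(["ssh-add", "-l"], env=env,
                            capture_output=True, text=True)
    return classify_ssh_add_listing(result.returncode, result.stdout)
```

An `unarmed` result corresponds to the "The agent has no identities." reply in the log: restarting the agent drops its keys, so it has to be re-armed, whereas restarting the proxy only reloads permissions.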
[15:56:49] nah, you'd have to restart the agent to get the new key in there
[15:57:01] hmmm
[15:57:08] hmm, not but, hmmm
[15:57:14] there isn't a new private key in this case
[15:57:20] it shoudl use the deploy-service one
[15:59:22] so ja, hm, thcipriani this time it is failing to sign
[15:59:44] SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service aqs1001.eqiad.wmnet
[15:59:44] Agent admitted failure to sign using the key.
[15:59:44] Permission denied (publickey).
[15:59:59] ah, so, it's something to do with the permissions in keyholder-auth.d/deploy-service.yaml
[16:00:15] on tin?
[16:00:28] yeah, it's the file that defines groups able to use which key
[16:00:43] you might have to add yourself to the deploy-service group?
[16:01:34] ah!
[16:01:35] hm
[16:01:45] hMMmM
[16:10:55] Beta-Cluster-Infrastructure, MediaWiki-extensions-ORES, Revision-Scoring-As-A-Service, Patch-For-Review: Deploy ORES extension to beta cluster - https://phabricator.wikimedia.org/T127661#2051363 (hashar) @Ladsgroup @Halfak further clarified on IRC. There is an ORES service on labs reachable at h...
[16:26:40] greg-g: sorry for the late notice / screw up :(
[16:26:49] greg-g: will checkin later in the evening
[16:26:56] hashar: it's all right :)
[16:28:15] greg-g: both hashar and me are kicked out of -team channel
[16:28:28] muther
[16:28:32] kk
[16:28:43] * zeljkof will be back in 30 minutes
[16:28:58] I set it to the same mode lines (channel settings) as the -staff channel, hoping that'd be fine
[16:29:05] * greg-g looks some more
[16:30:31] hasharAway: zeljkof now try (when you're back), I set it back to the previous mlock we had
[16:34:56] Scap3, scap, Operations, Services: Deploy AQS with scap3 - https://phabricator.wikimedia.org/T114999#2051520 (greg)
[16:35:08] Beta-Cluster-Infrastructure, Scap3, scap, Analytics, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#2051522 (greg)
[16:35:10] Scap3, releng-201516-q2, releng-201516-q3, scap: [keyresult] Migrate all Service team owned services and MW deploys to scap3 - https://phabricator.wikimedia.org/T109926#2051524 (greg)
[16:38:53] PROBLEM - Free space - all mounts on deployment-fluorine is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine.diskspace._srv.byte_percentfree (<33.33%)
[16:43:59] ottomata2: do you get the issue with keyholder solved?
[16:44:08] thcipriani: sorry am in meetings now :/
[16:44:09] yes i think so
[16:44:13] it is group membership
[16:44:16] just made https://phabricator.wikimedia.org/T127720
[16:44:40] which, makes me think maybe we shouldn't use deploy-service group for aqs deploy, not sure. buuuut, dunno, beacuse we want the services folks to be able to deploy aqs too
[16:44:42] i dunno
[16:46:03] yeah, it's definitely a permission that runs both ways.
If they use deploy-service to deploy services, adding you to the group would definitely give you that permissions :\
[16:46:19] (CR) Ladsgroup: "recheck" [integration/config] - https://gerrit.wikimedia.org/r/271339 (owner: Ladsgroup)
[16:46:59] !log deployment-prep upgrading deployment-logstash2 to elasticsearch 1.7.5
[16:47:02] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:48:31] Scap3, scap, Operations, Services: Deploy AQS with scap3 - https://phabricator.wikimedia.org/T114999#2051619 (greg) I reopened this as it is a tracking task for, uh, deploying aqs with scap3 :) The one that it was merged with was not about that.
[17:06:43] hmm, actually, thcipriani, we do have an aqs-admins group
[17:06:47] maybe we should just reuse that?
[17:07:23] ottomata2: that sounds reasonable.
[17:07:39] hmmmmm
[17:07:44] would there have to be a new deployment user then?
[17:10:51] you could do it one of two ways: either allow that group access to the deploy-service key and use the deploy-service user remotely OR new deployment-user, new key, new group.
[17:16:52] hmm,i like the former
[17:17:50] thcipriani: how would we do that? access to read the key is likely granted via group-read, ja?
[17:17:57] and we can't change the group owner ship of that key
[17:20:33] ottomata2: so you'd add a yaml file inside /etc/keyholder-auth.d/[blah].yml that has the contents: : [key-public-fingerprint]
[17:20:49] there's support for that in the keyholder::agent define, but I don't think anyone has used that yet.
[17:21:22] ah interesting
[17:21:35] right ok, because its access to the key in keyholder, not just the file somewhere
[17:21:36] hmmm
[17:22:21] ok thcipriani i'm going to try that after the ops meeting and lunch
[17:22:41] okie doke, sounds good.
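As described in the log, each file in keyholder-auth.d maps groups to the key fingerprints they may use, so granting `aqs-admins` access is a matter of adding a mapping rather than touching the key. The sketch below models that lookup on already-parsed data. The group-to-fingerprint-list shape, the placeholder fingerprint `aa:bb:cc`, and the helper names are assumptions for illustration, not keyholder's code:

```python
# Hypothetical parsed contents of two files in /etc/keyholder-auth.d/,
# assuming each maps a group name to a list of key fingerprints.
# "aa:bb:cc" stands in for the real deploy-service key fingerprint.
DEPLOY_SERVICE_AUTH = {"deploy-service": ["aa:bb:cc"]}
AQS_ADMINS_AUTH = {"aqs-admins": ["aa:bb:cc"]}


def allowed_fingerprints(auth_maps, user_groups):
    """Return every fingerprint that any of the user's groups may use."""
    allowed = set()
    for mapping in auth_maps:
        for group, fingerprints in mapping.items():
            if group in user_groups:
                allowed.update(fingerprints)
    return allowed


def may_sign(auth_maps, user_groups, fingerprint):
    """True if a member of user_groups may ask the agent to sign with the key."""
    return fingerprint in allowed_fingerprints(auth_maps, user_groups)
```

With only `DEPLOY_SERVICE_AUTH` loaded, `may_sign([DEPLOY_SERVICE_AUTH], {"aqs-admins"}, "aa:bb:cc")` is `False`, which is consistent with the "Agent admitted failure to sign using the key." error. Adding `AQS_ADMINS_AUTH` makes it `True` without a new key. This also fits the later advice that only the proxy needs a restart, since the access rules change but the key does not.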
[17:34:32] RECOVERY - Puppet failure on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:39:44] hasharAway: hey, tell me when you're around
[17:39:54] please check #mediawiki messages :D
[17:42:45] (CR) Paladox: "@Krinkle and @Legoktm and @Hashar will this work on any test that uses composer with php 53 since the packages being upgraded drop php 5.3" [integration/composer] - https://gerrit.wikimedia.org/r/270548 (https://phabricator.wikimedia.org/T125343) (owner: Paladox)
[17:46:31] !log ssh integration-slave-trusty-1017.eqiad.wmflabs 'sudo -u jenkins-deploy rm -rf /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm/src/.git/config.lock
[17:46:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[17:59:18] greg-g: So… are we going to bin wmf.14 and cut and deploy wmf.15?
[17:59:38] no
[17:59:50] wmf.14 this week, wmf.15 (with two weeks of stuff) next week
[18:00:17] reasoning: After a severe regression it makes sense to keep the first deploy after as small as reasonable, then catch up after we see no further regression
[18:00:22] James_F: ^
[18:00:25] OK.
[18:00:45] * greg-g comments on some tasks
[18:01:03] I'm a bit worried about dead code being deployed after not being tested for a week, but eh.
[18:02:45] Release-Engineering-Team, Patch-For-Review, WMF-deploy-2016-02-16_(1.27.0-wmf.14): MW 1.27.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T125597#2052051 (greg) Regarding the deploy of wmf.14 (which was cut last week): our plan is to deploy wmf.14 this week, and cut/deploy wmf.15 next...
[18:02:59] Release-Engineering-Team, Patch-For-Review, WMF-deploy-2016-02-16_(1.27.0-wmf.14): MW 1.27.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T125597#2052053 (greg) a:hashar>demon
[18:18:22] ostriches: Hi :) Do you strongly prefer doubling wmf.14 on the Roadmap?
https://www.mediawiki.org/w/index.php?title=MediaWiki_1.27%2FRoadmap&type=revision&diff=2056926&oldid=2056634
[18:18:42] It could be confusing, because other issues are handled in the same row, even if they are deployed the next week :)
[18:19:14] Not really, I just was making edits to reflect reality at the time. If there's a better way to format the same info {{goforit}} :)
[18:19:39] ostriches: great, thanks for the info :)
[18:22:15] ostriches: https://www.mediawiki.org/w/index.php?title=MediaWiki_1.27%2FRoadmap&type=revision&diff=2059058&oldid=2056926
[18:22:24] hope that's fine for you :)
[18:22:46] lgtm. although I think we might hold for another week too.
[18:24:11] hmm, if this really happens, I re-visit this page and maybe _then_ a new row is needed :P
[18:25:16] :)
[18:36:09] Scap3, scap, WorkType-NewFunctionality: [Spike] Benchmark built-in HTTP server options for scap3 fanout - https://phabricator.wikimedia.org/T127733#2052244 (dduvall)
[18:39:00] Scap3, scap, WorkType-NewFunctionality: [Spike] Benchmark built-in HTTP server options for scap3 fanout - https://phabricator.wikimedia.org/T127733#2052269 (thcipriani)
[18:48:07] thcipriani: does this look correct?
[18:48:07] https://gerrit.wikimedia.org/r/#/c/272516/
[18:49:14] ottomata: that should work
[18:49:51] RECOVERY - Puppet failure on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0]
[18:49:53] is the aqs-admin group not already on the deployment servers?
[18:50:01] no its not
[18:50:49] gotcha, then yeah, I think that patch looks right.
[18:52:23] thcipriani: ok, so after puppet runs, i just need to restart the proxy?
[18:52:25] or the admin too?
[18:52:28] sorry
[18:52:28] agent
[18:52:40] you should only need to restart the proxy in this case I believe
[18:53:00] since the key isn't changing, just the access perms.
[18:53:16] k
[18:55:27] (PS4) Paladox: Update composer to dev-master [integration/composer] - https://gerrit.wikimedia.org/r/270548 (https://phabricator.wikimedia.org/T125343)
[18:57:09] (PS5) Paladox: Update composer to dev-master [integration/composer] - https://gerrit.wikimedia.org/r/270548 (https://phabricator.wikimedia.org/T125343)
[19:00:43] hey thcipriani, I'm getting this when trying to deploy with scap:
[19:00:46] "Sorry, user milimetric is not allowed to execute '/bin/mkdir -p /srv/deployment/analytics/aqs/deploy' as milimetric on tin.eqiad.wmnet"
[19:00:51] (I can paste more)
[19:01:34] milimetric: hmm lemme check a couple of things there.
[19:01:41] k, thx
[19:02:24] Scap3, scap, WorkType-NewFunctionality: [Spike] Benchmark built-in HTTP server options for scap3 fanout - https://phabricator.wikimedia.org/T127733#2052377 (mmodell)
[19:02:59] I'm just looking at the output of: cd /srv/deployment/analytics/aqs/deploy && deploy-log on tin
[19:04:19] Scap3, scap, WorkType-NewFunctionality: [Spike] Benchmark built-in HTTP server options for scap3 fanout - https://phabricator.wikimedia.org/T127733#2052244 (mmodell)
[19:04:38] thcipriani: i'm looking at deploy-log too but i don't see any new output
[19:07:32] so, from what I can see, scap is run running: mkdir -p /srv/deployment/analytics/aqs/deploy as your user. I'm not sure why that would cause an error...
[19:07:55] hm me neither
[19:08:58] especially since I can run that fine :)
[19:09:06] ah, blerg. I think it's because of sudo_check_call. So the actual command is: sudo -u [you] -n -- mkdir -p [dir]
[19:09:15] ooh
[19:09:28] uhh
[19:09:30] ha
[19:09:36] and you aren't allowed to run that as you...because reasons...?
[19:09:48] thcipriani: what is it trying to mkdir -p anyway?
[19:09:50] on tin?
[19:09:58] why?
[19:09:59] the directory that already exists :)
[19:10:00] i mean
[19:10:06] I can't elevate to myself to do something that's already done
[19:10:14] haha, well, mkdir -p is forgiving
[19:10:25] but, sudo is not :)
[19:10:29] yep
[19:10:31] but, why does scap want to mkdir -p that?
[19:10:50] thcipriani: is that command running on tin or on the target?
[19:11:13] ottomata: it's running on tin. It's part of the context object setup for some reason.
[19:11:33] context object setup? I thought scap wasn't doing any bootstrapping (yet)?
[19:11:41] https://github.com/wikimedia/scap/blob/master/scap/context.py#L41-L44
[19:12:12] it's just part of what scap does to get its lay of the land on tin: the root directory is here, the scap directory is here, etc.
[19:12:40] this happens on both sides on the deploy host and on the target, which is why it tries to create the directory if it doesn't exist.
[19:12:52] in this instance it's failing on the deployment host
[19:13:32] it could just check if the current user == [you]
[19:14:15] yeah, we have a ticket for that. Needs a priority bump evidently.
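The fix floated here is to skip `sudo -u` when the target user is already the current user, since sudo still consults sudoers even for a "switch" to yourself (hence "Sorry, user milimetric is not allowed to execute ... as milimetric"). A hedged Python sketch of that idea, which also only calls `mkdir -p` when the directory is missing. `build_command`, `ensure_dir` and `current_username` are hypothetical helpers, not scap's `sudo_check_call`:

```python
import os
import pwd
import subprocess


def current_username():
    """Login name for this process's real UID."""
    try:
        return pwd.getpwuid(os.getuid()).pw_name
    except KeyError:
        # UID with no passwd entry (seen in some containers): fall back
        # to the numeric UID so callers still get a comparable string.
        return str(os.getuid())


def build_command(cmd, user):
    """Prefix cmd with non-interactive sudo only if user is someone else."""
    if user == current_username():
        return list(cmd)  # no privilege change needed, so no sudoers lookup
    return ["sudo", "-u", user, "-n", "--"] + list(cmd)


def ensure_dir(path, user):
    """Create path as user, doing nothing if it already exists."""
    if os.path.isdir(path):
        return
    subprocess.check_call(build_command(["mkdir", "-p", path], user))
```

`pwd.getpwuid` returns a single passwd entry for a UID (or raises `KeyError`), so the user lookup itself is unambiguous.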
[19:15:48] ottomata: I guess until that's fixed you're the one that has to do the deploy since you have sudo
[19:16:11] and or check if the dir exists before trying to create it
[19:16:31] ok milimetric going to ry
[19:16:32] try
[19:21:37] hmm, ok thcipriani still not working, but other reasons
[19:21:43] ['/usr/bin/deploy-local', '-v', '--repo', 'analytics/aqs/deploy', '-g', 'default', 'promote'] on aqs1001.eqiad.wmnet returned [70]: 19:19:32 INFO - Starting new HTTP connection (1): tin.eqiad.wmnet
[19:21:55] and running that on aqs1001 manually:
[19:21:55] ["deploy-local", "CalledProcessError", {"cmd": "ln -sf 'revs/ccfb3fd8feda1552e552c282614a0a124369443a' 'current'", "output": null, "returncode": 1}]
[19:22:33] hmm, so deploy-local can't link that directory
[19:23:27] i guess, but if I try to run that command as the deploy-service user
[19:23:37] it creates a the symlink inside of current/ :p
[19:23:53] [@aqs1001:/srv/deployment/analytics/aqs/deploy-cache] 130 $ pwd
[19:23:53] /srv/deployment/analytics/aqs/deploy-cache
[19:23:53] [@aqs1001:/srv/deployment/analytics/aqs/deploy-cache] $ sudo -u deploy-service ln -sfv revs/ccfb3fd8feda1552e552c282614a0a124369443a current
[19:23:53] ‘current/ccfb3fd8feda1552e552c282614a0a124369443a’ -> ‘revs/ccfb3fd8feda1552e552c282614a0a124369443a’
[19:24:07] [@aqs1001:/srv/deployment/analytics/aqs/deploy-cache] $ ls -l current/ccfb3fd8feda1552e552c282614a0a124369443a
[19:24:07] lrwxrwxrwx 1 deploy-service deploy-service 45 Feb 22 19:23 current/ccfb3fd8feda1552e552c282614a0a124369443a -> revs/ccfb3fd8feda1552e552c282614a0a124369443a
[19:24:08] :p
[19:24:39] instead of overwriting the current symlink
[19:25:41] so current is a directory and it's creating a link inside of it?
[19:26:31] legoktm: Hi do you know why in integration/composer some of the files are submodules since it uses composer not submodules.
Just wondering since when i remove vendor/composer and do a composer update and then git add -A --all it dosent add the files to git instead it says deleted and then says when i try to rm it again it is a submodule.
[19:26:48] hashar: ^^ too please.
[19:30:43] thcipriani: current is symlink to the last revision
[19:30:51] and it is creating the symlink inside of that
[19:31:04] its just putting the new symlink inside current/ instead of replacing it
[19:31:04] but.
[19:31:09] that is when i run ln -sf manually
[19:31:13] NOT when deploy-local runs
[19:31:18] deploy-local fails with retval 1
[19:31:25] and does not create any new symlink
[19:31:47] so my manual ln -sf may not be a problem
[19:31:48] (PS6) Paladox: Update composer to dev-master [integration/composer] - https://gerrit.wikimedia.org/r/270548 (https://phabricator.wikimedia.org/T125343)
[19:32:00] perhaps if deploy-local worked properly, it would do the right thing with the current symlink
[19:33:01] ottomata: can you paste the output of: deploy-local -v --repo 'analytics/aqs/deploy' -D 'log_json:False' on aqs1001?
[19:35:03] paladox: integration/composer does not have submodules
[19:35:22] hashar: Oh on my git it says vendor/composer/composer is.
[19:35:36] thcipriani: do I run that as me or as deploy-service user?
[19:35:43] as deplyo-service i thikn
[19:35:46] paladox: but maybe when running composer install that happen to download git repositories somehow and maybe doing a git add would add them magically as submodules to the repo
[19:35:52] ottomata: yeah, as deploy-service
[19:36:09] hashar: Maybe yes.
[19:36:37] thcipriani: https://gist.github.com/
[19:36:48] paladox: maybe composer install --prefer-dist
[19:36:53] oops
[19:36:55] https://gist.github.com/ottomata/aa37fa384cc4469afbb9
[19:37:02] * thcipriani looks
[19:37:12] hashar: Ok, But if i do composer update would that make any difference
[19:37:21] thcipriani: ottomata are you migrating aqs to use scap3 ?
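The behaviour seen with the manual `ln -sf` is standard: when the destination is a symlink to a directory, `ln` treats it as that directory and creates the new link inside it. GNU `ln` has `-n` (`--no-dereference`) to avoid this. An alternative that also swaps the link atomically is to create a temporary symlink and `rename(2)` it over `current`. A sketch in Python, assuming a `revs/<rev>` plus `current` layout like the one in the log; this is not how deploy-local is implemented, and the `.tmp-swap` suffix is an arbitrary choice:

```python
import os
import tempfile


def swap_symlink(target, link_path):
    """Point link_path at target, replacing any existing symlink atomically.

    Unlike `ln -sf target link_path`, this never descends into the
    directory that an existing link_path points to.
    """
    tmp_path = link_path + ".tmp-swap"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)  # leftover from an interrupted run
    os.symlink(target, tmp_path)
    # rename(2) replaces the directory entry for link_path itself, so the
    # old symlink is swapped out in one step and is never dereferenced.
    os.replace(tmp_path, link_path)


def demo():
    """Reproduce the layout from the log in a scratch directory."""
    with tempfile.TemporaryDirectory() as root:
        for rev in ("old", "new"):
            os.makedirs(os.path.join(root, "revs", rev))
        current = os.path.join(root, "current")
        os.symlink(os.path.join("revs", "old"), current)
        swap_symlink(os.path.join("revs", "new"), current)
        # Return where current points now and what ended up in revs/old.
        return os.readlink(current), os.listdir(os.path.join(root, "revs", "old"))
```

`demo()` shows `current` now pointing at `revs/new` and nothing created inside `revs/old`, which is the stray-link symptom from the manual run.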
[19:37:29] paladox: no idea :D
[19:37:35] hashar: Oh ok.
[19:37:38] hashar: trying :\
[19:37:56] ottomata: it looks like it may be missing sudoer permissions for deploy-service?
[19:38:22] hashar: Its strange when i run git rm vendor -r it removes all files except from vendor/composer
[19:38:42] Well removes the files in that foler but keeps that folder.
[19:38:54] paladox: guess because it is a git repo and git rm would consider it is outside of its scope
[19:39:15] hashar: Oh do you know how to remove it please.
[19:39:26] hashar: yes
[19:39:32] paladox: man git-rm ? :D
[19:39:32] hm
[19:39:37] ottomata: those should be setup as part of scap::target which is called from service::deploy::scap3
[19:39:39] paladox: maybe pass it --force
[19:39:42] hashar: Thanks
[19:39:45] er service::deploy::scap
[19:39:55] ottomata: thcipriani: awesome!! good luck in figuring out sudo rules
[19:40:00] thcipriani:
[19:40:00] [@aqs1001:/home/otto] $ sudo cat /etc/sudoers.d/scap_deploy-service
[19:40:00] # This file is managed by Puppet!
[19:40:01] deploy-service ALL=(deploy-service) NOPASSWD: ALL
[19:40:01] deploy-service ALL=(root) NOPASSWD: /usr/sbin/service analytics/aqs/deploy *
[19:40:47] ja, thcipriani that define is called via service::node
[19:40:50] which aqs uses
[19:41:05] service::node { 'aqs':
[19:41:05] ...
[19:41:05] deployment => 'scap3',
[19:42:15] yarp. I don't understand why it's saying: 19:36:25 sudo: a password is required in the gist. It's doing: sudo -u deploy-service -n -- mkdir -p /srv/deployment/analytics/aqs/deploy-cache
[19:42:41] (again running sudo as itself, but the that should be allowed by the sudoer rules)
[19:43:34] j
[19:43:34] a
[19:44:00] thcipriani: when I run that exact command
[19:44:08] no pw prompt needed
[19:44:10] hm
[19:44:26] are we sure the deploy-service user is the one sudoning?
[19:44:28] i guess, yes, right?
[19:44:31] who else could ssh from tin?
[19:45:04] well, yes, because deploy-local does it [19:45:04] hmmm [19:47:06] marxarelli: could you think of any weirdness with utils.get_username() and ssh_user that would mean that the ssh_user wasn't preforming an action on the target? [19:47:19] thcipriani: when I look at deploy-log during deploy from tin [19:47:28] i get a different error, but for the same reason: pw prompt [19:47:36] what's the error there? [19:47:41] sudo: no tty present and no askpass program specified [19:47:43] ... [19:47:48] 19:46:50 [aqs1001.eqiad.wmnet] deploy-local failed: {u'cmd': u'sudo /usr/sbin/service aqs restart', u'output': None, u'returncode': 1} [19:47:48] 19:46:50 [tin] [u'/usr/bin/deploy-local', u'-v', u'--repo', u'analytics/aqs/deploy', u'-g', u'default', u'promote'] on aqs1001.eqiad.wmnet returned [70]: 19:46:50 INFO - Starting new HTTP connection (1): tin.eqiad.wmnet [19:47:49] pasting. [19:48:04] https://gist.github.com/ottomata/078e62d20edfe77da618 [19:48:09] * thcipriani looks [19:48:37] whatever it is, probably the same problem [19:49:02] hashar: Could you review https://gerrit.wikimedia.org/r/#/c/270734/ please. [19:49:29] Its todo with changing BlueSpiceExtensions extension to use the npm template since ive added the npm tests to the extension [19:49:36] ottomata: it may not be the same problem. [19:49:38] thcipriani: i can't. 
it's just uses `os.getuid` [19:49:42] Im also chaning mw-checks to mw-checks-tests [19:50:07] (PS2) Paladox: Migrate test mediawiki-vagrant-puppet-doc-publish to UbuntuTrusty [integration/config] - https://gerrit.wikimedia.org/r/270658 [19:50:08] ottomata: seems like the error in the deploy-log you pasted was running: sudo /usr/sbin/service aqs restart [19:50:13] Continuous-Integration-Infrastructure, scap, Packaging: Develop a CI Testing and Release pipeline for the SCAP package (and potentially other debian packages that we maintain) - https://phabricator.wikimedia.org/T127741#2052522 (mmodell) [19:50:22] (which should come after the linking of the deploy-cache/current directory [19:50:26] thcipriani: and then `pwd.getpwuid`. so, maybe if the latter returned multiple entries? [19:50:50] thcipriani: i added a log in sud_check_call [19:50:52] 19:50:40 [aqs1001.eqiad.wmnet] sudo_check_call sudoing as deploy-service [19:51:01] marxarelli: I wonder if running: sudo -u deploy-service -- deploy-local -v ...etc would screw it up.
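The username lookup described here is `os.getuid()` followed by `pwd.getpwuid()`. A sketch of the kind of guard that avoids the self-referential `sudo -u deploy-service -n -- ...` call when the process already runs as that user; `current_username` and `build_command` are hypothetical helpers, not scap's actual code (the real change is the Differential revision D134 mentioned below). The `-n` flag mirrors the logged `sudo -u deploy-service -n -- mkdir -p ...` invocation.

```python
import os
import pwd
from typing import List


def current_username() -> str:
    """Look up the login name for the process's real uid (getuid + getpwuid)."""
    return pwd.getpwuid(os.getuid()).pw_name


def build_command(cmd: List[str], user: str) -> List[str]:
    """Prefix cmd with a non-interactive sudo only when switching users.

    Hypothetical helper: if we already run as `user`, sudo (and therefore
    the sudoers rules and any password prompt) is skipped entirely.
    """
    if user == current_username():
        return list(cmd)
    # -n: fail instead of prompting for a password; --: end of sudo options.
    return ["sudo", "-u", user, "-n", "--"] + list(cmd)
```

For example, `build_command(["mkdir", "-p", "/srv/deployment/analytics/aqs/deploy-cache"], "deploy-service")` returns the bare `mkdir` command when run as deploy-service, and the sudo-prefixed form otherwise.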
[19:51:11] Continuous-Integration-Infrastructure, scap, Packaging: Develop a CI Testing and Release pipeline for the SCAP package (and potentially other debian packages that we maintain) - https://phabricator.wikimedia.org/T127741#2052534 (mmodell) a:mmodell [19:51:22] thcipriani: i just created https://phabricator.wikimedia.org/D134 [19:51:28] Continuous-Integration-Infrastructure, scap, Packaging: Develop a CI Testing and Release pipeline for the SCAP package (and potentially other debian packages that we maintain) - https://phabricator.wikimedia.org/T127741#2052522 (mmodell) p:Triage>Normal [19:51:57] thcipriani: to avoid that particular kind of self-referential sudo madness [19:52:33] can't say what the exact underlying problem is in this case though, assuming sudoers allows deploy-server to sudo as itself [19:52:49] (PS4) Hashar: [BlueSpiceExtensions] Add npm test, Also add mw-check-test which replaced mw-checks [integration/config] - https://gerrit.wikimedia.org/r/270734 (owner: Paladox) [19:52:50] marxarelli: nice, probably would be nice to get that into a package as a bug-fix [19:52:56] thcipriani: hmmmm yes, and it should do that [19:53:01] with just sudo to root for restart [19:53:25] OH [19:53:25] deploy-service ALL=(root) NOPASSWD: /usr/sbin/service analytics/aqs/deploy * [19:53:27] that is not right [19:53:31] (PS5) Hashar: [BlueSpiceExtensions] Add npm test [integration/config] - https://gerrit.wikimedia.org/r/270734 (owner: Paladox) [19:53:33] analytics/aqs/deploy is not a service name. [19:53:35] looking [19:53:46] (CR) Hashar: [C: 2] "I have rephrased slightly the commit message" [integration/config] - https://gerrit.wikimedia.org/r/270734 (owner: Paladox) [19:53:48] paladox: landing it [19:54:29] there is other weirdness, but i thikn i have puppet patch to fix this... [19:54:33] (CR) Paladox: "Thanks. And sorry I linked to the wrong place. I thought I linked to the right patch but I didn't, sorry." [integration/config] - https://gerrit.wikimedia.org/r/270734 (owner: Paladox)
[19:54:41] (Merged) jenkins-bot: [BlueSpiceExtensions] Add npm test [integration/config] - https://gerrit.wikimedia.org/r/270734 (owner: Paladox) [19:55:38] hashar: Could you review https://gerrit.wikimedia.org/r/#/c/270712/ if you have time please, Its to do with cleaning the skins directory on extensions tests since it causes problems for other extensions since skins doint follow the same guidelines as extensions have to do. [19:55:50] paladox: deployed! can you check changes on BlueSpiceExtensions still work fine with npm ? :) [19:56:17] paladox: yeah I have seen your skins related changes. Havent had time to properly review/test that one though [19:56:17] hashar: Ok thanks and i will do that now. [19:56:25] hashar: Oh ok. [19:56:57] hashar: Im testing here https://gerrit.wikimedia.org/r/#/c/272344/ [19:57:24] paladox: you can get an idea of my review backlog by browsing https://gerrit.wikimedia.org/r/#/q/is:open+reviewer:hashar+label:code-review%253D0%252Chashar+NOT+owner:hashar,n,z :( [19:57:46] hashar: Oh. [19:57:48] paladox: yeah testing on the last merged change is usually a good idea [19:58:05] paladox: but l10n-bot patches are ignored :D [19:58:24] oh no [19:58:34] hashar: Oh but when i run recheck it rechecks the patch. [19:58:35] thcipriani: https://gerrit.wikimedia.org/r/#/c/272527/ [19:59:00] paladox: yeah because that is you doing the recheck so you are the author of the action, not l10n-bot [19:59:05] hashar: Its passes npm. Jsonlint also passes. [19:59:09] hashar: Oh. [19:59:20] paladox: congratulations! [19:59:52] hashar: Thanks. Im fixing up the extensions unit tests, but i have to add the missing api messages. [20:01:13] ottomata: that looks right.
[20:04:24] PROBLEM - Puppet failure on integration-slave-trusty-1006 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:04:30] paladox: yeah those missing API documentations are slightly painful to handle :-( [20:05:22] hashar: Yep. Ive got a patch here https://gerrit.wikimedia.org/r/#/c/270478/ i did have the unit tests working a few months ago then the api error related messages started apperaring. [20:06:21] paladox: yup that is a test in mediawiki/core that got merged a few weeks ago [20:06:22] HeyYYYy thcipriani it worked! [20:06:32] dunno what was up with those other problems with mkdir and ln [20:06:35] but it worked! [20:06:40] ottomata: awesome! [20:06:41] milimetric: can you confirm [20:06:43] on aqs1001? [20:06:55] * milimetric doing [20:07:28] we've got a patch to fix the sudo-as-self thing already merged into scap. As soon as the package is rebuilt and put in apt, you shouldn't need sudo -u [self] to deploy. [20:07:53] cool [20:08:58] ottomata: hm... I'm not seeing the new code, but I can hit the service [20:09:04] (but I was hitting it before too) [20:09:59] ostriches: I will update the Wikidata submodule in wmf.14 . in this case, as you will will deploy to group0 tomorrow: should I merge that without deploying or only with deployment? [20:10:35] I doubt we'll deploy tomorrow actually. [20:11:14] ok milimetric [20:11:16] looking [20:11:26] the only thing I see as changed on the target is this new symlink: [20:11:27] ccfb3fd8feda1552e552c282614a0a124369443a -> revs/ccfb3fd8feda1552e552c282614a0a124369443a [20:11:44] could it be because .gitmodules has some local changes? [20:12:37] yeah [20:12:48] that symlink wa smy fault [20:12:51] i just removed it [20:13:00] milimetric: don't think so, they sha is still old [20:13:07] oh! 
[20:13:08] no its not [20:13:10] hmm, [20:13:10] yeah [20:13:11] thcipriani: [20:13:23] the symlink is being created in current/ [20:13:24] on deploy [20:13:26] instead of replacing it [20:13:35] [@aqs1001:/srv/deployment/analytics/aqs/deploy] (12d439e+1/-1)[12d439e] ± ls -l [20:13:35] total 24 [20:13:35] lrwxrwxrwx 1 deploy-service deploy-service 45 Feb 22 20:12 ccfb3fd8feda1552e552c282614a0a124369443a -> revs/ccfb3fd8feda1552e552c282614a0a124369443a [20:15:18] that's ....? ok, looking. [20:15:42] naw [20:15:43] could you re-run deploy on tin in the interim? I don't know how that could happen. [20:15:44] its not updateing current [20:15:46] yes [20:15:52] you watching logs? [20:16:11] deploying now thcipriani [20:16:25] no, unfortunately that sudo thing prevents me from doing deploy-log on tin [20:16:48] ha [20:16:52] oh hm [20:16:56] hmmm [20:17:00] it does say [20:17:00] 20:16:14 [aqs1001.eqiad.wmnet] Revision directory already exists (use --force to override) [20:17:24] oh yeah, run deploy --force [20:17:29] ok, i mean, its true [20:17:34] the rev/ dir does exist [20:17:42] but, hm [20:17:42] if it detects that it's trying to deploy the same revision, it doesn't try. 
[20:17:43] ok trying [20:17:49] but it shoudln't be [20:17:54] deploy points at the old rev [20:18:02] deploy -> deploy-cache/revs/12d439e0cfb3b9ed3ac31d84f6b38112c57a370f [20:18:06] on tin [20:18:11] we are at [20:18:12] ccfb3fd [20:18:19] trying force anyway [20:18:37] no change [20:18:38] deploy -> deploy-cache/revs/12d439e0cfb3b9ed3ac31d84f6b38112c57a370f [20:19:17] $ ls deploy-cache/revs [20:19:17] 12d439e0cfb3b9ed3ac31d84f6b38112c57a370f ccfb3fd8feda1552e552c282614a0a124369443a [20:19:22] ccfb3fd is ther [20:19:24] e [20:19:28] but the symlink isn't being updated [20:19:33] because its creating it inside of current/ [20:19:35] instead of replacing current [20:20:43] thcipriani: https://gist.github.com/ottomata/515d8a534d34f677d024 [20:20:59] if the second arg in ln is a dir [20:21:03] the symlink is created inside the dir with the filename of the first arg [20:21:06] even if the second arg is a symlink to a dir [20:21:53] i think you want -F [20:21:54] instead of -f [20:22:08] hmm maybe not [20:22:35] ostriches: so if wmf.14 doesn't get deployed to the full group0 tomorrow, what happens instead regarding the train? and irrespective of that should I change wmf.14 with or without deploying? [20:22:42] -h? [20:22:54] [:/tmp] 1 $ ln -shfv b current [20:22:54] current -> b [20:22:56] yeah [20:23:11] oh foo [20:23:13] that is on bsd, nm [20:24:06] -n [20:24:12] [@aqs1001:/tmp] $ ln -svnf b current [20:24:12] ‘current’ -> ‘b’ [20:24:58] hmm [20:25:06] or -T [20:25:07] ? [20:25:16] ja -T [20:25:32] which I htink you have [20:26:26] ? [20:26:31] https://github.com/wikimedia/scap/blob/603ef0c10654750212f87014c8ed898f44170365/scap/utils.py#L591 [20:26:59] ohh thcipriani maybe this is just fixed in master? [20:27:06] jzerebecki: Still trying to noodle that one [20:27:07] I see -T in master [20:27:37] ? [20:28:10] uhh [20:28:14] ottomata: I don't think -T is in master [20:28:15] thcipriani: do you guys not use gerrit? 
[20:28:25] i just cloned mediawiki/tools/scap from gerrit [20:28:26] and it has -T [20:28:39] nah, we use differential [20:29:40] oh! [20:29:44] uhh, so where should I clone from? [20:29:55] somehow your -T was lost! :) [20:30:13] https://phabricator.wikimedia.org/diffusion/MSCA/scap.git [20:31:34] Continuous-Integration-Infrastructure, scap, Packaging: Develop a CI Testing and Release pipeline for the SCAP package (and potentially other debian packages that we maintain) - https://phabricator.wikimedia.org/T127741#2052737 (hashar) I highly recommend http://jenkins-debian-glue.org/ which I have... [20:32:52] thcipriani: https://phabricator.wikimedia.org/rMSCA55ee398f20b29b9e61a0ebd537151182bf2bfc33 [20:33:11] tasks.py line 602 [20:33:21] replaced in utils.py line 567 [20:33:24] without -T [20:36:22] marxarelli explain yoself?!?!! :) [20:38:51] ottomata: got a patch up. I can't remember if we discussed the -T flag or not on that patch. Maybe just lost in translation, I don't know. [20:39:10] ottomata, thcipriani could have sworn that was fixed elsewhere [20:39:29] RECOVERY - Puppet failure on integration-slave-trusty-1006 is OK: OK: Less than 1.00% above the threshold [0.0] [20:39:41] that was my memory, too :\ [20:40:49] we might have fixed it in a non-obvious place and then the fix was removed at some point. [20:43:03] thcipriani: should I just manually fix the symlink for now? [20:43:08] or maybe..hehhe edit the code and add -T? :p [20:44:52] yeah, manually fixing is probably the right thing for now. I have the feeling that this came up and was fixed, but was maybe unfixed. We'll sort that and update the scap package. [20:45:14] ok [20:46:03] milimetric: try now [20:46:05] ottomata, thcipriani: i think i know what happened there. i was trying to refactor the `ln` command to work cross platform so it could be tested. when that failed, i ended up just limiting the tests to linux but must have forgot to change the command back to using `-T` (gnu ls only) :.
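The failure above is GNU `ln` behaviour: with `ln -sf NEW current`, if `current` is a symlink to a directory, `ln` dereferences it and creates the new link inside that directory. `-T` (`--no-target-directory`) or `-n` (`--no-dereference`) makes it treat `current` as the link to replace, so the shell form is `ln -sfT revs/<sha> current`. A minimal Python sketch of the same "repoint current" step, assuming a `revs/<sha>` layout like the one on aqs1001; `repoint_symlink` is an illustrative name, not scap's API. It creates the new link under a temporary name and renames it over `current`, which also makes the swap atomic on POSIX filesystems.

```python
import os


def repoint_symlink(target: str, link_name: str) -> None:
    """Make link_name a symlink to target, replacing any existing link.

    Unlike `ln -sf target link_name` when link_name points at a directory,
    this never creates the new link *inside* that directory.
    """
    tmp_name = f"{link_name}.tmp-{os.getpid()}"
    # Clean up a leftover temp link from an earlier interrupted run.
    if os.path.lexists(tmp_name):
        os.unlink(tmp_name)
    os.symlink(target, tmp_name)
    # rename(2) replaces link_name itself (not what it points to), atomically.
    os.replace(tmp_name, link_name)
```

A relative target such as `revs/ccfb3fd...` is resolved relative to the directory holding the link, so it keeps working if the deploy directory is moved as a whole.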
[20:46:06] ottomata: hey it works [20:46:07] :/ [20:46:09] heh, I was just trying [20:46:44] k, ottomata let's re-pool this and do the others? [20:47:24] (moving to -analytics) thanks for your help thcipriani [20:47:27] yay! ok milimetric great. [20:47:36] ja lemme clean this one of restbase stuff [20:47:39] k [20:48:07] milimetric: thank you for bearing with me :P [20:49:00] oh, no bearing was involved, we knew this was a bit new and it wasn't gonna be smooth, happy to help contribute to your bug backlog :P [20:50:43] heh, well, much appreciated :) [20:57:22] (Abandoned) Paladox: [BlueSpiceExtensions] Change test mw-checks to mw-checks-test [integration/config] - https://gerrit.wikimedia.org/r/270659 (owner: Paladox) [21:06:49] thcipriani: , marxarelli ok i have edited utils.py manually on each aqs box [21:06:58] that seems to work [21:07:06] need a fix for next release though! [21:07:31] ottomata: ack. That change is merged in scap and will be in next packaged version. Thanks for your help! [21:07:37] oh cool [21:07:38] ottomata: we have a couple of fixes and are tagging for the next package release [21:07:38] great! [21:07:42] ok awesome [21:07:42] thank you [21:16:03] Project browsertests-QuickSurveys-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #184: FAILURE in 1.9 sec: https://integration.wikimedia.org/ci/job/browsertests-QuickSurveys-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/184/ [21:17:14] hashar: Could you review https://gerrit.wikimedia.org/r/#/c/270474/ if you have time please. Its about updating the MobileFrontend tests.
[21:18:57] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 264 bytes in 0.006 second response time [21:20:33] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 264 bytes in 0.005 second response time [21:20:48] 404s on mediawiki [21:20:55] cannot edit [21:21:16] ^^ greg-g [21:22:29] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - string 'Wikipedia' not found on 'http://en.m.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 634 bytes in 0.013 second response time [21:25:12] jdlrobson: See -operations. [21:27:32] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 31301 bytes in 1.125 second response time [21:29:58] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 705 bytes in 0.003 second response time [21:31:48] Continuous-Integration-Scaling, Labs, Labs-Infrastructure, Nodepool: Nodepool can't refresh snapshot on labs since ~ Feb 15th - https://phabricator.wikimedia.org/T127755#2053017 (hashar) [21:31:54] thanks James_F [21:33:56] RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 40330 bytes in 0.624 second response time [21:34:58] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 39885 bytes in 0.995 second response time [21:35:32] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 40348 bytes in 0.774 second response time
[21:37:42] Scap3, scap: Implement MediaWiki pre-promote checks - https://phabricator.wikimedia.org/T121597#2053065 (Krinkle) Here are a few incidents that a scap with sanity-check of even just `https://en.wikipedia.org/wiki/Main_Page` against tin or a canary locally would've prevented: * [20151005-MediaWiki](https:... [22:00:44] bd808: who shoudl I ping about puppet being broken on deployment-cache*nodes? [22:00:51] getting [22:00:53] 'invalid byte sequence in US-ASCII at /etc/puppet/modules/role/manifests/cache/text.pp:1' [22:01:13] ottomata: hmmm... maybe thcipriani and ostriches ? [22:01:21] that sounds like a puppet bug though [22:01:29] I've seen it somewhere before [22:01:42] that's something that only is a problem with the puppet version we run on beta for some reason... [22:01:56] I've run into it before, fixed it with iconv IIRC [22:02:18] https://projects.puppetlabs.com/issues/20897#note-10 [22:02:20] Continuous-Integration-Scaling, Labs, Labs-Infrastructure, Nodepool: Nodepool can't refresh snapshot on labs since ~ Feb 15th - https://phabricator.wikimedia.org/T127755#2053150 (hashar) Using the poor man debugger on labnodepool1001.eqiad.wmnet as nodepool user: `strace -f -e recvfrom,sendto -s 102... [22:02:45] "Configure your Apache server to run in a locale that supports non-ASCII characters. Note that this may affect all processes spawned by Apache." [22:03:28] Continuous-Integration-Scaling, Labs, Labs-Infrastructure, Nodepool: Nodepool can't refresh snapshot on labs since ~ Feb 15th - https://phabricator.wikimedia.org/T127755#2053156 (hashar) python-novaclient has not been updated on labnodepool. Maybe it should... No idea really ``` labnodepool1001:~$... [22:03:50] OR [22:03:50] t Encoding.default_external = Encoding::UTF_8 in the config.ru file for the puppetmaster as suggested by the patch linked to this ticket.
This default should be backwards-compatible with the US-ASCII encoding currently inherited from the C locale. [22:03:51] ? [22:06:44] !log Restarted puppetmaster service on deployment-puppetmaster to "fix" error "invalid byte sequence in US-ASCII" [22:06:46] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:06:52] ooo :) [22:06:57] ottomata: got past the error by restarting puppetmaster [22:07:02] now new errors [22:07:17] looks like some missing hiera data [22:07:25] at least on text04 [22:09:20] yea [22:09:20] hm [22:09:35] bd808: did you change anything on the deployment-puppetmaster to "fix" the error? [22:09:46] (your quotes :P) [22:09:59] thcipriani: just a restart from a shell that had a unicode locale set [22:10:17] I'm writing a patch for it now [22:10:32] based on https://tickets.puppetlabs.com/browse/PUP-1386?focusedCommentId=62325&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-62325 [22:10:34] <% if scope.function_hiera(["cluster"]) != "cache_parsoid" %> [22:14:05] ah bd808 its because you can't use role lookup in labs, i think [22:14:11] unless you are including the class somehow via the role keyword [22:14:13] which it isn't [22:14:18] its just included via the web gui [22:14:21] wikitech [22:14:27] hmmm [22:15:24] yeah role lookup doesn't work in beta cluster. 
You have to add things to the hiera files in hieradata/labs/deployment-prep or use the wiki page at https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep [22:15:47] yeah, but hmmm, but it needs to be declared at role scope [22:15:49] hmm [22:15:51] i guess [22:15:52] hm [22:15:52] maybe [22:15:59] role::cache::text::cluster: cache_text [22:16:01] maybe would work [22:16:06] yeah that should work [22:16:06] gonna try in wikitech hiera [22:17:55] Deployment-Systems, Scap3: Update Debian Package for Scap3 - https://phabricator.wikimedia.org/T127762#2053246 (thcipriani) [22:20:08] hmm, doesn't seem to work [22:20:12] which maybe makes sense [22:20:25] i think the role keyword is doing something special to put the variables in scope. there is no parameter on the class [22:20:31] i dunno why these need to be in hiera anyway [22:20:43] i would think role:;cache::text would alwyas have cluster = 'cache_text' [22:21:10] Deployment-Systems, Scap3: Update Debian Package for Scap3 - https://phabricator.wikimedia.org/T127762#2053283 (thcipriani) [22:22:30] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<11.11%) [22:27:41] PROBLEM - Puppet failure on deployment-aqs01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [22:32:31] RECOVERY - Free space - all mounts on deployment-bastion is OK: OK: All targets OK [22:36:21] bd808: needs brandon's eyes, but: https://gerrit.wikimedia.org/r/#/c/272619/ [22:37:53] ottomata: +1 if only for killing a function_hiera lookup [22:38:11] * bd808 hates to see hiera functions in code [22:38:44] ostriches: did his noodliness enlighten you regarding the deploy tomorrow? [22:40:13] jzerebecki: I'd go ahead and do your backport to wmf.14 and sync it. It'll only affect testwiki for now. [22:40:26] wmf.14 is still being held this week, so it wouldn't go out until next week at the earliest otherwise.
[22:40:37] (wmf.13 is still live on the other 880something wikis) [22:40:56] ok thx [22:47:12] i'll postpone that for tomorrow, though [22:57:31] (PS1) Paladox: Replace jslint test with jshint and jsonlint tests [integration/config] - https://gerrit.wikimedia.org/r/272629 (https://phabricator.wikimedia.org/T127362) [22:58:39] Continuous-Integration-Config, Patch-For-Review: Switch passing mwext-*-jslint jobs to jshint/jsonlint - https://phabricator.wikimedia.org/T127362#2053459 (Paladox) @Legoktm could you update the description please with the recent changes and include https://gerrit.wikimedia.org/r/27262/ please so we can... [23:00:14] Continuous-Integration-Scaling, Labs, Labs-Infrastructure, Nodepool, Patch-For-Review: Nodepool can't refresh snapshot on labs since ~ Feb 15th - https://phabricator.wikimedia.org/T127755#2053461 (hashar) Not yet .. ``` lang=json GET /v2/contintcloud/images/2e45de58-b560-4d51-a4b3-3a20b7f47dde H... [23:01:11] (PS1) Paladox: [Limn] Archive repo [integration/config] - https://gerrit.wikimedia.org/r/272631 (https://phabricator.wikimedia.org/T127362) [23:02:33] Continuous-Integration-Scaling, Labs, Labs-Infrastructure, Nodepool, Patch-For-Review: Nodepool can't refresh snapshot on labs since ~ Feb 15th - https://phabricator.wikimedia.org/T127755#2053463 (hashar) If I try to create a server image from a running server it works just fine, i.e.: openstac... [23:03:12] Beta-Cluster-Infrastructure, Labs, Labs-Infrastructure: Make labs wikitech role aware - https://phabricator.wikimedia.org/T127771#2053464 (Ottomata) [23:12:07] Continuous-Integration-Scaling, Labs, Labs-Infrastructure, Nodepool, Patch-For-Review: Nodepool can't refresh snapshot on labs since ~ Feb 15th - https://phabricator.wikimedia.org/T127755#2053505 (hashar) `openstack server image create` does not work. The command returns immediately showing the...
[23:22:51] Krinkle: Phpunit is failing in non voting extension unit tests. Please see https://integration.wikimedia.org/ci/job/mwext-testextension-hhvm-non-voting/54/console and https://integration.wikimedia.org/ci/job/mwext-testextension-php55-non-voting/54/console [23:23:02] legoktm: ^^ too please. [23:23:38] The non-voting job needs to be recompiled [23:23:49] the macro has been updated already but this job wasn't regenerated yet [23:23:53] maybe someone accidentally reverted it [23:24:06] Krinkle: I found the problem [23:24:21] It was because it wasent updated with the composer one as the generic one did. [23:24:42] (PS2) Paladox: Replace jslint test with jshint and jsonlint tests [integration/config] - https://gerrit.wikimedia.org/r/272629 (https://phabricator.wikimedia.org/T127362) [23:25:54] (PS1) Paladox: Add mw-fetch-composer-dev to 'mwext-testextension-{phpflavor}-non-voting' [integration/config] - https://gerrit.wikimedia.org/r/272634 [23:26:26] Krinkle: Could you review https://gerrit.wikimedia.org/r/#/c/272634/ please. It fixes the problem with non voting test. [23:27:19] Deployment-Systems, Scap3: Update Debian Package for Scap3 - https://phabricator.wikimedia.org/T127762#2053525 (dduvall) p:Triage>High [23:27:36] (CR) Krinkle: [C: 1] Add mw-fetch-composer-dev to 'mwext-testextension-{phpflavor}-non-voting' [integration/config] - https://gerrit.wikimedia.org/r/272634 (owner: Paladox) [23:42:03] Krinkle: Would this mw-fetch-composer-dev also need to be added to 'mwext-testextension-{phpflavor}-composer' or not [23:42:29] paladox: No, that one already contains full composer handling which includes both regular packages and dev packages. [23:42:42] So it doesn't need separate fetching of dev packages [23:43:00] Krinkle: Ok thanks for replying.
[23:43:00] By default running 'composer install' or 'composer update' fetches all packages in composer.json [23:43:13] paladox: We use 'mediawiki/vendor' for security reasons, which contains a fixed set of non-dev packages. [23:43:27] So in CI we use mediawiki/vendor + manually fetch dev composer packages on top of it [23:43:32] Krinkle: Oh, Ok. [23:43:41] But jobs that don't use mediawiki/vendor and just use regular composer, they get all packages automatically [23:43:54] paladox: Does that make sense? [23:44:02] Krinkle: Yes. [23:44:04] Thanks.