[01:53:57] MediaWiki-Codesniffer, Google-Summer-of-Code-2016: Weekly Reports for Improving an static analysis tools for MediaWiki - https://phabricator.wikimedia.org/T134225#2258525 (Lethexie)
[01:54:07] Beta-Cluster-Infrastructure: deployment-parsoid06 parsoid fails due to having role::parsoid::beta (requiring upstart) on jessie - https://phabricator.wikimedia.org/T134226#2258540 (Krenair)
[02:05:51] Beta-Cluster-Infrastructure: deployment-parsoid06 parsoid fails due to having role::parsoid::beta (requiring upstart) on jessie - https://phabricator.wikimedia.org/T134226#2258591 (Krenair)
[03:12:25] Yippee, build fixed!
[03:12:26] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #1063: FIXED in 30 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/1063/
[03:18:55] Beta-Cluster-Infrastructure, Patch-For-Review: Creating wiki at beta cluster for the Dutch Wikipedia - https://phabricator.wikimedia.org/T118005#2258684 (Mtherwjs) Open>Resolved
[03:33:27] !log Deleted deployment-cxserver03, replaced by deployment-sca0x
[03:33:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[03:38:54] Beta-Cluster-Infrastructure, ContentTranslation-Deployments, Parsoid, Services, Language-Q4-2016-Sprint 2: Migrate BetaCluster Node.JS services to Jessie and Node 4.3 - https://phabricator.wikimedia.org/T125003#2258724 (KartikMistry)
[03:39:07] scap, ContentTranslation-Deployments, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, and 3 others: Deploy CXServer with scap3 - https://phabricator.wikimedia.org/T120104#2258725 (KartikMistry)
[03:41:14] Beta-Cluster-Infrastructure, Patch-For-Review: Creating wiki at beta cluster for the Dutch Wikipedia - https://phabricator.wikimedia.org/T118005#2258726 (Krenair) Resolved>Open @Mtherwjs: Stop doing that, this still has an open patch, it's not completely set up
[04:09:43] Beta-Cluster-Infrastructure, Patch-For-Review: Creating wiki at beta cluster for the Dutch Wikipedia - https://phabricator.wikimedia.org/T118005#2258734 (Mtherwjs) Open>Resolved
[04:34:23] PROBLEM - Puppet run on deployment-phab is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[05:04:51] RECOVERY - Puppet run on deployment-mediawiki01 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:55:35] Project beta-scap-eqiad build #100992: FAILURE in 52 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100992/
[06:05:35] Project beta-scap-eqiad build #100993: STILL FAILING in 52 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100993/
[06:10:20] (PS2) Hashar: Migrate remaining composer jobs to Nodepool [integration/config] - https://gerrit.wikimedia.org/r/286462 (https://phabricator.wikimedia.org/T119139)
[06:10:48] (CR) Hashar: [C: 1] "Only failures were for AWS, AWSSDK and GoogleCustomWikiSearch" [integration/config] - https://gerrit.wikimedia.org/r/286462 (https://phabricator.wikimedia.org/T119139) (owner: Hashar)
[06:15:35] Project beta-scap-eqiad build #100994: STILL FAILING in 53 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100994/
[06:25:58] Yippee, build fixed!
[06:25:59] Project beta-scap-eqiad build #100995: FIXED in 1 min 14 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/100995/
[06:41:00] (CR) Paladox: [C: 1] Migrate remaining composer jobs to Nodepool [integration/config] - https://gerrit.wikimedia.org/r/286462 (https://phabricator.wikimedia.org/T119139) (owner: Hashar)
[07:03:18] (CR) Hashar: [C: 2] Migrate remaining composer jobs to Nodepool [integration/config] - https://gerrit.wikimedia.org/r/286462 (https://phabricator.wikimedia.org/T119139) (owner: Hashar)
[07:03:47] (CR) Hashar: "I have fixed the few repos that were falling :-}" [integration/config] - https://gerrit.wikimedia.org/r/286462 (https://phabricator.wikimedia.org/T119139) (owner: Hashar)
[07:03:59] (Merged) jenkins-bot: Migrate remaining composer jobs to Nodepool [integration/config] - https://gerrit.wikimedia.org/r/286462 (https://phabricator.wikimedia.org/T119139) (owner: Hashar)
[07:10:44] Beta-Cluster-Infrastructure, Parsoid: deployment-parsoid06 parsoid fails due to having role::parsoid::beta (requiring upstart) on jessie - https://phabricator.wikimedia.org/T134226#2258846 (hashar) Looks like production is still on Trusty as well. In puppet one will have to change to `base::service_unit...
[07:16:58] mobrovac: we just need to merge Puppet patch for scap3, right (ie https://gerrit.wikimedia.org/r/#/c/286395/)?
[07:17:42] yes kart_, and start deploying with scap3, ofc
[07:18:16] btw kart_, i forgot to tell you regarding the changes we did yesterday in deployment-prep
[07:18:34] from now on you will need to manually deploy cxserver there as well
[07:18:51] the automatic beta update job will not be functional any more
[07:19:05] mobrovac: I see. Can it be fixed?
[07:19:20] not right away
[07:19:28] (PS1) Hashar: [labs/tools/heritage] drop HHVM job [integration/config] - https://gerrit.wikimedia.org/r/286590 (https://phabricator.wikimedia.org/T134207)
[07:19:31] Okay! Thanks for update.
[07:19:41] well, you can move cxserver to another host and then talk to hashar
[07:20:07] mobrovac: I'll schedule puppet patch for today's PuppetSWAT.
[07:20:21] kk
[07:20:30] mobrovac: that's fine. I would like to be closer to Production.
[07:22:28] kart_: euh, no, actually let's not schedule it for puppetswat
[07:22:35] this is not the kind of thing it is meant for
[07:22:56] moving from trebuchet to scap3 hasn't been battle-tested fully yet
[07:23:07] let's wait for alex to come around
[07:23:09] (CR) Hashar: [C: 2] "https://integration.wikimedia.org/ci/job/integration-zuul-layoutdiff/9592/console shows the hhvm jobs are removed (the diff is a bit ann" [integration/config] - https://gerrit.wikimedia.org/r/286590 (https://phabricator.wikimedia.org/T134207) (owner: Hashar)
[07:23:50] (Merged) jenkins-bot: [labs/tools/heritage] drop HHVM job [integration/config] - https://gerrit.wikimedia.org/r/286590 (https://phabricator.wikimedia.org/T134207) (owner: Hashar)
[07:24:39] mobrovac: OK. reverting.
[07:39:03] Continuous-Integration-Config, Continuous-Integration-Scaling, releng-201516-q3, releng-201516-q4, and 2 others: [keyresult] Migrate php (Zend and HHVM) CI jobs to Nodepool - https://phabricator.wikimedia.org/T119139#2258877 (hashar)
[07:39:05] Continuous-Integration-Scaling, Patch-For-Review: wikimedia/slimapp fails composer-package-php55-trusty - https://phabricator.wikimedia.org/T134177#2258876 (hashar) Open>Resolved
[07:59:24] Continuous-Integration-Infrastructure, pywikibot-core, Pywikibot-tests: Add pypy to CI build machines - https://phabricator.wikimedia.org/T134235#2258903 (jayvdb)
[08:32:14] Continuous-Integration-Infrastructure, pywikibot-core, Pywikibot-tests: Add pypy to CI build machines - https://phabricator.wikimedia.org/T134235#2258903 (hashar) Sure thing! Debian Jessie comes with pypy 2.4.0 would it be sufficient?
[08:35:09] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #959: STILL FAILING in 25 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/959/
[08:38:11] Continuous-Integration-Infrastructure, pywikibot-core, Patch-For-Review, Pywikibot-tests: Add pypy to CI build machines - https://phabricator.wikimedia.org/T134235#2258986 (hashar)
[08:38:22] Continuous-Integration-Infrastructure, pywikibot-core, Patch-For-Review, Pywikibot-tests: Add pypy to CI build machines - https://phabricator.wikimedia.org/T134235#2258903 (hashar) p:Triage>Normal a:hashar
[08:38:53] mobrovac: I can update cxserver in Production using old method until Puppet change is merged, right?
[08:42:43] yes kart_
[08:44:28] cool.
[08:44:48] mobrovac: we anyway need scap/ folder changes in Production.
[08:44:51] :)
[08:45:26] oh?
[08:45:31] hashar: postmerge for cxserver-beta is stuck, because deployment-cxserver03 no longer exists and moved to deployment-sca0x
[08:45:47] (CR) Addshore: [C: 1] Enable basic CI for the purtle library. [integration/config] - https://gerrit.wikimedia.org/r/286469 (https://phabricator.wikimedia.org/T134162) (owner: Daniel Kinzler)
[08:46:00] mobrovac: ie https://gerrit.wikimedia.org/r/#/c/286400/
[08:46:22] mobrovac: changes in deploy repo for scap3 migration.
[08:46:44] no, you don't need them on the targets kart_, only on tin
[08:46:50] to deploy with scap3, that is
[08:48:54] OK!
[08:49:32] mobrovac: so far, we do, git pull && git submodule update in Production before deploy. So, that'll be fetched automatically too.
[08:49:43] (PS1) Hashar: Drop composer jobs from permanent slaves [integration/config] - https://gerrit.wikimedia.org/r/286602
[08:50:02] kart_: euh, no, you still have to do that because you're the one deciding what to deploy
[08:50:05] the system can't know that
[08:50:59] kart_: the stuck jenkins job takes care of doing the git fetch && git checkout && service restart cxserver (or whatever)
[08:51:03] indeed, we gotta move it ;-}
[08:51:23] ah
[08:52:16] which host is it running on ?
[08:52:25] deployment-scXX something?
[08:52:49] (CR) Hashar: [C: 2] Drop composer jobs from permanent slaves [integration/config] - https://gerrit.wikimedia.org/r/286602 (owner: Hashar)
[08:52:51] hashar: deployment-sca01
[08:53:10] Continuous-Integration-Scaling, Patch-For-Review, WorkType-NewFunctionality: Migrate mediawiki-core-phpcs job to Nodepool - https://phabricator.wikimedia.org/T133976#2259015 (hashar)
[08:53:22] (PS2) Hashar: Drop composer jobs from permanent slaves [integration/config] - https://gerrit.wikimedia.org/r/286602 (https://phabricator.wikimedia.org/T119139)
[08:53:39] (CR) Hashar: Drop composer jobs from permanent slaves [integration/config] - https://gerrit.wikimedia.org/r/286602 (https://phabricator.wikimedia.org/T119139) (owner: Hashar)
[08:53:44] (CR) Hashar: [C: 2] Drop composer jobs from permanent slaves [integration/config] - https://gerrit.wikimedia.org/r/286602 (https://phabricator.wikimedia.org/T119139) (owner: Hashar)
[08:55:35] (Merged) jenkins-bot: Drop composer jobs from permanent slaves [integration/config] - https://gerrit.wikimedia.org/r/286602 (https://phabricator.wikimedia.org/T119139) (owner: Hashar)
[09:03:10] (PS1) Hashar: dib: cache a few more popular/heavy repos [integration/config] - https://gerrit.wikimedia.org/r/286605
[09:04:22] kart_: you could reuse the CI Change https://gerrit.wikimedia.org/r/#/c/286468/
[09:04:39] kart_: there should be some occurrences of the old server, to be replaced with deployment-sca01
[09:05:03] and do you have a task about that ? ;-)
[09:05:11] (CR) Legoktm: dib: cache a few more popular/heavy repos (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/286605 (owner: Hashar)
[09:05:44] (CR) Hashar: dib: cache a few more popular/heavy repos (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/286605 (owner: Hashar)
[09:06:27] (PS2) Hashar: dib: cache a few more popular/heavy repos [integration/config] - https://gerrit.wikimedia.org/r/286605
[09:07:36] hashar: Let me add task.
[09:08:36] it is easier to keep log of actions this way
[09:08:51] (CR) Hashar: [C: 2] dib: cache a few more popular/heavy repos [integration/config] - https://gerrit.wikimedia.org/r/286605 (owner: Hashar)
[09:13:30] (PS1) Hashar: dib: syslog user is no more needed for HHVM [integration/config] - https://gerrit.wikimedia.org/r/286610
[09:16:41] (CR) Hashar: [C: 2] dib: syslog user is no more needed for HHVM [integration/config] - https://gerrit.wikimedia.org/r/286610 (owner: Hashar)
[09:17:31] hashar: https://phabricator.wikimedia.org/T134239 - feel free to retitle.
[09:17:34] (Merged) jenkins-bot: dib: syslog user is no more needed for HHVM [integration/config] - https://gerrit.wikimedia.org/r/286610 (owner: Hashar)
[09:17:44] (PS3) Hashar: Enable basic CI for the purtle library. [integration/config] - https://gerrit.wikimedia.org/r/286469 (https://phabricator.wikimedia.org/T134162) (owner: Daniel Kinzler)
[09:18:05] (CR) Hashar: [C: 2] "Will most probably need a link to be added in integration/docroot.git org/wikimedia/doc/default.html" [integration/config] - https://gerrit.wikimedia.org/r/286469 (https://phabricator.wikimedia.org/T134162) (owner: Daniel Kinzler)
[09:18:27] (PS2) KartikMistry: Fix cxserver CI config [integration/config] - https://gerrit.wikimedia.org/r/286468 (https://phabricator.wikimedia.org/T134239)
[09:18:46] (Merged) jenkins-bot: Enable basic CI for the purtle library. [integration/config] - https://gerrit.wikimedia.org/r/286469 (https://phabricator.wikimedia.org/T134162) (owner: Daniel Kinzler)
[09:18:59] kart_: in that CI config file, line 207 there is node: deployment-cxserver-eqiad
[09:19:15] that is how Jenkins assign the job to run on a specific host
[09:19:20] will handle it
[09:19:29] Continuous-Integration-Config, Wikidata, Patch-For-Review, Wikidata-Sprint-2016-04-26: Set up Jenkins for Purtle repository - https://phabricator.wikimedia.org/T134162#2259116 (Legoktm)
[09:20:10] (CR) Hashar: [C: -1] Fix cxserver CI config (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/286468 (https://phabricator.wikimedia.org/T134239) (owner: KartikMistry)
[09:23:48] (PS3) KartikMistry: Fix cxserver config file [integration/config] - https://gerrit.wikimedia.org/r/286468 (https://phabricator.wikimedia.org/T134239)
[09:24:34] Continuous-Integration-Config, Wikidata, Patch-For-Review, Wikidata-Sprint-2016-04-26: Set up Jenkins for Purtle repository - https://phabricator.wikimedia.org/T134162#2256271 (hashar) Did a `recheck` on https://gerrit.wikimedia.org/r/#/c/286407/ but that fails php53. I guess you only target Zen...
[09:24:47] hashar: done
[09:24:49] Continuous-Integration-Config, Wikidata, Patch-For-Review, Wikidata-Sprint-2016-04-26: Set up Jenkins for Purtle repository - https://phabricator.wikimedia.org/T134162#2259136 (hashar) a:daniel
[09:25:03] Continuous-Integration-Config, Wikidata, Patch-For-Review, Wikidata-Sprint-2016-04-26: Set up Jenkins for Purtle repository - https://phabricator.wikimedia.org/T134162#2256271 (hashar)
[09:25:05] Continuous-Integration-Config, Wikidata, Patch-For-Review, Wikidata-Sprint-2016-04-26: Set up Jenkins for Purtle repository - https://phabricator.wikimedia.org/T134162#2256271 (hashar)
[09:26:07] !log Applying puppet class role::ci::slave::labs::common on deployment-sca01 and deployment-sca02 (cxserver and parsoid being migrated T134239 )
[09:26:08] T134239: Update cxserver jenkins job for Beta - https://phabricator.wikimedia.org/T134239
[09:26:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[09:26:57] kart_: neat. I will add both sca01 and sca02 as Jenkins slaves
[09:28:16] !log deployment-sca01 removing puppet lock /var/lib/puppet/state/agent_catalog_run.lock and running puppet again
[09:28:21] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[09:31:23] !log Deleting CI slave deployment-cxserver03 , added deployment-sca01 and deployment-sca02 in Jenkins. T134239
[09:31:23] T134239: Update cxserver jenkins job for Beta - https://phabricator.wikimedia.org/T134239
[09:31:27] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[09:32:30] Project beta-cxserver-update-eqiad build #271: FAILURE in 1.2 sec: https://integration.wikimedia.org/ci/job/beta-cxserver-update-eqiad/271/
[09:32:30] Project beta-cxserver-update-eqiad build #272: STILL FAILING in 0.11 sec: https://integration.wikimedia.org/ci/job/beta-cxserver-update-eqiad/272/
[09:32:31] Project beta-cxserver-update-eqiad build #273: STILL FAILING in 78 ms: https://integration.wikimedia.org/ci/job/beta-cxserver-update-eqiad/273/
[09:32:36] kart_: ^^^:-}
[09:32:39] that unlocked it
[09:32:51] 00:00:00.041 /tmp/hudson2985211184730424886.sh: line 3: /srv/deployment/integration/slave-scripts/bin/multigit.sh: No such file or directory
[09:32:52] bah
[09:33:49] :/
[09:34:04] kart_: will fix it ;-}
[09:34:20] cool.
[09:35:19] I dont even remember what that multigit.sh script does :(
[09:42:32] !log adding puppet class contint::slave_scripts to deployment-sca01 and deployment-sca02 . Ships multigit.sh T134239
[09:42:33] T134239: Update cxserver jenkins job for Beta - https://phabricator.wikimedia.org/T134239
[09:42:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[09:43:53] Project beta-cxserver-update-eqiad build #274: STILL FAILING in 8.3 sec: https://integration.wikimedia.org/ci/job/beta-cxserver-update-eqiad/274/
[09:43:59] blalblblb
[09:43:59] https://integration.wikimedia.org/ci/job/beta-cxserver-update-eqiad/274/console
[09:44:11] rsync fails somehow
[09:44:19] mkdir "/srv/deployment/cxserver/cxserver" failed: Permission denied (13)
[09:44:20] :D
[09:44:36] Continuous-Integration-Config, Wikidata, Patch-For-Review, Wikidata-Sprint-2016-04-26: Set up Jenkins for Purtle repository - https://phabricator.wikimedia.org/T134162#2259231 (Lydia_Pintscher)
[09:45:42] kart_: do you have any idea how /srv/deployment/cxserver/deploy/ got provisioned ?
[09:46:31] it belong to root:root
[09:46:40] but the jenkins job runs as user jenkins-deploy
[09:47:06] (PS4) Hashar: Fix cxserver config file [integration/config] - https://gerrit.wikimedia.org/r/286468 (https://phabricator.wikimedia.org/T134239) (owner: KartikMistry)
[09:47:35] (CR) Hashar: [C: 2] Fix cxserver config file [integration/config] - https://gerrit.wikimedia.org/r/286468 (https://phabricator.wikimedia.org/T134239) (owner: KartikMistry)
[09:48:58] Notice: /Stage[main]/Cxserver/Service::Node[cxserver]/Service::Deploy::Trebuchet[cxserver/deploy]/Package[cxserver/deploy]/ensure: ensure changed 'purged' to 'present'
[09:48:59] :D
[09:49:09] (Merged) jenkins-bot: Fix cxserver config file [integration/config] - https://gerrit.wikimedia.org/r/286468 (https://phabricator.wikimedia.org/T134239) (owner: KartikMistry)
[09:53:37] Project beta-cxserver-update-eqiad build #275: STILL FAILING in 2.7 sec: https://integration.wikimedia.org/ci/job/beta-cxserver-update-eqiad/275/
[09:54:07] 00:00:02.744 ln: failed to create symbolic link ‘/srv/deployment/cxserver/cxserver/config.yaml’: File exists
[09:54:07] :D
[09:56:42] hashar: ok. We can remove it.
[09:56:59] the link?
[09:57:02] yep
[09:57:27] updating
[09:58:01] Yippee, build fixed!
[09:58:02] Project beta-cxserver-update-eqiad build #276: FIXED in 5.3 sec: https://integration.wikimedia.org/ci/job/beta-cxserver-update-eqiad/276/
[09:58:28] 9212 ? Sl 0:01 \_ /usr/bin/nodejs src/server.js -c /etc/cxserver/config.yaml
[09:58:28] 9222 ? Sl 0:01 \_ /usr/bin/nodejs /srv/deployment/cxserver/deploy/src/server.js -c /etc/cxserver/config.yaml
[09:58:48] kart_: it started
[09:58:54] no clue what that conf file is though
[09:59:11] it seems it is managed by puppet
[09:59:33] (PS1) Hashar: cxserver config is no more in puppet [integration/config] - https://gerrit.wikimedia.org/r/286615 (https://phabricator.wikimedia.org/T134239)
[09:59:47] (CR) Hashar: [C: 2] cxserver config is no more in puppet [integration/config] - https://gerrit.wikimedia.org/r/286615 (https://phabricator.wikimedia.org/T134239) (owner: Hashar)
[10:00:47] (Merged) jenkins-bot: cxserver config is no more in puppet [integration/config] - https://gerrit.wikimedia.org/r/286615 (https://phabricator.wikimedia.org/T134239) (owner: Hashar)
[10:02:09] hashar: looks good.
[10:02:14] hashar: let me test.
[10:02:31] (PS1) Hashar: [purtle] no need for Zend 5.3 [integration/config] - https://gerrit.wikimedia.org/r/286617 (https://phabricator.wikimedia.org/T134162)
[10:02:50] hashar: cxserver config file is in puppet.
[10:03:05] we used to have it in one of the repo for sake of simplicity maybe
[10:03:12] it is entirely up to you ;-}
[10:03:16] puppet is probably fine
[10:03:40] (CR) Hashar: [C: 2] "confirmed by addshore on IRC" [integration/config] - https://gerrit.wikimedia.org/r/286617 (https://phabricator.wikimedia.org/T134162) (owner: Hashar)
[10:04:08] OK!
[10:04:14] Thanks a lot, hashar!
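Note on the `ln: failed to create symbolic link ... File exists` failure at 09:54:07: the link step in the update job was apparently not idempotent, so a link left over from an earlier provisioning made it fail until the link was removed by hand. The job's actual shell script is not shown in the log. The following is only a minimal Python sketch of an idempotent alternative, with `force_symlink` a hypothetical helper name.

```python
import os


def force_symlink(target: str, link_path: str) -> None:
    """Point link_path at target, replacing an existing file or link.

    Hypothetical helper for illustration; not part of the Jenkins job.
    """
    tmp_link = link_path + ".tmp"
    # Clear a stale temporary link left behind by an interrupted run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    # On POSIX the rename is atomic and acts on the link itself,
    # so an existing link_path is swapped out rather than followed.
    os.replace(tmp_link, link_path)
```

Calling it twice with different targets leaves the link pointing at the second target instead of raising `FileExistsError`. It would still fail if `link_path` were a real directory, which seems a reasonable safety net here.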
[10:04:20] kart_: time to kill the task ;-)
[10:04:29] kart_: note the jenkins job can only deploy to a single host
[10:04:33] so that is stuck to sca01 for now
[10:04:38] Noted.
[10:04:40] whenever we get it migrated to scap3
[10:04:47] we will be able to run scap deploy from deployment-tin
[10:04:53] and push to multiple instances
[10:05:01] the job does not exist yet though
[10:05:05] (Merged) jenkins-bot: [purtle] no need for Zend 5.3 [integration/config] - https://gerrit.wikimedia.org/r/286617 (https://phabricator.wikimedia.org/T134162) (owner: Hashar)
[10:05:07] hashar: cool
[10:07:58] Continuous-Integration-Config, Wikidata, Patch-For-Review, Wikidata-Sprint-2016-04-26: Set up Jenkins for Purtle repository - https://phabricator.wikimedia.org/T134162#2259392 (hashar) CI happy https://gerrit.wikimedia.org/r/#/c/286407/ Once that change merge, we will have to verify the doc is p...
[10:15:26] Continuous-Integration-Config, TestMe: fix or mark as inactive extensions currently failing CI - https://phabricator.wikimedia.org/T134090#2254300 (hashar) A lot of those failures are due to the MediaWiki core structure test ApiDocumentationTest :(
[10:17:25] Release-Engineering-Team, Developer-Relations, Team-Practices: Developer Summit 2017: Work with TPG and RelEng on solution to event documenting - https://phabricator.wikimedia.org/T132400#2259501 (Qgil)
[10:35:08] lunch &
[10:46:13] Browser-Tests-Infrastructure, Continuous-Integration-Config, Wikidata: Add email notification for aborted wikidata browser tests jobs - https://phabricator.wikimedia.org/T128067#2259552 (Lydia_Pintscher)
[10:48:14] Continuous-Integration-Config, Wikidata, Patch-For-Review, Wikidata-Sprint-2016-04-26: Set up Jenkins for Purtle repository - https://phabricator.wikimedia.org/T134162#2259560 (adrianheine)
[10:48:39] Browser-Tests-Infrastructure, Wikidata, Patch-For-Review: Merge tests/browser/environments.yml and tests/browser/config/config.yml in WikidataBrowserTests - https://phabricator.wikimedia.org/T128097#2259561 (Lydia_Pintscher)
[10:52:14] RECOVERY - Host integration-dev is UP: PING OK - Packet loss = 0%, RTA = 357.89 ms
[10:57:44] (CR) Thiemo Mättig (WMDE): "The Purtle component is PHP 5.5+ only and does not support PHP 5.3 any more. See https://github.com/wmde/purtle/pull/3" [integration/config] - https://gerrit.wikimedia.org/r/286617 (https://phabricator.wikimedia.org/T134162) (owner: Hashar)
[11:01:07] Continuous-Integration-Infrastructure, pywikibot-core, Patch-For-Review, Pywikibot-tests: Add pypy to CI build machines - https://phabricator.wikimedia.org/T134235#2259596 (jayvdb) Debian Jessie 's pypy 2.4.0 would be great. I believe it is basically Python 2.7.8 with lots of very old bugs ;-)...
[11:18:14] PROBLEM - Host integration-dev is DOWN: CRITICAL - Host Unreachable (10.68.17.81)
[11:22:01] (CR) Hashar: "Thanks Thiemo !" [integration/config] - https://gerrit.wikimedia.org/r/286617 (https://phabricator.wikimedia.org/T134162) (owner: Hashar)
[12:04:47] Project beta-code-update-eqiad build #102953: FAILURE in 1 min 46 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/102953/
[12:05:05] Release-Engineering-Team, Release: MW-1.27.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T131557#2259700 (hashar) a:hashar
[12:05:18] Release-Engineering-Team, Release: MW-1.27.0-wmf.24 deployment blockers - https://phabricator.wikimedia.org/T131559#2259701 (hashar)
[12:05:20] Release-Engineering-Team, Release: MW-1.27.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T131557#2170705 (hashar)
[12:09:54] Release-Engineering-Team, Release: MW-1.27.0-wmf.24 deployment blockers - https://phabricator.wikimedia.org/T131559#2259710 (Luke081515) a:hashar
[12:10:02] Release-Engineering-Team, Release: 1.28.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T134249#2259711 (Luke081515)
[12:10:18] Release-Engineering-Team, Release: MW-1.27.0-wmf.24 deployment blockers - https://phabricator.wikimedia.org/T131559#2170735 (Luke081515)
[12:10:59] Release-Engineering-Team, Release: MW-1.27.0-wmf.24 deployment blockers - https://phabricator.wikimedia.org/T131559#2170735 (Luke081515) a:hashar>None Wrong blocker, sry.
[12:14:44] Project beta-code-update-eqiad build #102954: STILL FAILING in 1 min 43 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/102954/
[12:15:01] hashar: Maybe you can take a look at deployment-tin later? I only get:
[12:15:05] luke081515@bastion-01:~$ ssh deployment-tin
[12:15:05] Connection closed by 10.68.17.240
[12:22:15] Luke081515: :(
[12:22:20] Luke081515: works for me
[12:22:50] hashar: Now for me too
[12:22:51] strange
[12:23:07] pam_access(sshd:account): access denied for user `luke081515' from `bastion-01.bastion.eqiad.wmflabs'
[12:23:18] Failed publickey for luke081515 from 10.68.17.232 port 50115 ssh2: RSA df:e5:ff:81:79:0f:a0:be:ac:60:fd:aa:c6:07:67:9b
[12:23:18] fatal: Access denied for user luke081515 by PAM account configuration [preauth]
[12:23:21] wrong key/username ?
[12:23:39] I didn't change my settings...
[12:23:40] Starting session: shell on pts/12 for luke081515
[12:23:42] looks good now :-}
[12:23:47] ok :)
[12:24:00] maybe the bastion session didn't like me :D
[12:24:00] Regarding branching from 1.27 to 1.28.0-wmf.X: Anybody knows if the @ReleaseTaggerBot configuration has been updated? I remember six months ago we had some "mis-taggings". (I've created https://phabricator.wikimedia.org/tag/MW-1.28-release-notes/ in the meantime for it.)
[12:24:15] (a question in Phab context, obviously)
[12:24:34] andre__: ostriches would know. He is going to lead 1.28 stuff
[12:24:46] Project beta-code-update-eqiad build #102955: STILL FAILING in 1 min 45 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/102955/
[12:25:10] jenkins 2.1 released
[12:25:59] oh for god sake cherry picks...
[12:26:10] thanks
[12:29:22] (PS1) Hashar: make-wmf-branch: drop mw cherry pick [tools/release] - https://gerrit.wikimedia.org/r/286635 (https://phabricator.wikimedia.org/T124356)
[12:30:15] (PS2) Hashar: make-wmf-branch: drop mw cherry pick [tools/release] - https://gerrit.wikimedia.org/r/286635 (https://phabricator.wikimedia.org/T124356)
[12:30:19] (CR) Hashar: [C: 2] make-wmf-branch: drop mw cherry pick [tools/release] - https://gerrit.wikimedia.org/r/286635 (https://phabricator.wikimedia.org/T124356) (owner: Hashar)
[12:30:35] (PS3) Hashar: make-wmf-branch: drop mw cherry pick [tools/release] - https://gerrit.wikimedia.org/r/286635 (https://phabricator.wikimedia.org/T124356)
[12:30:38] is there something going on with deployment-tin?
[12:30:59] "Connection closed by UNKNOWN"
[12:31:09] i can connect just fine to other beta nodes
[12:32:36] hashar: ^ ? known?
[12:34:27] Release-Engineering-Team, Patch-For-Review, Release: MW-1.27.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T131557#2259821 (hashar) Add a conflicting cherry pick which is no more needed https://gerrit.wikimedia.org/r/#/c/274165/3 for T124356
[12:34:46] mobrovac: Luke081515 had the same issue
[12:34:46] Project beta-code-update-eqiad build #102956: STILL FAILING in 1 min 45 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/102956/
[12:34:53] with sshd / pam rejecting on preauth
[12:34:59] I am tempted to blame ldap
[12:37:15] UNKNOWN: what are you messing up?
[12:37:30] debug1: Server accepts key: pkalg ssh-rsa blen 277
[12:37:30] debug2: input_userauth_pk_ok: fp da:50:6e:f8:a5:ef:72:20:8f:83:50:2f:58:68:6d:fa
[12:37:30] debug3: sign_and_send_pubkey: RSA da:50:6e:f8:a5:ef:72:20:8f:83:50:2f:58:68:6d:fa
[12:37:30] debug1: key_parse_private2: missing begin marker
[12:37:31] debug1: read PEM private key done: type RSA
[12:37:32] Connection closed by UNKNOWN
[12:37:52] "missing begin marker" sounds suspicious
[12:38:17] I don't had unknown, I got: Connection closed by 10.68.17.240
[12:39:02] hashar: are you able to log in there?
[12:39:09] fatal: Access denied for user hashar by PAM account configuration [preauth]
[12:39:12] yeah via salt
[12:39:15] ssh deployment-salt
[12:39:20] salt -v 'deployment-tin*' cmd.run 'tail -n200 /var/log/auth.log'
[12:39:52] salt sshd's ass off
[12:40:02] there is a bunch of
[12:40:03] deployment-tin nslcd[29393]: [d87724] error writing to client: Broken pipe
[12:44:50] Project beta-code-update-eqiad build #102957: STILL FAILING in 1 min 49 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/102957/
[12:54:46] Project beta-code-update-eqiad build #102958: STILL FAILING in 1 min 45 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/102958/
[13:04:48] Project beta-code-update-eqiad build #102959: STILL FAILING in 1 min 47 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/102959/
[13:05:07] Release-Engineering-Team, Patch-For-Review, Release: MW-1.27.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T131557#2259875 (hashar) Add a conflicting cherry pick which is no more needed https://gerrit.wikimedia.org/r/#/c/274165/3 for T124356 Applied all local patches. Dropped one...
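Note on the `auth.log` excerpts pasted between 12:23 and 12:39: when several people hit the same PAM rejection, a per-user tally of the denial lines can show quickly whether the problem is host-wide. What follows is a rough Python sketch only. It assumes the two message shapes quoted in the channel are the ones of interest, and `denied_users` is a hypothetical helper name, not an existing tool.

```python
import re
from collections import Counter

# Covers both shapes quoted in the channel:
#   pam_access(sshd:account): access denied for user `name' from `host'
#   fatal: Access denied for user name by PAM account configuration [preauth]
DENIED = re.compile(r"access denied for user [`']?([\w.-]+)", re.IGNORECASE)


def denied_users(lines):
    """Count PAM denial lines per user name in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = DENIED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Fed the output of the `tail -n200 /var/log/auth.log` command shown above, one line per element, it returns a `Counter` keyed by user name. Denials spread across unrelated accounts would be consistent with the LDAP/nslcd suspicion voiced in the channel rather than a per-user key problem.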
[13:14:48] Project beta-code-update-eqiad build #102960: STILL FAILING in 1 min 48 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/102960/
[13:23:24] !log deployment-tin force upgraded HHVM from 3.6 to 3.12
[13:23:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:25:36] Project beta-code-update-eqiad build #102961: STILL FAILING in 2 min 35 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/102961/
[13:29:10] !log beta: got rid of a leftover Wikidata/Wikibase patch that broke scap salt -v 'deployment-tin*' cmd.run 'sudo -u jenkins-deploy git -C /srv/mediawiki-staging/php-master/extensions/Wikidata/ checkout -- extensions/Wikibase/lib/maintenance/populateSitesTable.php'
[13:29:15] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:29:53] (PS1) Aude: Update Wikidata branch to wmf/1.27.0-wmf.23 [tools/release] - https://gerrit.wikimedia.org/r/286648
[13:30:22] !log Restarted nslcd on deployment-tin , pam was refusing authentication for some reason
[13:30:27] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:30:28] Yippee, build fixed!
[13:30:29] Project beta-code-update-eqiad build #102962: FIXED in 1 min 55 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/102962/
[13:30:30] Luke081515: mobrovac: maybe deployment-tin works now
[13:31:40] hashar: yep, I can connect via putty and WinSCP now
[13:31:47] (CR) Hashar: [C: 2] Update Wikidata branch to wmf/1.27.0-wmf.23 [tools/release] - https://gerrit.wikimedia.org/r/286648 (owner: Aude)
[13:39:09] (Merged) jenkins-bot: Update Wikidata branch to wmf/1.27.0-wmf.23 [tools/release] - https://gerrit.wikimedia.org/r/286648 (owner: Aude)
[13:54:18] aude: are you bumping the branch in mediawiki/core 1.27.0-wmf.23 ?
[13:54:23] err
[13:54:24] i mean
[13:54:34] are you bumping the Wikidata submodule ... ?
[13:58:53] yeah, soon as jenkins approves [14:00:29] hashar: hm, still getting "Connection closed by UNKNOWN" for deployment-tin :/ [14:01:39] :( [14:01:43] no idea what it can be [14:06:53] 06Release-Engineering-Team, 13Patch-For-Review, 05Release: MW-1.27.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T131557#2260115 (10hashar) Not much to worry about so far ;-) @aude going to bump Wikibase [14:07:07] aude: poke https://phabricator.wikimedia.org/T131557 as needed -:) [14:19:16] !log beta: added unattended upgrade to Hiera::deployment-prep [14:19:21] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [14:20:42] hashar: https://gerrit.wikimedia.org/r/#/c/286657/ :) [14:21:29] aude: neat +2 ed :-) [14:21:54] 06Release-Engineering-Team, 13Patch-For-Review, 05Release: MW-1.27.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T131557#2260137 (10hashar) Wikidata bumped to .23 with https://gerrit.wikimedia.org/r/#/c/286657/ [14:22:41] thanks [14:22:58] mobrovac: looks like I have downgraded cassandra on deployment-restbase01 :( [14:25:30] !log beta salt -v '*' pkg.upgrade [14:25:33] cause yeah .. 
[14:25:35] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [14:27:24] aude: bah zend55 fails :( [14:27:32] and I forgot to check it [14:28:02] ah [14:28:33] PHP Fatal error: Call to a member function getId() on a non-object in /mnt/jenkins-workspace/workspace/mediawiki-extensions-php55/src/extensions/Thanks/tests/ApiRevThankIntegrationTest.php on line 67 [14:28:34] :( [14:29:23] that is in Thanks apparently [14:32:56] PROBLEM - Puppet run on deployment-salt is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [14:39:51] aude: flappy test apparently [14:41:06] :/ [14:42:44] (03PS1) 10JanZerebecki: Add Purtle [integration/docroot] - 10https://gerrit.wikimedia.org/r/286659 (https://phabricator.wikimedia.org/T134162) [14:43:12] (03CR) 10JanZerebecki: [C: 032] Add Purtle [integration/docroot] - 10https://gerrit.wikimedia.org/r/286659 (https://phabricator.wikimedia.org/T134162) (owner: 10JanZerebecki) [14:43:32] hashar: would rebooting deployment-tin be an option [14:43:41] try? [14:43:41] ? [14:43:49] 10Continuous-Integration-Config, 10Wikidata, 13Patch-For-Review, 03Wikidata-Sprint-2016-04-26: Set up Jenkins for Purtle repository - https://phabricator.wikimedia.org/T134162#2260215 (10JanZerebecki) The code coverage works and is now published at https://integration.wikimedia.org/cover/purtle/ . The auto... [14:44:06] will need to rearm the Keyholder [14:44:39] you know how to do it hashar? [14:44:48] nop [14:44:49] rearming the keyholder [14:44:52] fun [14:44:58] but wikitech has some doc [14:45:16] should just be keyholder arm /me says with no context :) [14:45:26] hehehe [14:45:49] hashar: mind tailing /var/log/auth while i try to connect so as to get to the bottom of this? 
[14:46:28] mobrovac: ssh deployment-salt.deployment-prep.eqiad.wmflabs sudo salt -v 'deployment-tin*' cmd.run 'tail -n200 /var/log/auth' [14:46:30] ;-) [14:47:55] RECOVERY - Puppet run on deployment-salt is OK: OK: Less than 1.00% above the threshold [0.0] [14:49:10] (03Merged) 10jenkins-bot: Add Purtle [integration/docroot] - 10https://gerrit.wikimedia.org/r/286659 (https://phabricator.wikimedia.org/T134162) (owner: 10JanZerebecki) [14:49:17] no logs are appended while i try to log in there [14:49:23] i'm rebooting it [14:49:30] !log deployment-tin rebooting it [14:49:35] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [14:49:56] wth? [14:50:00] now i can log in [14:50:09] tried one last time before actual reboot [14:50:28] flakiness of the infra at its best [14:51:48] 06Release-Engineering-Team, 13Patch-For-Review, 05Release: MW-1.27.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T131557#2260222 (10hashar) Synced Wikidata .23 sync-dir php-1.27.0-wmf.23/extensions/Wikidata 'wikidata to .23 https://gerrit.wikimedia.org/r/#/c/286657/' [14:52:19] hashar: so have you degraded cass definitely ? [14:52:37] mobrovac: ??? [14:52:59] hashar: (16:22:58) hashar: mobrovac: looks like I have downgraded cassandra on deployment-restbase01 :( [14:53:03] oh yeah [14:53:09] did an apt-get upgrade [14:53:09] :( [14:53:19] *sigh* [14:53:23] why the hell? [14:53:28] cassandra 2.1.13 [14:53:29] ??????? 
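Note on the one-liner hashar pasted at 14:46:28: it nests a shell command inside a salt call inside an ssh call, so the quoting is easy to get wrong when typed by hand. A minimal sketch of building the same command with Python's `shlex`; the helper names `salt_tail_command` and `over_ssh` are hypothetical and exist only for this illustration.

```python
import shlex


def salt_tail_command(target_glob, log_path, lines=200):
    """Build the argv for a salt cmd.run that tails a log on matching minions."""
    remote = f"tail -n{lines} {shlex.quote(log_path)}"
    return ["salt", "-v", target_glob, "cmd.run", remote]


def over_ssh(host, argv):
    """Wrap an argv so it runs on `host`; each argument is shell-quoted once."""
    return ["ssh", host, " ".join(shlex.quote(arg) for arg in argv)]


command = over_ssh(
    "deployment-salt.deployment-prep.eqiad.wmflabs",
    ["sudo"] + salt_tail_command("deployment-tin*", "/var/log/auth"),
)
```

The resulting list can be handed to `subprocess.run(command)`; the remote string it carries is the same one pasted in the channel.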
[14:53:30] :( [14:53:31] bad copy paste [14:53:36] uf ok [14:54:42] with restbase02 being at 2.2.6 [15:00:03] 10releng-201516-q2: [keyresult] Deprecate gitblit in favor of Diffusion - https://phabricator.wikimedia.org/T111465#2260255 (10demon) [15:00:05] 05Gitblit-Deprecate, 06Release-Engineering-Team, 10Diffusion, 07WorkType-NewFunctionality: Use Diffusion as canonical location for browsing code repos (not gitblit) - https://phabricator.wikimedia.org/T752#2260254 (10demon) 05Open>03Resolved [15:00:33] yoyo, anybody know much about grafana.wmflabs.org? [15:00:38] hashar, jzerebecki: do we have ci meeting now? [15:00:51] jzerebecki is busy i think [15:00:53] dan out still [15:00:57] i'd like to log in, and i'd like to see some of my stats there...do they go there if i emit to statsd at labmon1001.eqiad.wmnet? [15:01:23] hashar: ok, just saw in calendar that jzerebecki is not coming [15:01:23] ottomata: grafana.wmflabs.org is a random test instance afaik [15:01:33] hashar: canceling the meeting then? [15:01:42] ottomata: emit to labmon1001.eqiad.wmnet and in grafana-admin.wikimedia.org it is available as a data source [15:01:47] zeljkof: yeah [15:01:59] zeljkof: both of us can do a quick chat if you want [15:02:25] hashar: nothing to report :| still working on the migration [15:02:28] ah perfect, and I can log in! 
[15:02:30] thank you hashar [15:02:33] yea need to concentrate on a few other things [15:02:43] jzerebecki: no worries ;) [15:02:57] oh regular grafana..huh [15:02:58] ok cool [15:08:41] away, will be back in 50 minutes for puppet swat [16:09:13] 10Beta-Cluster-Infrastructure: puppet failure on deployment-phab due to missing packages php5-mailparse and python-phabricator - https://phabricator.wikimedia.org/T134277#2260577 (10Krenair) [16:21:29] (03CR) 10Hashar: make-wmf-branch: drop mw cherry pick [tools/release] - 10https://gerrit.wikimedia.org/r/286635 (https://phabricator.wikimedia.org/T124356) (owner: 10Hashar) [16:21:34] (03CR) 10Hashar: [C: 032] make-wmf-branch: drop mw cherry pick [tools/release] - 10https://gerrit.wikimedia.org/r/286635 (https://phabricator.wikimedia.org/T124356) (owner: 10Hashar) [16:22:30] (03Merged) 10jenkins-bot: make-wmf-branch: drop mw cherry pick [tools/release] - 10https://gerrit.wikimedia.org/r/286635 (https://phabricator.wikimedia.org/T124356) (owner: 10Hashar) [16:30:00] Yippee, build fixed! [16:30:00] Project selenium-Echo » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #9: 09FIXED in 46 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/9/ [16:30:04] Yippee, build fixed! 
[16:30:04] Project selenium-Echo » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #9: 09FIXED in 50 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/9/ [16:46:17] !log Refreshing Nodepool Jessie image to have it include pypy | T134235 poke @jayvdb [16:46:18] T134235: Add pypy to CI build machines - https://phabricator.wikimedia.org/T134235 [16:46:22] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [16:49:28] !log Notice: /Stage[main]/Contint::Packages::Python/Package[pypy]/ensure: ensure changed 'purged' to 'present' | T134235 [16:49:29] T134235: Add pypy to CI build machines - https://phabricator.wikimedia.org/T134235 [16:49:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [16:54:06] 10Continuous-Integration-Infrastructure, 10pywikibot-core, 13Patch-For-Review, 07Pywikibot-tests: Add pypy to CI build machines - https://phabricator.wikimedia.org/T134235#2260790 (10hashar) Added on the Jessie images: ``` Python 2.7.8 (2.4.0+dfsg-3, Dec 20 2014, 13:30:46) [PyPy 2.4.0 with GCC 4.9.2] ```... [16:59:39] 10Continuous-Integration-Infrastructure, 10pywikibot-core, 13Patch-For-Review, 07Pywikibot-tests: Add pypy to CI build machines - https://phabricator.wikimedia.org/T134235#2260831 (10hashar) 05Open>03Resolved tox-jessie managed to run flake8 under pypy ! ( https://integration.wikimedia.org/ci/job/tox-j... [17:00:37] PROBLEM - Puppet staleness on deployment-cache-text04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [17:01:55] Yippee, build fixed! 
[17:01:55] Project selenium-MultimediaViewer-286674 » internet_explorer 9.0,beta,Windows 7,contintLabsSlave && UbuntuTrusty build #8: 09FIXED in 20 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer-286674/BROWSER=internet_explorer%209.0,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Windows%207,label=contintLabsSlave%20&&%20UbuntuTrusty/8/ [17:08:42] Yippee, build fixed! [17:08:43] Project selenium-Flow-master » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #8: 09FIXED in 23 min: https://integration.wikimedia.org/ci/job/selenium-Flow-master/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/8/ [17:09:16] Project selenium-MultimediaViewer-286674 » safari,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #8: 04FAILURE in 27 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer-286674/BROWSER=safari,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/8/ [17:13:59] 10Deployment-Systems, 10scap, 06Mobile-Apps, 03Mobile-Content-Service, 03Scap3 (Scap3-Adoption-Phase1): Deploy mobileapps/deploy with scap3 - https://phabricator.wikimedia.org/T129147#2260890 (10bearND) [17:19:25] 10Deployment-Systems, 10scap, 06Mobile-Apps, 03Mobile-Content-Service, 03Scap3 (Scap3-Adoption-Phase1): Deploy mobileapps/deploy with scap3 - https://phabricator.wikimedia.org/T129147#2260898 (10bearND) a:03bearND [17:21:52] 03releng-201516-q4, 03Scap3 (Scap3-Adoption-Phase1): [keyresult] Migrate remaining trebuchet deployed services - https://phabricator.wikimedia.org/T129290#2260903 (10bearND) [17:22:28] 10Deployment-Systems, 10scap, 06Mobile-Apps, 03Mobile-Content-Service, 03Scap3 (Scap3-Adoption-Phase1): Deploy mobileapps/deploy with scap3 - https://phabricator.wikimedia.org/T129147#2260907 (10bearND) [17:29:40] RECOVERY - Puppet run on integration-slave-trusty-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [19:34:03] hashar, ostriches: is 
https://lists.wikimedia.org/pipermail/wikitech-l/2016-April/085334.html still current? I thought we'd begin with 1.28.0-wmf.1 today? But https://wikitech.wikimedia.org/wiki/Deployments says something else :/ [19:34:27] FlorianSW: hello ! [19:34:36] yeah, sorry: Hi :) [19:34:46] Email is dated [19:34:47] FlorianSW: yeah got some confusion, I went with 1.27.0-wmf.23 since that was on deployments [19:34:55] and yesterday ostriches said 1.28 will be later [19:35:12] I'll do that branching and update after lunch [19:35:14] meanwhile, I have deleted some 1.28 page on mw.org, dates were for this week. [19:36:45] ok, so wmf.23 this week and wmf.1 (of 1.28) next week, right? :) [19:37:37] Yep yep [19:37:51] thanks for the quick info ostriches and hashar :) [19:38:27] FlorianSW: ohh ostriches is the release boss. I don't even know which mw versions we still support! [19:39:49] hashar: I can recommend: https://www.mediawiki.org/wiki/Version_lifecycle for that :P (At least myself and some others, I think ostriches, too) try to keep all the version pages updated (that's why I always nag you :P) [19:44:50] well [19:44:51] MediaWiki 1.27 (alpha; git master) [19:44:55] sounds wrong? [19:45:04] haven't we released 1.27 yet ? [19:46:37] Not that I know, as the release is scheduled for the end of May [19:46:42] hashar or ostriches will 1.28 be branched today or next week and will 1.27 be branched today? 
[19:47:35] I'll do that branching and update after lunch [19:47:43] Ok, thanks [19:56:35] 03Scap3, 06Labs: ssh as system users not allowed in labs - https://phabricator.wikimedia.org/T121721#2261463 (10mmodell) p:05Triage>03Normal I'm going to figure out how to manage this from [[ /diffusion/OPUP/browse/production/modules/scap/manifests/target.pp | scap::target ]] [19:56:41] 03Scap3, 06Labs: ssh as system users not allowed in labs - https://phabricator.wikimedia.org/T121721#2261466 (10mmodell) a:03mmodell [20:02:17] !log Restarting Jenkins [20:02:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:31:51] is greg-g back? I want to deploy the UploadsLink extension to Commons tomorrow (T130018) but not sure who has to signoff on it if anyone... ostriches? [20:31:51] T130018: Review and deploy Extension:UploadsLink to Wikimedia Commons - https://phabricator.wikimedia.org/T130018 [20:33:08] legoktm: if you've got a window sure [20:35:27] ostriches: ok, I'll create one right after tomorrow morning's SWAT [20:48:26] Yippee, build fixed! [20:48:27] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #1004: 09FIXED in 22 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/1004/ [21:05:55] PROBLEM - SSH on deployment-tin is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:07:35] PROBLEM - Puppet run on deployment-jobrunner01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [21:17:58] RECOVERY - Puppet run on deployment-parsoid06 is OK: OK: Less than 1.00% above the threshold [0.0] [21:21:47] 06Release-Engineering-Team, 06Project-Admins: SWAT Project (Tag) - https://phabricator.wikimedia.org/T99411#2261763 (10Danny_B) [21:35:02] 10Continuous-Integration-Infrastructure, 10Differential, 10Phabricator, 06Project-Admins: Need to create #meta-* projects without formally discussing each one first. 
- https://phabricator.wikimedia.org/T132888#2261806 (10Luke081515) [21:35:56] 10Deployment-Systems, 06Project-Admins, 15User-greg: Further cleanup of #Deployment-Systems - https://phabricator.wikimedia.org/T126631#2261811 (10Luke081515) [21:40:25] 10Deployment-Systems, 06Project-Admins, 15User-greg: Further cleanup of #Deployment-Systems - https://phabricator.wikimedia.org/T126631#2261830 (10Danny_B) [21:45:49] 10Continuous-Integration-Infrastructure, 10Differential, 10Phabricator, 06Project-Admins: Need to create #meta-* projects without formally discussing each one first. - https://phabricator.wikimedia.org/T132888#2261869 (10mmodell) 05Open>03Resolved a:03mmodell I updated the [[ https://www.mediawiki.or... [21:57:27] Hi! Is there any recommended practice for what to do if we'd like to see a warning if a particular error happens client-side, or at least if it starts happening a lot? [22:02:18] ostriches: ^ ? :) [22:02:34] twentyafterfour: ^ ? :) [22:03:13] By "we" I mean, getting a log message somewheres on our servers, not just on the client [22:04:14] AndyRussG: hmm... I think there is a way to collect client side errors via logstash [22:04:20] but I'm not sure what that is [22:04:37] I mean I think we already have something set up for that [22:04:40] Not worth it to do EventLogging especially for this, I think... It's an edge case that may happen every now and again (until it's fixed) and users are quite likely not to notic it [22:04:52] oh [22:05:39] I'm not sure of a lighter weight way [22:05:53] Hmm [22:06:17] I mean for now I can just throw a message in the console via mw.log() [22:07:16] If there's not a standard way I'd just do that... [22:07:59] I guess we don't collect a sample of those or anything anywhere (/me throws privacy concerns out 10th floor window) [22:08:55] twentyafterfour: K thx much in any case! 
:) [22:09:23] yeah I don't know of anything, sorry [22:17:54] twentyafterfour: how does that change relate to this config I wonder [22:17:56] modules/beta/manifests/deployaccess.pp: security::access::config { 'beta-allow-mwdeploy': [22:17:56] modules/beta/manifests/deployaccess.pp- content => "+ : deploy-service mwdeploy : ${bastion_ip}\n", [22:17:56] modules/beta/manifests/deployaccess.pp- priority => 50, [22:18:16] seems there is already an exception defined in this way but maybe we changed how we do business w/ scap and it's now invalid? [22:18:59] chasemp: the new code will create that exception dynamically (one for each unique scap::target user) [22:19:25] do we still need the old code? [22:20:19] I don't think mwdeploy has a scap::target defined [22:20:26] but deploy-service does [22:21:02] ideally all the old stuff gets replaced by scap::target but it's a transitional period right now [22:21:02] I don't know how scap works really [22:21:06] ok [22:21:07] that's what I figured [22:21:21] scap::target is the shiny new thing that will replace a whole ton of config in puppet [22:22:25] I've been working really hard to get https://gerrit.wikimedia.org/r/#/c/284418/ and https://gerrit.wikimedia.org/r/#/c/285519/ merged as well, those will really clean things up a bunch more [22:23:25] shudder, if $::realm == labs works and so does if $::realm == 'labs' [22:23:33] ah interesting, yeah I haven't had time to follow [22:23:36] so the new way to deploy something is just to define a scap::source for each repo, a keyholder::agent for each user-group on deployment hosts and a scap::target for each service that gets deployed [22:24:54] twentyafterfour: are you going to have some time to let 286754 rollout and test a bit in beta? [22:25:25] chasemp: yeah [22:25:39] I can cherry pick it in beta if need-be but it only affects beta anyway I think [22:25:50] nah seems good to me, merging [22:26:01] thanks for working through it [22:27:14] chasemp: thanks for reviewing it. 
that will hopefully save some future deployers from banging their head on a wall trying to set up a new scap deployment or port one from trebuchet [22:30:21] 03Scap3, 06Labs, 13Patch-For-Review: ssh as system users not allowed in labs - https://phabricator.wikimedia.org/T121721#2262064 (10mmodell) [22:38:51] hi, is betacluster still auto-syncing? i can't tell if i'm getting the latest graph ext patch. https://gerrit.wikimedia.org/r/#/c/286474/ [22:39:06] seems like it is older [22:39:13] yurik: it should be working [22:39:24] we get alerted if the job fails [22:39:47] twentyafterfour, which server should i check? [22:40:11] deployment-mw* [22:41:43] twentyafterfour, sorry, could you remind me the path there? [22:42:03] is it the same as on prod depl? [22:43:51] twentyafterfour, it doesn't seem to be up to date. I checked /srv/mediawiki/php-master/extensions/Graph/extension.json and it doesn't contain https://gerrit.wikimedia.org/r/#/c/286474/3/extension.json [22:46:11] hmm, the build isn't failing, it just isn't triggering? [22:46:18] https://integration.wikimedia.org/ci/job/beta-scap-eqiad/ [22:47:02] https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/ [22:47:16] "(pending—deployment-tin.eqiad is offline) " [22:51:29] The instance is active but it's frozen [22:54:50] hmmm.. the jenkins agents usually spawn really quickly [22:56:25] is deployment-tin sick generally? 
I'm not able to ssh in eigther [22:56:27] *either [23:00:26] !log Jenkins agent on deployment-tin not spawning; investigating [23:00:31] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:01:16] bd808: tin is frozen [23:01:23] I wasn't able to get into the instance earlier [23:01:27] was thinking of restarting it [23:01:28] !log rebooting deployment-tin [23:01:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:01:47] Krenair: yeah I tried everything to get in, rebooting it seems necessary [23:01:56] PROBLEM - Puppet run on deployment-salt is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [23:05:47] RECOVERY - SSH on deployment-tin is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6 (protocol 2.0) [23:05:48] does anybody know why apt-get would not use package from jesse-backports with later version? is there some setting that needs to be turned on? [23:08:00] SMalyshev: /etc/apt/preferences.d/wikimedia.pref has packages pinned to origin=wikimedia [23:08:37] deployment-tin is back up but jenkins still doesn't seem to be able to spawn the agent [23:09:30] twentyafterfour: hmm... it does install one from jessie-main but not from jessie-backports... how can I check/change it? [23:09:39] twentyafterfour: and can I tell puppet to do it? [23:12:58] I see a burst of nslcd errors in syslog right after I tell jenkins to spawn the agent. [23:13:23] I think that happens when the ldap groups have too many members to fetch [23:14:05] jenkins agent wont start because of a failed ldap query? [23:14:14] not sure yet [23:14:39] the agent is spawned via ssh so it is possible [23:15:44] naw I think the timing the first couple of times was a coincidence [23:16:12] jenkins ui isn't showing anything useful and neither is syslog on deployment-tin [23:16:58] RECOVERY - Puppet run on deployment-salt is OK: OK: Less than 1.00% above the threshold [0.0] [23:20:33] fixed! 
[23:21:13] !log Changed "Maximum Number of Retries" for ssh agent launch in jenkins for deployment-tin from "0" to "10" [23:21:17] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:23:48] yurik: update jobs are running again for beta. I see a version bump for Graph in the git fetch that just happened. The scap is starting now. [23:24:04] awesome, thanks bd808 ! [23:24:22] 03Scap3, 06Labs, 13Patch-For-Review: ssh as system users not allowed in labs - https://phabricator.wikimedia.org/T121721#2262088 (10chasemp) Close? [23:24:41] thanks for noticing that it was borked [23:26:24] bd808: nice work [23:26:41] random button clicking FTW ;) [23:27:03] 03Scap3, 06Labs, 13Patch-For-Review: ssh as system users not allowed in labs - https://phabricator.wikimedia.org/T121721#2262091 (10mmodell) seems to be working. deployment-tin crashed and burned right around the same time as this patch merged but it seems to be unrelated. [23:27:10] 03Scap3, 06Labs, 13Patch-For-Review: ssh as system users not allowed in labs - https://phabricator.wikimedia.org/T121721#2262092 (10mmodell) 05Open>03Resolved [23:27:23] 03Scap3, 06Labs, 13Patch-For-Review: ssh as system users not allowed in labs - https://phabricator.wikimedia.org/T121721#1886283 (10mmodell) Thanks @chasemp
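Note on the fix logged at 23:21:13: with "Maximum Number of Retries" at 0, a single failed ssh launch leaves the agent offline, while 10 lets a transient failure be absorbed. The sketch below only illustrates that retry idea in Python; it is not how Jenkins implements it, and `launch_with_retries`, its parameters, and the use of `OSError` as the failure type are assumptions made for this example.

```python
import time


def launch_with_retries(launch, max_retries=10, wait_seconds=0.0, sleep=time.sleep):
    """Call `launch` once, then retry up to `max_retries` more times on OSError.

    Returns (result, attempts). Re-raises the last error if every attempt fails.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempts = 0
    last_error = None
    while attempts <= max_retries:
        attempts += 1
        try:
            return launch(), attempts
        except OSError as exc:  # assumed failure type for a dropped connection
            last_error = exc
            if attempts <= max_retries:
                sleep(wait_seconds)  # pause before the next attempt
    raise last_error
```

With `max_retries=0` the first failure is final, which matches the behaviour seen before the setting was changed.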