[00:04:05] Continuous-Integration-Config, Wikimedia-Fundraising-CiviCRM: Civi CI is using contaminated database dumps - https://phabricator.wikimedia.org/T113559#1668974 (awight) NEW
[00:08:04] Continuous-Integration-Config, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Civi CI is using contaminated database dumps - https://phabricator.wikimedia.org/T113559#1669000 (awight)
[00:20:01] Project beta-update-databases-eqiad build #3141: FAILURE in 0.6 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/3141/
[00:20:43] Release-Engineering-Team: Gerrit clean up day - releng - https://phabricator.wikimedia.org/T113028#1669081 (greg) As of right now (5:20pm Pacific): * [[ https://gerrit.wikimedia.org/r/#/q/status%3Aopen+%28project%3A%5Eintegration%2F.%2A+OR+project%3A%5Emediawiki%2Ftools%2Freleng+OR+project%3A%5Emediawiki%2Fto...
[00:21:52] oops
[00:25:31] Yippee, build fixed!
[00:25:32] Project beta-scap-eqiad build #71306: FIXED in 1 min 17 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/71306/
[00:26:41] Continuous-Integration-Infrastructure, MediaWiki-extensions-MultimediaViewer: MultimediaViewer thumbnailBeforeProduceHTML hook breaks other extensions parser tests - https://phabricator.wikimedia.org/T69302#1669111 (Jdlrobson) Is this still breaking things?
[00:26:58] Continuous-Integration-Infrastructure, MediaWiki-extensions-MultimediaViewer: MultimediaViewer thumbnailBeforeProduceHTML hook breaks other extensions parser tests - https://phabricator.wikimedia.org/T69302#1669113 (Jdlrobson) Is it still a high priority?
[00:34:52] Continuous-Integration-Infrastructure, operations, Patch-For-Review: Jenkins: Re-enable lint checks for Apache config in operations-puppet - https://phabricator.wikimedia.org/T72068#1669147 (yuvipanda) Good Job Lego (As he asked me to type) - the redirects are checked now. There is lots of other apache...
[00:43:36] Release-Engineering-Team, User-greg: Gerrit clean up day - releng - https://phabricator.wikimedia.org/T113028#1669162 (greg) Open>Resolved a:greg
[00:44:05] Release-Engineering-Team: Gerrit clean up day - releng - https://phabricator.wikimedia.org/T113028#1669165 (greg) a:greg>None
[01:03:35] Continuous-Integration-Infrastructure, Editing-Department, Reading-Web, Release-Engineering-Epics, and 2 others: [EPIC] Wikimedia should use a standard set of tools for managing code quality - https://phabricator.wikimedia.org/T111396#1669206 (greg)
[01:04:34] Release-Engineering-Team, Epic, Release-Engineering-Epics, Tracking: [EPIC} Provide pre-merge reports on patchsets (tracking) - https://phabricator.wikimedia.org/T101542#1669207 (greg)
[01:04:37] Release-Engineering-Team, Gather, MobileFrontend, Reading Web Planning, and 2 others: [EPIC] Create a formal release process for MobileFrontend/Gather - https://phabricator.wikimedia.org/T100296#1669208 (greg)
[01:04:39] Release-Engineering-Team, Epic, Release-Engineering-Epics: [EPIC] Encourage developers to increase code coverage - https://phabricator.wikimedia.org/T100294#1669209 (greg)
[01:04:41] Deployment-Systems, Release-Engineering-Team, Epic, Release-Engineering-Epics: EPIC: The future of MediaWiki deployment: Tooling - https://phabricator.wikimedia.org/T94620#1669210 (greg)
[01:04:43] Browser-Tests, Continuous-Integration-Infrastructure, Release-Engineering-Team, Epic, and 2 others: [EPIC] trigger browser tests from Gerrit (tracking) - https://phabricator.wikimedia.org/T55697#1669212 (greg)
[01:04:45] Deployment-Systems, Release-Engineering-Team, ReleaseTaggerBot, Epic, Release-Engineering-Epics: EPIC: Code Deploy Dashboard - https://phabricator.wikimedia.org/T280#1669211 (greg)
[01:06:22] Deployment-Systems, Epic, Release-Engineering-Epics: EPIC: The future of MediaWiki deployment: Tooling - https://phabricator.wikimedia.org/T94620#1168219 (greg)
[01:06:24] Browser-Tests, Continuous-Integration-Infrastructure, Epic, Release-Engineering-Epics, Tracking: [EPIC] trigger browser tests from Gerrit (tracking) - https://phabricator.wikimedia.org/T55697#541910 (greg)
[01:09:07] Continuous-Integration-Infrastructure, operations, Patch-For-Review: Jenkins: Re-enable lint checks for Apache config in operations-puppet - https://phabricator.wikimedia.org/T72068#1669220 (Dzahn) tested it. just changed redirects.conf, but not .dat https://gerrit.wikimedia.org/r/#/c/240626/1 Assert...
[01:11:14] Continuous-Integration-Infrastructure, operations, Patch-For-Review: Jenkins: Re-enable lint checks for Apache config in operations-puppet - https://phabricator.wikimedia.org/T72068#1669223 (Dzahn) stalled>Resolved a:Dzahn thank you @Legoktm @Yuvipanda maybe there should be more checks, yea....
[01:11:36] Continuous-Integration-Infrastructure, operations, Patch-For-Review: Jenkins: Re-enable lint checks for Apache config in operations-puppet - https://phabricator.wikimedia.org/T72068#1669227 (Dzahn) a:Dzahn>Legoktm
[01:13:35] Continuous-Integration-Infrastructure, Editing-Department, Reading-Web, Release-Engineering-Epics, and 2 others: [EPIC] Wikimedia should use a standard set of tools for managing code quality - https://phabricator.wikimedia.org/T111396#1603051 (greg)
[01:18:57] Continuous-Integration-Config, Release-Engineering-Team, HHVM: Jenkins: Implement hhvm based voting jobs for mediawiki and extensions (tracking) - https://phabricator.wikimedia.org/T75521#1669252 (greg)
[01:20:00] Continuous-Integration-Config, HHVM, Release-Engineering-Epics: Jenkins: Implement hhvm based voting jobs for mediawiki and extensions (tracking) - https://phabricator.wikimedia.org/T75521#1669265 (greg) a:hashar>None
[01:20:59] Yippee, build fixed!
[01:20:59] Project beta-update-databases-eqiad build #3142: FIXED in 58 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/3142/
[01:23:50] Gerrit-Migration, Gitblit-Deprecate: Update {{git file}} to link to diffusion - https://phabricator.wikimedia.org/T101358#1669270 (greg) a:Nemo_bis>None
[01:25:42] Deployment-Systems: Trebuchet should repack / pack-refs git repos under /srv/deployment - https://phabricator.wikimedia.org/T112509#1669277 (greg)
[01:26:23] Browser-Tests, Release-Engineering-Team, Tracking: Move browser test alerts to responsible teams' channels from -releng (tracking) - https://phabricator.wikimedia.org/T89375#1669280 (greg) p:High>Normal
[01:26:38] Release-Engineering-Team: Design a Test-Driven Development (TDD) survey - https://phabricator.wikimedia.org/T94472#1669283 (greg) p:Triage>Low
[01:26:53] Deployment-Systems: scap shouldn't log completion (it should log fail!) - https://phabricator.wikimedia.org/T110793#1669285 (greg)
[01:27:11] Deployment-Systems: Don't continue scap if sync to all proxies failed - https://phabricator.wikimedia.org/T110791#1669287 (greg)
[01:27:20] Release-Engineering-Team: Repositories dashboard - https://phabricator.wikimedia.org/T112259#1669290 (greg) p:Triage>Normal
[01:28:26] Release-Engineering-Team, Testing Initiative 2015: Guides for initializing a test suite: unit testing & browser testing - https://phabricator.wikimedia.org/T108107#1669294 (greg) p:Triage>Normal
[01:29:14] Release-Engineering-Team, Testing Initiative 2015: Include links to unit testing (Emphasize testing documentation on mediawiki.org) - https://phabricator.wikimedia.org/T108105#1669298 (greg) p:Triage>Normal
[01:43:29] Release-Engineering-Team, MediaWiki-General-or-Unknown: Remove EOL MediaWiki release branches - https://phabricator.wikimedia.org/T92503#1669328 (greg)
[01:47:20] Release-Engineering-Team: Investigate production and/or beta requirements for Sentry - https://phabricator.wikimedia.org/T89732#1669337 (greg) Nope! :)
[01:49:30] Release-Engineering-Team, Wikimedia-Apache-configuration, operations: Make it possible to quickly and programmatically pool and depool application servers - https://phabricator.wikimedia.org/T73212#1669349 (greg)
[01:49:34] Deployment-Systems, Release-Engineering-Team, Performance-Team, operations, HHVM: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#1669348 (greg)
[01:49:44] Deployment-Systems, Performance-Team, operations, HHVM: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#1414314 (greg)
[01:51:15] Gitblit-Deprecate, Release-Engineering-Team, Diffusion: Wikimedia code repository browser in Phabricator - https://phabricator.wikimedia.org/T752#1669356 (greg)
[01:51:59] releng-201516-q2: [keyresult] Deprecate gitblit in favor of Diffusion - https://phabricator.wikimedia.org/T111465#1669358 (greg)
[01:52:33] releng-201516-q2: [keyresult] Deprecate gitblit in favor of Diffusion - https://phabricator.wikimedia.org/T111465#1604788 (greg)
[01:52:34] Gitblit-Deprecate, Release-Engineering-Team, Diffusion: Wikimedia code repository browser in Phabricator - https://phabricator.wikimedia.org/T752#12517 (greg)
[03:18:22] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<33.33%)
[03:22:27] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #828: FAILURE in 40 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/828/
[05:17:17] PROBLEM - Puppet failure on deployment-cache-parsoid04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[05:17:17] PROBLEM - Puppet failure on deployment-cache-mobile04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[05:36:57] Yippee, build fixed!
[05:36:57] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce build #551: FIXED in 34 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce/551/
[05:48:59] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-chrome-sauce build #202: FAILURE in 32 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-chrome-sauce/202/
[05:58:06] Continuous-Integration-Infrastructure, Documentation: Document RuboCop workflow - https://phabricator.wikimedia.org/T1368#1669657 (greg)
[06:06:57] Beta-Cluster: Process accounting routinely fill up /var on deployment-bastion - https://phabricator.wikimedia.org/T91354#1669675 (greg)
[06:07:03] Beta-Cluster: Process accounting routinely fill up /var on deployment-bastion - https://phabricator.wikimedia.org/T91354#1669678 (greg) ``` 03:18 < shinken-w> PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.b...
[06:09:05] !log deployment-bastion is getting close to having a filled up /var again: https://phabricator.wikimedia.org/T91354
[06:09:09] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[06:12:46] Release-Engineering-Team, User-greg: Add shinken output for Beta Cluster to -operations channel - https://phabricator.wikimedia.org/T1334#1669685 (greg) Open>declined a:greg Maybe later.
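The Shinken alerts above ("Puppet failure ... 100.00% of data above the critical threshold [0.0]", and the free-space check that fires when deployment-prep.deployment-bastion.diskspace._var.byte_percentfree drops below 33.33) read like percentage-of-datapoints checks over a metric series. A minimal sketch of that idea, assuming the check looks at a window of recent datapoints, skips null points, and counts how many cross the threshold; the function names and the `warn_pct`/`crit_pct` cut-offs are hypothetical, not the actual Shinken or Graphite configuration:

```python
from typing import Iterable, Optional, Tuple


def percent_beyond(datapoints: Iterable[Optional[float]], threshold: float,
                   direction: str = "above") -> float:
    """Percentage of non-null datapoints strictly beyond `threshold`.

    direction="above" suits "Puppet failure" style metrics (any value > 0.0);
    direction="below" suits percent-free disk space (values < 33.33).
    None entries (gaps in the series) are ignored.
    """
    values = [v for v in datapoints if v is not None]
    if not values:
        return 0.0
    if direction == "above":
        hits = sum(1 for v in values if v > threshold)
    elif direction == "below":
        hits = sum(1 for v in values if v < threshold)
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    return 100.0 * hits / len(values)


def check_state(datapoints: Iterable[Optional[float]], threshold: float,
                direction: str = "above", warn_pct: float = 1.0,
                crit_pct: float = 20.0) -> Tuple[str, float]:
    """Map the percentage onto OK / WARNING / CRITICAL.

    warn_pct and crit_pct are illustrative cut-offs (hypothetical values).
    """
    pct = percent_beyond(datapoints, threshold, direction)
    if pct >= crit_pct:
        return "CRITICAL", pct
    if pct >= warn_pct:
        return "WARNING", pct
    return "OK", pct
```

For example, a window where every Puppet run reported failures gives `("CRITICAL", 100.0)`, matching the 05:17 alerts, while an all-zero window gives `("OK", 0.0)`, comparable to the later "Less than 1.00% above the threshold" recoveries.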
[06:12:56] Release-Engineering-Team: Add shinken output for Beta Cluster to -operations channel - https://phabricator.wikimedia.org/T1334#1669689 (greg) a:greg>None
[06:15:03] Deployment-Systems: Teach make-wmf-branch how to convert obsolete branches to tags - https://phabricator.wikimedia.org/T113572#1669691 (greg) NEW
[06:23:29] Deployment-Systems, Release-Engineering-Team, Performance-Team, operations, HHVM: Translation cache exhaustion caused by changes to PHP code in file scope - https://phabricator.wikimedia.org/T103886#1669707 (greg) The plan of action in this task is good, but can we get subtasks for the things tha...
[06:27:10] Browser-Tests: Provide support for spoffing the physical geolocation on QA tests - https://phabricator.wikimedia.org/T60720#1669717 (greg)
[06:27:49] Release-Engineering-Team: True code pipeline - https://phabricator.wikimedia.org/T281#1669719 (greg) Open>Invalid
[06:28:37] Release-Engineering-Team, Staging, releng-201415-Q3: [Quarterly Success Metric] Green nightly builds on the staging cluster (tracking) - https://phabricator.wikimedia.org/T88701#1669723 (greg) p:Lowest>Low
[06:28:43] Release-Engineering-Team, Staging, releng-201415-Q3: [Quarterly Success Metric] Green nightly builds on the staging cluster (tracking) - https://phabricator.wikimedia.org/T88701#1018466 (greg) p:Low>Lowest
[06:35:08] Beta-Cluster, RESTBase, Services, operations: Firewall rules too restrictive on deployment-restbase0x.deployment-prep.eqiad.wmflabs - https://phabricator.wikimedia.org/T113528#1669735 (MoritzMuehlenhoff) This is caused by the Hiera data used by the ferm rules: cassandra::seeds in prod uses hostname...
[06:38:25] RECOVERY - Free space - all mounts on deployment-bastion is OK: OK: All targets OK
[07:11:27] Yippee, build fixed!
[07:11:27] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-monobook-sauce build #579: FIXED in 46 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-monobook-sauce/579/
[07:41:29] Yippee, build fixed!
[07:41:29] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-10-sauce build #170: FIXED in 32 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-10-sauce/170/
[08:22:36] (PS2) Florianschmidtwelzow: Update jenkins tests for TopTenPages [integration/config] - https://gerrit.wikimedia.org/r/240425 (owner: Paladox)
[08:28:56] Continuous-Integration-Infrastructure, WorkType-Maintenance: New Trusty slaves can't run the mediawiki qunit jobs - https://phabricator.wikimedia.org/T113489#1669871 (hashar) Resolved>Open mediawiki-core-qunit is fixed. mediawiki-extensions-qunit still fails for some reason on trusty-1014 and trust...
[08:33:07] (PS2) Zfilipin: Switch ZeroBanner browsertests to Chrome [integration/config] - https://gerrit.wikimedia.org/r/239913 (https://phabricator.wikimedia.org/T113280) (owner: Hashar)
[08:34:01] (PS3) Hashar: Update jenkins tests for TopTenPages [integration/config] - https://gerrit.wikimedia.org/r/240425 (owner: Paladox)
[08:34:16] (CR) Hashar: [C: 2] Update jenkins tests for TopTenPages [integration/config] - https://gerrit.wikimedia.org/r/240425 (owner: Paladox)
[08:34:49] (Merged) jenkins-bot: Update jenkins tests for TopTenPages [integration/config] - https://gerrit.wikimedia.org/r/240425 (owner: Paladox)
[08:34:53] (Abandoned) Zfilipin: Switch ZeroBanner browsertests to Chrome [integration/config] - https://gerrit.wikimedia.org/r/239913 (https://phabricator.wikimedia.org/T113280) (owner: Hashar)
[08:39:29] (PS1) Zfilipin: Delete broken ZeroBanner browser tests [integration/config] - https://gerrit.wikimedia.org/r/240648 (https://phabricator.wikimedia.org/T113463)
[08:46:29] Continuous-Integration-Infrastructure, WorkType-Maintenance: New Trusty slaves can't run the mediawiki qunit jobs - https://phabricator.wikimedia.org/T113489#1669894 (hashar) Gave it a try again with a fresh workspace and it still fail :-(
[08:47:14] Beta-Cluster, Labs, Labs-Infrastructure, operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#1669896 (Chmarkine) [[ https://letsencrypt.org/ | Let's Encrypt ]] provides free trusted(*) DV non-wildcard certs. We have 31 domains lists [[...
[08:47:44] Continuous-Integration-Infrastructure, WorkType-Maintenance: New Trusty slaves can't run the mediawiki qunit jobs - https://phabricator.wikimedia.org/T113489#1669897 (hashar) I have depooled https://integration.wikimedia.org/ci/computer/integration-slave-trusty-1014/ Removed labels from https://integrati...
[08:49:13] Was the idea of logging client-side errors and warnings ever discussed?
[08:49:21] *in production
[08:50:41] PROBLEM - Puppet staleness on deployment-restbase01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [43200.0]
[08:50:51] Adrian_WMDE: yes! there is even a RFC for it https://www.mediawiki.org/wiki/Requests_for_comment/Server-side_Javascript_error_logging
[08:51:09] Adrian_WMDE: https://phabricator.wikimedia.org/T382
[08:51:35] Adrian_WMDE: seems stalled
[08:52:24] thanks
[08:53:27] Browser-Tests, Patch-For-Review: Delete broken ZeroBanner browsertests Jenkins job - https://phabricator.wikimedia.org/T113463#1669899 (zeljkofilipin)
[09:02:01] (PS2) Zfilipin: Delete broken ZeroBanner browser tests [integration/config] - https://gerrit.wikimedia.org/r/240648 (https://phabricator.wikimedia.org/T113463)
[09:03:06] (PS3) Zfilipin: Delete broken ZeroBanner browser tests [integration/config] - https://gerrit.wikimedia.org/r/240648 (https://phabricator.wikimedia.org/T113463)
[09:17:49] Continuous-Integration-Infrastructure, WorkType-Maintenance: New Trusty slaves can't run the mediawiki qunit jobs - https://phabricator.wikimedia.org/T113489#1669954 (hashar) Using resourceloader debug under Chromium, the canceled request is: ``` api.php?action=query &format=json &prop=pageimages%7Cinfo%7...
[09:24:28] Continuous-Integration-Infrastructure, WorkType-Maintenance: New Trusty slaves can't run the mediawiki qunit jobs - https://phabricator.wikimedia.org/T113489#1669968 (hashar) Hey @Krinkle , I have a hard time figuring out why Trusty slaves I have created yesterday are failing the mediawiki-core-qunit job...
[09:24:44] there
[09:24:53] I have lost a morning trying to debug a qunit failure :-(((((((((((
[09:26:23] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #731: FAILURE in 1 hr 16 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/731/
[09:30:37] PROBLEM - Puppet failure on deployment-fluorine is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[09:32:02] hashar: *waves*. So I've been playing with composer yesterday. The backport option is a PITA (even to jessie), but the fpm option was relatively quick: https://gerrit.wikimedia.org/r/#/c/240451/
[09:32:11] :(((
[09:32:11] it doesn't really offer much of an advantage over the git option, though
[09:32:39] and legoktm prefers that version, so we take the git option for tool labs as well...
[09:32:51] using integration/config as a source ?
[09:32:54] yeah
[09:33:18] grr integration/composer
[09:33:47] valhallasw`cloud: yeah feel free to reuse it :-}
[09:34:09] valhallasw`cloud: if you can file a task about the backporting and paste the errors /wall you have hit, that would be nice
[09:34:18] maybe one day we will have time to handle the backport
[09:34:23] if at all possible
[09:34:39] hashar: basically all dependencies have to be backported, including ones already in jessie (version conflicts)
[09:35:01] greaat
[09:35:02] which is probably doable, but having to overwrite existing packages before being able to build composer made me unhappy
[09:35:08] https://etherpad.wikimedia.org/p/74j8K2zIob has some notes
[09:35:48] there's probably a cleaner way to build packages (without having to install them), but I don't know how to do it.
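The fpm option mentioned at 09:32 turns a plain checkout into a .deb without backporting every dependency. As a rough sketch of what an automated "checkout to package" step could assemble, the helpers below build a snapshot version string and an fpm command line. `snapshot_version`, `fpm_command`, the `~git<date>.<sha>` version scheme, and the paths and base version in the example are hypothetical; `-s dir`, `-t deb`, `-n`, `-v`, `-C`, `--prefix` and `-d` are ordinary fpm options.

```python
from datetime import date
from typing import Iterable, List


def snapshot_version(base: str, commit: str, day: date) -> str:
    """Hypothetical snapshot scheme, e.g. '1.0~git20150923.abc1234'.

    In dpkg version ordering '~' sorts before everything, so a snapshot of
    1.0 is considered older than a proper 1.0 release.
    """
    return f"{base}~git{day:%Y%m%d}.{commit[:7]}"


def fpm_command(name: str, version: str, checkout_dir: str, prefix: str,
                depends: Iterable[str] = ()) -> List[str]:
    """Build (but do not run) an fpm invocation packaging a directory as a .deb."""
    cmd = [
        "fpm",
        "-s", "dir",            # source type: a plain directory tree
        "-t", "deb",            # target type: Debian package
        "-n", name,             # package name
        "-v", version,          # package version
        "-C", checkout_dir,     # chdir here before collecting files
        "--prefix", prefix,     # install the files under this path
    ]
    for dep in depends:
        cmd += ["-d", dep]      # one -d flag per runtime dependency
    cmd.append(".")             # package everything under checkout_dir
    return cmd
```

On a host with fpm installed, the returned list could be passed to `subprocess.run(cmd, check=True)`. Publishing the resulting .deb to a package repository is deliberately left out of this sketch.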
[09:35:53] sometime dependencies can be changed though
[09:35:54] with no impact
[09:35:57] (sometime)
[09:36:23] you can paste all of that to Phabricator for later reference
[09:36:27] the other option is to somehow automate fpm and link it to aptly
[09:36:29] ya, good point
[09:36:38] what is aptly ?
[09:36:47] the tool labs apt server
[09:36:55] which has fun things like a REST endpoint
[09:37:10] oh
[09:38:12] basically, if we can automate git commit -> package rebuild -> package pushed to , I think it could work?
[09:38:18] the fpm option, that is
[09:39:31] valhallasw`cloud: I have a Jenkins job that is able to create packages
[09:46:18] hashar: notes are now also under https://phabricator.wikimedia.org/T104789#1670054
[09:57:41] (PS1) Hashar: nodepool: contint::packages::ops in images [integration/config] - https://gerrit.wikimedia.org/r/240660
[09:58:40] (CR) Hashar: [C: -2] "Requires https://gerrit.wikimedia.org/r/#/c/240659/" [integration/config] - https://gerrit.wikimedia.org/r/240660 (owner: Hashar)
[10:01:27] Browser-Tests, Patch-For-Review, WMF-deploy-2015-09-22_(1.26wmf24): All repositories that have browser tests should be updated to the latest version of watir-webdriver - https://phabricator.wikimedia.org/T112748#1670080 (zeljkofilipin)
[10:03:54] Browser-Tests, Patch-For-Review, WMF-deploy-2015-09-22_(1.26wmf24): All repositories that have browser tests should be updated to the latest version of watir-webdriver - https://phabricator.wikimedia.org/T112748#1670094 (zeljkofilipin)
[10:04:03] PROBLEM - Puppet failure on deployment-tmh01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[10:04:30] Browser-Tests, Patch-For-Review, WMF-deploy-2015-09-22_(1.26wmf24): All repositories that have browser tests should be updated to the latest version of watir-webdriver - https://phabricator.wikimedia.org/T112748#1644704 (zeljkofilipin)
[10:05:54] Beta-Cluster, CirrusSearch, Discovery, pywikibot-core, and 3 others: Search test failed - https://phabricator.wikimedia.org/T113517#1670113 (hashar) Open>Resolved Seems http://en.wikipedia.beta.wmflabs.org/w/api.php?action=query&generator=search&gsrsearch=wiki works now. Thanks @Ebernhardson !
[10:10:11] Release-Engineering-Team: Repositories dashboard - https://phabricator.wikimedia.org/T112259#1670134 (hashar) The Kunal code is now in integration/dashboard.git Written for python3.4
[10:11:49] Browser-Tests, Patch-For-Review, WMF-deploy-2015-09-22_(1.26wmf24): All repositories that have browser tests should be updated to the latest version of watir-webdriver - https://phabricator.wikimedia.org/T112748#1670141 (zeljkofilipin)
[10:15:12] (PS2) Hashar: Don't run mediawiki-phpunit-*-composer on REL1_23 [integration/config] - https://gerrit.wikimedia.org/r/240447 (https://phabricator.wikimedia.org/T113506) (owner: JanZerebecki)
[10:15:26] Continuous-Integration-Config, Patch-For-Review: Disable mediawiki-phpunit-*-composer jobs for older MediaWiki core branches - https://phabricator.wikimedia.org/T113506#1670147 (hashar) a:JanZerebecki
[10:16:22] (CR) Hashar: [C: 2] Don't run mediawiki-phpunit-*-composer on REL1_23 [integration/config] - https://gerrit.wikimedia.org/r/240447 (https://phabricator.wikimedia.org/T113506) (owner: JanZerebecki)
[10:17:09] (Merged) jenkins-bot: Don't run mediawiki-phpunit-*-composer on REL1_23 [integration/config] - https://gerrit.wikimedia.org/r/240447 (https://phabricator.wikimedia.org/T113506) (owner: JanZerebecki)
[10:22:29] Browser-Tests, Patch-For-Review, WMF-deploy-2015-09-22_(1.26wmf24): All repositories that have browser tests should be updated to the latest version of watir-webdriver - https://phabricator.wikimedia.org/T112748#1670158 (zeljkofilipin)
[10:22:47] Beta-Cluster, CirrusSearch, Discovery, pywikibot-core, and 3 others: Search test failed - https://phabricator.wikimedia.org/T113517#1670159 (XZise) I can verify that our tests now pass that again: * Working: https://travis-ci.org/wikimedia/pywikibot-core/jobs/81935467#L696 * Same place from the lo...
[10:25:09] Continuous-Integration-Config, Patch-For-Review: Disable mediawiki-phpunit-*-composer jobs for older MediaWiki core branches - https://phabricator.wikimedia.org/T113506#1670166 (hashar) Open>Resolved There was a patch pending for REL1_23 that failed gate. I +2ed it again and it did not trigger the...
[10:26:59] Browser-Tests, Patch-For-Review, WMF-deploy-2015-09-22_(1.26wmf24): All repositories that have browser tests should be updated to the latest version of watir-webdriver - https://phabricator.wikimedia.org/T112748#1670173 (zeljkofilipin)
[10:28:16] (PS8) Hashar: Add Jenkins tests for DeletePagesForGood [integration/config] - https://gerrit.wikimedia.org/r/237138 (owner: Paladox)
[10:28:44] (CR) Hashar: [C: 2] Add Jenkins tests for DeletePagesForGood [integration/config] - https://gerrit.wikimedia.org/r/237138 (owner: Paladox)
[10:30:34] (Merged) jenkins-bot: Add Jenkins tests for DeletePagesForGood [integration/config] - https://gerrit.wikimedia.org/r/237138 (owner: Paladox)
[10:32:58] (PS4) Hashar: Add jenkins tests for extensions/ORES.git [integration/config] - https://gerrit.wikimedia.org/r/239535 (owner: Paladox)
[10:33:24] Browser-Tests, Patch-For-Review, WMF-deploy-2015-09-22_(1.26wmf24): All repositories that have browser tests should be updated to the latest version of watir-webdriver - https://phabricator.wikimedia.org/T112748#1670191 (zeljkofilipin)
[10:35:59] (PS5) Hashar: Add composer/npm tests for extensions/ORES.git [integration/config] - https://gerrit.wikimedia.org/r/239535 (owner: Paladox)
[10:37:04] (CR) Hashar: [C: 2] Add composer/npm tests for extensions/ORES.git [integration/config] - https://gerrit.wikimedia.org/r/239535 (owner: Paladox)
[10:37:55] (Merged) jenkins-bot: Add composer/npm tests for extensions/ORES.git [integration/config] - https://gerrit.wikimedia.org/r/239535 (owner: Paladox)
[10:42:48] (PS4) Hashar: Delete broken ZeroBanner browser tests [integration/config] - https://gerrit.wikimedia.org/r/240648 (https://phabricator.wikimedia.org/T113463) (owner: Zfilipin)
[10:42:57] (CR) Hashar: [C: 2] Delete broken ZeroBanner browser tests [integration/config] - https://gerrit.wikimedia.org/r/240648 (https://phabricator.wikimedia.org/T113463) (owner: Zfilipin)
[10:44:02] RECOVERY - Puppet failure on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:44:48] (Merged) jenkins-bot: Delete broken ZeroBanner browser tests [integration/config] - https://gerrit.wikimedia.org/r/240648 (https://phabricator.wikimedia.org/T113463) (owner: Zfilipin)
[10:46:08] PROBLEM - Puppet failure on deployment-db1 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[10:47:30] PROBLEM - Puppet failure on deployment-mathoid is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[10:49:38] Browser-Tests, Patch-For-Review: Delete broken ZeroBanner browsertests Jenkins job - https://phabricator.wikimedia.org/T113463#1670226 (hashar) Open>Resolved Cleaned up! Thank you @jhobs, that is one less trouble for us :}
[10:49:40] Release-Engineering-Team, Epic, Patch-For-Review, Tracking: Fix or delete browsertests* Jenkins jobs that are failing for more than a week - https://phabricator.wikimedia.org/T94150#1670228 (hashar)
[10:51:12] (CR) Hashar: "Paladox wrote:" [integration/config] - https://gerrit.wikimedia.org/r/228488 (owner: Paladox)
[10:57:31] (CR) Hashar: [C: -1] "I don't think OpenStackManager pass the testsuite. Can be reproduced with a local setup that has only that extension and running php test" [integration/config] - https://gerrit.wikimedia.org/r/239538 (owner: Paladox)
[11:21:08] RECOVERY - Puppet failure on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:22:34] RECOVERY - Puppet failure on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[12:24:44] really guys? nano as the git editor on deployment-puppetmaster?
[12:25:03] :)
[12:31:00] Beta-Cluster, Labs, Labs-Infrastructure, operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#1670489 (Krenair) I think we'd also want upload.beta.wmflabs.org, maybe stream.wmflabs.org, all of the m./zero. variants? What about mx.beta.w...
[12:41:06] Beta-Cluster, RESTBase, Services, operations, Patch-For-Review: Firewall rules too restrictive on deployment-restbase0x.deployment-prep.eqiad.wmflabs - https://phabricator.wikimedia.org/T113528#1670523 (mobrovac) I cherry-picked the patch on `deployment-puppetmaster.deployment-prep.eqiad.wmflabs...
[12:45:41] RECOVERY - Puppet staleness on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [3600.0]
[12:50:01] Beta-Cluster, RESTBase, Services, operations, Patch-For-Review: Firewall rules too restrictive on deployment-restbase0x.deployment-prep.eqiad.wmflabs - https://phabricator.wikimedia.org/T113528#1670551 (mobrovac) After cherry-picking [PS 240673](https://gerrit.wikimedia.org/r/#/c/240673/) as wel...
[12:55:14] Project beta-scap-eqiad build #71374: FAILURE in 1 min 10 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/71374/
[12:55:33] Yippee, build fixed!
[12:55:33] Project browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #606: FIXED in 1 min 32 sec: https://integration.wikimedia.org/ci/job/browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/606/
[12:57:00] please someone give a wmf branch +2, I need it to make a build as preparation for SWAT: https://gerrit.wikimedia.org/r/#/c/240680/
[13:02:47] Isn't that supposed to be done during the swat window?
[13:04:15] Krenair: no the wikidata build is usually prepared before. is composer run to change mediawiki/vendor during the SWAT?
[13:05:34] Yippee, build fixed!
[13:05:35] Project beta-scap-eqiad build #71375: FIXED in 1 min 21 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/71375/
[13:07:20] Beta-Cluster, RESTBase, Services, operations, Patch-For-Review: Firewall rules too restrictive on deployment-restbase0x.deployment-prep.eqiad.wmflabs - https://phabricator.wikimedia.org/T113528#1670591 (mobrovac) Open>Resolved a:mobrovac Merged, resolving.
[13:10:01] Beta-Cluster, Labs, Labs-Infrastructure, operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#1670596 (Lixxx235) Chmarkine, there's always StartCom/StartSSL which has free certs, and they're already trusted by default in all major brows...
[14:08:39] Browser-Tests, Continuous-Integration-Infrastructure, VisualEditor, Patch-For-Review: browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox jenkins job failing - https://phabricator.wikimedia.org/T111510#1670704 (zeljkofilipin) Reverted the job to configuration that is currently in mast...
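The 10:15 change "Don't run mediawiki-phpunit-*-composer on REL1_23" amounts to a per-job branch filter. A minimal sketch of the idea, assuming the CI scheduler selects jobs by matching regular expressions against the job name and the branch name (Zuul's layout allows a branch regex per job, but the exact rule in that patch is not shown in this log). `RULES`, `should_run` and the example job names are hypothetical:

```python
import re
from typing import List, Pattern, Tuple

# Hypothetical rule table: (job-name regex, branch regex the job may run on).
# The negative lookahead ^(?!REL1_23$) matches every branch except exactly REL1_23.
RULES: List[Tuple[Pattern[str], Pattern[str]]] = [
    (re.compile(r"^mediawiki-phpunit-.*-composer$"), re.compile(r"^(?!REL1_23$)")),
]


def should_run(job: str, branch: str) -> bool:
    """Return False if any rule matching `job` rejects `branch`."""
    for job_re, branch_re in RULES:
        if job_re.match(job) and not branch_re.match(branch):
            return False
    return True
```

Keeping `$` inside the lookahead makes the exclusion exact. Without it, `(?!REL1_23)` would also reject any branch whose name merely starts with `REL1_23`.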
[14:09:46] (PS1) Zfilipin: Revert "Let MW-Selenium 1.x use the local test suite configuration" [integration/config] - https://gerrit.wikimedia.org/r/240701 (https://phabricator.wikimedia.org/T111510)
[14:14:33] Browser-Tests, Continuous-Integration-Infrastructure, VisualEditor, Patch-For-Review: browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox jenkins job failing - https://phabricator.wikimedia.org/T111510#1670712 (zeljkofilipin) Reverted https://gerrit.wikimedia.org/r/#/c/226603/ with ht...
[14:16:40] ostriches, thcipriani|afk: please someone give a wmf branch +2, I need it to make a build as preparation for SWAT: https://gerrit.wikimedia.org/r/#/c/240680/
[14:31:26] jzerebecki: sure, why does this need to merge pre-swat?
[14:32:34] andrewbogott: good morning :-} Do you have some minutes to merge in a contint puppet patch for me please ? it is https://gerrit.wikimedia.org/r/#/c/240659/
[14:32:48] a basic refactoring, applied on labs and working there
[14:33:10] hashar: sure, looking...
[14:34:15] andrewbogott: I need to extract a couple packages from the mess of contint::packages::labs so I can include them for the Nodepool images
[14:34:22] Yippee, build fixed!
[14:34:22] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » en,contintLabsSlave && UbuntuTrusty build #74: FIXED in 21 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=en,label=contintLabsSlave%20&&%20UbuntuTrusty/74/
[14:36:25] Yippee, build fixed!
[14:36:26] Project browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce build #270: FIXED in 8 min 24 sec: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce/270/
[14:37:43] (PS1) Hashar: Migrate tox*jessie jobs to Nodepool instances [integration/config] - https://gerrit.wikimedia.org/r/240705
[14:38:15] (CR) Hashar: [C: -2] "Depends on:" [integration/config] - https://gerrit.wikimedia.org/r/240705 (owner: Hashar)
[14:38:49] (CR) jenkins-bot: [V: -1] Migrate tox*jessie jobs to Nodepool instances [integration/config] - https://gerrit.wikimedia.org/r/240705 (owner: Hashar)
[14:38:50] andrewbogott: we will have to refactor the mess contint:: classes are :-/ Zeljko and I are pairing on it from time to time
[14:39:47] (PS2) Hashar: Migrate tox*jessie jobs to Nodepool instances [integration/config] - https://gerrit.wikimedia.org/r/240705
[14:39:59] Beta-Cluster, Labs, Labs-Infrastructure, operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#1670836 (Chmarkine) >>! In T50501#1670596, @Lixxx235 wrote: > Chmarkine, there's always StartCom/StartSSL which has free certs, and they're al...
[14:40:56] Krinkle: thanks for the task about mwext-mw-selenium yesterday . I kind of overreacted and missed your point "get the job out of gate-and-submit"
[14:41:16] Krinkle: Dan did the patch eventually
[14:41:46] andrewbogott: thank you!
[14:42:04] (PS2) Hashar: nodepool: contint::packages::ops in images [integration/config] - https://gerrit.wikimedia.org/r/240660
[14:45:54] hashar: sure, I'm glad it was taken the way it was.
[14:46:07] hashar: btw, https://tools.wmflabs.org/nagf/?project=integration#h_integration-slave-trusty-1016_cpu looks a bit oncerning
[14:46:10] (view last month)
[14:46:19] massive CPU spike for the last 2 days
[14:47:54] Krinkle: yeah seems some process has been left behind. Maybe a parsoid daemon that hasn't been killed
[14:48:15] we might be able to find it with atop which takes samples of cpu/mem usage every 10 minutes or so ( https://wikitech.wikimedia.org/wiki/Atop )
[14:53:33] Krinkle: not sure what happened :-/ I rebooted it this morning so it is essentially {solved}
[14:55:10] http://bit.ly/1WkUcxg#integration-slave-trusty-1016_cpu_day
[14:55:14] Ah yeah, it's back down now
[14:55:16] but still very spiky
[14:55:39] hashar: were there alerts for that spike yesterday?
[14:57:09] Krinkle: I am not sure whether it is monitored
[14:58:05] http://shinken.wmflabs.org/host/integration-slave-trusty-1016
[14:58:08] no cpu check
[14:58:55] Browser-Tests, Patch-For-Review: Improve password fallback for mediawiki_selenium - https://phabricator.wikimedia.org/T112279#1670876 (zeljkofilipin) a:zeljkofilipin>None
[14:59:49] thcipriani: oh we can do it during swat but there is only 1 hour, which is a bit tight for that in my experience. i need to then run composer to update Wikidata, which is what then the patch that gets merged by the SWATing person, then that auto updates the submodule wmf22 and wmf24 and then someone needs to do a manual update of the submodule wmf23.
[15:09:37] Beta-Cluster: Process accounting routinely fill up /var on deployment-bastion - https://phabricator.wikimedia.org/T91354#1670893 (greg) ``` 06:38 < shinken-w> RECOVERY - Free space - all mounts on deployment-bastion is OK: OK: All targets OK ```
[15:12:33] Release-Engineering-Team, Epic, Patch-For-Review, Tracking: Fix or delete browsertests* Jenkins jobs that are failing for more than a week - https://phabricator.wikimedia.org/T94150#1670906 (zeljkofilipin)
[15:12:59] Release-Engineering-Team, Epic, Patch-For-Review, Tracking: Fix or delete browsertests* Jenkins jobs that are failing for more than a week - https://phabricator.wikimedia.org/T94150#1156290 (zeljkofilipin)
[15:14:53] (PS2) Paladox: [OpenStackManager] Update jenkings tests [integration/config] - https://gerrit.wikimedia.org/r/239538
[15:15:39] (PS3) Hashar: [OpenStackManager] Update Jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/239538 (owner: Paladox)
[15:15:49] Release-Engineering-Team, Epic, Patch-For-Review, Tracking: Fix or delete browsertests* Jenkins jobs that are failing for more than a week - https://phabricator.wikimedia.org/T94150#1670909 (zeljkofilipin)
[15:15:56] (CR) Hashar: "Updated commit message, waiting for tests results" [integration/config] - https://gerrit.wikimedia.org/r/239538 (owner: Paladox)
[15:16:17] (CR) Paladox: "Hi what kind of problems I may be able to fix it." [integration/config] - https://gerrit.wikimedia.org/r/239538 (owner: Paladox)
[15:17:54] (PS4) Paladox: [OpenStackManager] Update jenkings tests [integration/config] - https://gerrit.wikimedia.org/r/239538
[15:18:30] (PS5) Paladox: [OpenStackManager] Update jenkings tests [integration/config] - https://gerrit.wikimedia.org/r/239538
[15:20:29] (CR) Paladox: "I think you need to recheck since I uploaded over you sorry." [integration/config] - https://gerrit.wikimedia.org/r/239538 (owner: Paladox)
[15:21:52] Release-Engineering-Team, Epic, Patch-For-Review, Tracking: Fix or delete browsertests* Jenkins jobs that are failing for more than a week - https://phabricator.wikimedia.org/T94150#1670936 (zeljkofilipin)
[15:22:26] (CR) Hashar: [C: -1] "And now the change depends on having composer introduced in the repository :D" (4 comments) [integration/config] - https://gerrit.wikimedia.org/r/239538 (owner: Paladox)
[15:22:50] (PS1) Hashar: nodepoo: realize puppet ::apt first [integration/config] - https://gerrit.wikimedia.org/r/240715
[15:23:35] Release-Engineering-Team, Epic, Patch-For-Review, Tracking: Fix or delete browsertests* Jenkins jobs that are failing for more than a week - https://phabricator.wikimedia.org/T94150#1156290 (zeljkofilipin)
[15:24:34] (CR) Hashar: [C: 2] nodepoo: realize puppet ::apt first [integration/config] - https://gerrit.wikimedia.org/r/240715 (owner: Hashar)
[15:25:18] (PS3) Hashar: nodepool: contint::packages::ops in images [integration/config] - https://gerrit.wikimedia.org/r/240660
[15:25:30] (PS2) Hashar: nodepool: realize puppet ::apt first [integration/config] - https://gerrit.wikimedia.org/r/240715
[15:25:44] (CR) Hashar: [C: 2] nodepool: realize puppet ::apt first [integration/config] - https://gerrit.wikimedia.org/r/240715 (owner: Hashar)
[15:25:49] (PS4) Hashar: nodepool: contint::packages::ops in images [integration/config] - https://gerrit.wikimedia.org/r/240660
[15:27:03] oh men
[15:27:03] Error: /Stage[main]/Geoip::Data::Puppet/File[/usr/share/GeoIP]: Could not evaluate: Could not retrieve information from environment production source(s) puppet:///volatile/GeoIP
[15:27:10] I hate Geoip
[15:32:29] (Merged) jenkins-bot: nodepool: realize puppet ::apt first [integration/config] - https://gerrit.wikimedia.org/r/240715 (owner: Hashar)
[15:33:33] (CR) Hashar: "authdns::lint requires GeoIP with volatile data but:" [integration/config] - https://gerrit.wikimedia.org/r/240660 (owner: Hashar)
[15:35:09] (PS1) Hashar: nodepool: install etcd directly [integration/config] - https://gerrit.wikimedia.org/r/240719
[15:37:57] (CR) Paladox: "Composer file added at https://gerrit.wikimedia.org/r/#/c/240713/" [integration/config] - https://gerrit.wikimedia.org/r/239538 (owner: Paladox)
[15:38:13] (PS1) Hashar: Create and trigger integration-config-puppet-validate [integration/config] - https://gerrit.wikimedia.org/r/240721
[15:38:31] (CR) Hashar: "Yeah!" [integration/config] - https://gerrit.wikimedia.org/r/240719 (owner: Hashar)
[15:40:55] (CR) Paladox: [OpenStackManager] Update jenkings tests (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/239538 (owner: Paladox)
[15:41:23] (CR) Hashar: "Yup nice. Now you can remove the template:" [integration/config] - https://gerrit.wikimedia.org/r/239538 (owner: Paladox)
[15:42:10] (CR) Hashar: [C: 2] "Job created:" [integration/config] - https://gerrit.wikimedia.org/r/240721 (owner: Hashar)
[15:42:29] (PS6) Paladox: [OpenStackManager] Update Jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/239538
[15:42:53] (CR) Paladox: "phplint is for non-whitelisted users." [integration/config] - https://gerrit.wikimedia.org/r/239538 (owner: Paladox)
[15:44:45] Continuous-Integration-Infrastructure, Patch-For-Review: Remove PhantomJS from the CI infrastructure - https://phabricator.wikimedia.org/T113279#1671020 (greg) Just a note: the PhantomJS ability to send custom/non-standard headers was used by ZeroBanner. "Luckily" their tests aren't worthwhile anymore so...
[15:44:48] (CR) Hashar: "Well there is now phplint template in Zuul. But I guess we can add the job to the check pipeline. Ie:" [integration/config] - https://gerrit.wikimedia.org/r/239538 (owner: Paladox)
[15:45:19] RelEng-Admin, User-greg: Write draft/strawman code-hosting exception guideline - https://phabricator.wikimedia.org/T109920#1671030 (JanZerebecki) No I see the entanglement while going from the exception for hosting code on github, because there is a technical reason: "When the Continuous Integration servi...
[15:46:15] (PS2) Hashar: nodepool: install etcd directly [integration/config] - https://gerrit.wikimedia.org/r/240719
[15:46:24] (CR) Hashar: [C: 2] nodepool: install etcd directly [integration/config] - https://gerrit.wikimedia.org/r/240719 (owner: Hashar)
[15:46:48] (CR) Hashar: "Got etcd installed via https://gerrit.wikimedia.org/r/#/c/240719/" [integration/config] - https://gerrit.wikimedia.org/r/240660 (owner: Hashar)
[15:49:30] (Merged) jenkins-bot: Create and trigger integration-config-puppet-validate [integration/config] - https://gerrit.wikimedia.org/r/240721 (owner: Hashar)
[15:49:32] (Merged) jenkins-bot: nodepool: install etcd directly [integration/config] - https://gerrit.wikimedia.org/r/240719 (owner: Hashar)
[15:49:40] (PS7) Paladox: [OpenStackManager] Update Jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/239538
[15:50:22] greg-g: I've got nothing exciting; mind if I skip?
[15:50:29] James_F: go ahead
[15:50:33] Thanks. :-)
[15:50:42] (Actually, we might, but there's nothing definite.)
[15:50:48] kk
[15:50:48] !log pushing new reference image to nodepool to include etcd package. Followed doc from https://wikitech.wikimedia.org/wiki/Nodepool#Publish_on_labs
[15:50:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[15:51:50] !log nodepool new snapshot image is ci-jessie-wikimedia-1443109895
[15:51:53] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[15:52:28] (CR) Hashar: "recheck" [integration/config] - https://gerrit.wikimedia.org/r/240660 (owner: Hashar)
[15:53:58] (CR) Paladox: "Ok done." [integration/config] - https://gerrit.wikimedia.org/r/239538 (owner: Paladox)
[15:54:49] Browser-Tests, MediaWiki-extensions-MultimediaViewer: Fix failed MultimediaViewer browsertests Jenkins jobs - https://phabricator.wikimedia.org/T94157#1671053 (Jdlrobson) p:Normal>High I'd say this is high now my team is supposed to look after this extension and is unfamiliar with exactly what it's...
[15:55:55] Deployment-Systems, ContentTranslation-Deployments, MediaWiki-extensions-ContentTranslation, Unplanned-Sprint-Work, and 4 others: is shown instead of the actual message on Special:ContentTranslation in ... - https://phabricator.wikimedia.org/T112964#1671060
[15:56:13] Release-Engineering-Team, MediaWiki-Releasing, MediaWiki-Tarball-Backports, Security-Extensions: Backport security fixes to stable + LTS extension branches - https://phabricator.wikimedia.org/T108734#1671071 (Aklapper)
[15:56:16] Release-Engineering-Team, MediaWiki-Releasing: Test files appear in MW tarball diff patches, generate ignored hunks - https://phabricator.wikimedia.org/T94664#1671072 (Aklapper)
[15:56:24] (PS8) Hashar: [OpenStackManager] Update Jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/239538 (owner: Paladox)
[15:56:36] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-Releasing: Automate Testing of MediaWiki Tarball releases - https://phabricator.wikimedia.org/T974#1671087 (Aklapper)
[15:56:59] (CR) Hashar: "The experimental pipeline was referring to the template instead of the job name. I fixed it and rewrote the commit summary." [integration/config] - https://gerrit.wikimedia.org/r/239538 (owner: Paladox)
[15:57:17] Deployment-Systems, ContentTranslation-Deployments, MediaWiki-extensions-ContentTranslation, Unplanned-Sprint-Work, and 4 others: is shown instead of the actual message on Special:ContentTranslation in ... - https://phabricator.wikimedia.org/T112964#1671091
[15:57:22] (CR) jenkins-bot: [V: -1] [OpenStackManager] Update Jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/239538 (owner: Paladox)
[15:59:34] (CR) Hashar: "Bah:" [integration/config] - https://gerrit.wikimedia.org/r/239538 (owner: Paladox)
[16:03:46] RelEng-Admin, User-greg: Write draft/strawman code-hosting exception guideline - https://phabricator.wikimedia.org/T109920#1671105 (greg) Ah, I see, thanks for the clarification. That's a good point. Lemme context switch back to this later and get back to you more fully :)
[16:07:36] Continuous-Integration-Infrastructure, MediaWiki-Releasing: Automate Testing of MediaWiki Tarball releases - https://phabricator.wikimedia.org/T974#1671116 (greg)
[16:09:47] (CR) Paladox: "Oh how do we fix that." [integration/config] - https://gerrit.wikimedia.org/r/239538 (owner: Paladox)
[16:10:25] (CR) Paladox: "Should I do check-only: instead of check:" [integration/config] - https://gerrit.wikimedia.org/r/239538 (owner: Paladox)
[16:10:31] !sal
[16:10:31] https://tools.wmflabs.org/sal/releng
[16:10:53] !log nodepool image ci-jessie-wikimedia-1443109895 fails to acquire network. Reverting to previous image.
[16:10:56] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:12:17] !log openstack image delete ci-jessie-wikimedia && openstack image set --name ci-jessie-wikimedia ci-jessie-wikimedia_old_20150924
[16:12:20] should be good
[16:12:21] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:13:03] I give up for today
[16:13:36] Continuous-Integration-Infrastructure, MediaWiki-extensions-MultimediaViewer: MultimediaViewer thumbnailBeforeProduceHTML hook breaks other extensions parser tests - https://phabricator.wikimedia.org/T69302#1671136 (Nemo_bis) >>! In T69302#1669113, @Jdlrobson wrote: > Is it still a high priority? Yes, u...
[16:17:24] :(
[16:17:32] that wasn't a good exit
[16:36:22] Continuous-Integration-Scaling, Labs, Labs-Infrastructure: labnodepool1001.eqiad.wmnet can't reach IP in 10.68.20.0/24 - https://phabricator.wikimedia.org/T113623#1671200 (hashar) NEW
[16:50:52] why did ForrestBot tag T92796 as "MW-1.26-release" on August 22? https://phabricator.wikimedia.org/T92796#1564640
[16:51:32] merged core patches get the tag of the next stable release
[16:53:28] ForrestBot's idea is to see all tasks which were fixed in a given release, both for wmfXX branches and the big'uns.
[16:53:38] I'm unsure of the helpfulness of the later
[16:54:13] s/ForrestBot/ReleaseTaggerBot/ # 'twas renamed
[16:55:09] oh, but 'WM-1.26-release' is also interpreted as 'should be fixed before the release', which is something different than 'a patch for this is in ' :/
[16:55:24] so far I've found it really useful for quickly identifying if a bug fix was backported to 1.25
[16:55:34] valhallasw`cloud: right
[16:55:46] I think it will prove to be useful when we start writing up [[MediaWiki 1.26]] and other release notes
[16:56:08] legoktm: but what about edge cases like the one robla linked where the patch doesn't actually resolve the issue?
[16:56:26] is that "noise" manageable?
[16:56:51] someone should manually remove the tag whenever they re-opened the bug
[16:57:04] no one knows about the project
[16:57:41] legoktm: "someone should manually..." seems like a dangerous way to start a sentence :-)
[16:57:43] (It's not just a "tag" in the parlance of https://www.mediawiki.org/wiki/Phabricator/Project_management#Types_of_Projects, it's a "release")
[16:57:59] greg-g: I think the answer is 'we don't know'. This is the first release that will have bugs tagged as MW-1.26-release
[16:58:05] valhallasw`cloud: fair
[16:58:09] and not even all of them yet
[16:58:23] but I think maybe some confusion is caused by the thought that it is just a tag, not a release project
[16:58:28] no, most of them should be in there, I think, because 1.25 was released at the hackathon
[16:58:40] the mw-1.2X-release projects are... release projects, not just informational tags
[16:58:50] we're just missing stuff that was merged between REL1_25 branch point and hackathon
[16:59:08] Release-Engineering-Team, Phabricator, Traffic, operations, Blocked-on-Operations: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1671352 (mmodell)
[16:59:28] so, what releasetaggerbot is doing shoudl really be tagged as "fixed-in-1.26" or similar
[17:00:06] again, I thinmk that's the mental model discrepency: #mw-1.26-release is meant to track blockers, whereas RTB is just tagging for inforamtional purposes what could be found by a query after the fact :)
[17:00:18] *nod*
[17:00:55] * greg-g is thinking in his metadata/ontology brain right now, it's a scary place
[17:02:02] * legoktm wonders why phabricator doesn't add opengraph tags
[17:02:57] Release-Engineering-Team, Phabricator, Traffic, operations, Blocked-on-Operations: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1671388 (greg)
[17:03:40] legoktm: shush! quit trying to nerd-snipe me!
[17:03:51] :P
[17:05:14] ohhh, I haven't read wikitech-l yet
[17:05:45] legoktm: :-) yeah, that's what prompted me to say something here
[17:06:30] Release-Engineering-Team, Phabricator, Traffic, operations, Blocked-on-Operations: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1671428 (greg)
[17:06:59] Release-Engineering-Team, Phabricator, Traffic, operations, Blocked-on-Operations: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1314620 (greg) (updated description to just talk about exposing ssh, since we split off websockets, which we still want, plzkthxbai)
[17:07:26] legoktm: heh :)
[17:08:31] Deployment-Systems, Release-Engineering-Team, Performance-Team, operations, HHVM: Translation cache exhaustion caused by changes to PHP code in file scope - https://phabricator.wikimedia.org/T103886#1671438 (mmodell) >>! In T103886#1402775, @faidon wrote: >> Iterate on the graceful restart proced...
[17:09:11] For 1.27, it might make sense to have nee Forrestbot use a new tag (e.g. "1.27-check-this") and then have a single blocking "MW-1.26-release" task which is "clear out the '1.27-check-this' queue"
[17:09:27] er....MW-1.27-release, that is
[17:09:49] not bad
[17:38:43] oh! User Since Sep 19 2014, 17:07 (52 w, 6 d)
[17:38:54] did we just miss the Phabaversary?
[17:44:22] Continuous-Integration-Config, I18n: Configure banana checker for i18n files to run on all MediaWiki extensions (tracking) - https://phabricator.wikimedia.org/T94547#1671716 (Prtksxna)
[17:44:29] Continuous-Integration-Config, I18n: Configure banana checker for i18n files to run on all MediaWiki extensions (tracking) - https://phabricator.wikimedia.org/T94547#1166201 (Prtksxna)
[17:56:39] Continuous-Integration-Config, I18n: Configure banana checker for i18n files to run on all MediaWiki extensions (tracking) - https://phabricator.wikimedia.org/T94547#1671824 (Krinkle) >>! In T94547#1337361, @hashar wrote: > @krinkle some folks proposed us to have a "structure" job that would be run for ev...
[18:01:00] Beta-Cluster, pywikibot-core, Patch-For-Review: Link.langlinkUnsafe does not work on Beta-Cluster wikis - https://phabricator.wikimedia.org/T112006#1671856 (demon) Open>Resolved Fixed for reals this time.
[18:01:09] Krenair: ^
[18:06:33] thanks
[18:06:41] yw
[18:27:16] Continuous-Integration-Scaling, Labs, Labs-Infrastructure: labnodepool1001.eqiad.wmnet can't reach IP in 10.68.20.0/24 - https://phabricator.wikimedia.org/T113623#1672061 (hashar) From nodepool debug log, three instances in the 10.68.20.0/24 range are unreachable: ``` Creating server with hostname ci-...
[18:33:10] Continuous-Integration-Scaling, Labs, Labs-Infrastructure: labnodepool1001.eqiad.wmnet can't reach IP in 10.68.20.0/24 - https://phabricator.wikimedia.org/T113623#1672078 (hashar) Potential instance for testing: ci-jessie-wikimedia-1443110708 10.68.20.123 I can ping it from labs bastion but not from...
[18:37:51] andrewbogott: I found out how nodepool spammed instances creation. labnodepool1001.eqiad.wmnet is unable to reach instances in the range 10.68.20.0/24 for some reason
[18:41:40] hashar: hm, this has me wondering if nova is smart enough to re-use old IPs...
[18:41:43] but that’s another question I guess
[18:41:59] hashar: it’s just that exact range that’s blocked?
[18:42:14] andrewbogott: apparently yeah
[18:42:22] 10.68.17.0/24 and .18. works fine
[18:42:27] are you sure it’s not just anything greater than .20?
[18:42:38] 10.68.20.123 is an example
[18:42:58] I haven't looked for IP in 10.68.21.0/24
[18:43:16] I think somewhere you (or I) have a netmask of 10.68.16.0/22 when it should be 10.68.16.0/21
[18:48:14] andrewbogott: yup that is what I thought, but can't find any such trace in puppet repo
[18:50:56] Beta-Cluster, pywikibot-core, Patch-For-Review: Link.langlinkUnsafe does not work on Beta-Cluster wikis - https://phabricator.wikimedia.org/T112006#1672111 (XZise) Following the link I posted above I don't get “bar” anymore (good) but I still get invalid wikis. So I just written a short script: * `ht...
[18:59:13] andrewbogott: oh wikitech has a page referencing /22 instead of /21 https://wikitech.wikimedia.org/wiki/Labs_troubleshooting
[19:03:12] RECOVERY - Host mira is UP: PING OK - Packet loss = 0%, RTA = 1.51 ms
[19:03:20] Continuous-Integration-Scaling, Labs, Labs-Infrastructure: labnodepool1001.eqiad.wmnet can't reach IP in 10.68.20.0/24 - https://phabricator.wikimedia.org/T113623#1672156 (hashar) I think somewhere you (or I) have a netmask of 10.68.16.0/22 when it should be 10.68.16.0/21 I can't see a...
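The /22 versus /21 question raised at 18:43 is plain prefix arithmetic, so it can be checked with Python's standard `ipaddress` module. This is a minimal sketch, assuming only the prefixes and 10.68.20.x addresses quoted in this log; the 10.68.17.x and 10.68.18.x addresses are illustrative picks from the ranges hashar reports as working, and the helper `unrouted_addresses` is a hypothetical name. It models which addresses a route covers, not the router or nova themselves.

```python
import ipaddress

# Static route on the router before the fix (per the log).
ROUTED = ipaddress.ip_network("10.68.16.0/22")     # 10.68.16.0 - 10.68.19.255
# Range allocated in puppet (manifests/network.pp, role/nova.pp), per the log.
ALLOCATED = ipaddress.ip_network("10.68.16.0/21")  # 10.68.16.0 - 10.68.23.255


def unrouted_addresses(addresses, routed=ROUTED, allocated=ALLOCATED):
    """Return the addresses nova may hand out (inside `allocated`)
    that the static route (`routed`) does not cover."""
    result = []
    for text in addresses:
        addr = ipaddress.ip_address(text)
        if addr in allocated and addr not in routed:
            result.append(text)
    return result


if __name__ == "__main__":
    print(ROUTED[0], "-", ROUTED[-1])
    print(ALLOCATED[0], "-", ALLOCATED[-1])
    print(unrouted_addresses(["10.68.17.5", "10.68.18.5", "10.68.20.123", "10.68.20.126"]))
```

The last call reports only the two 10.68.20.x addresses. Passing `routed=ipaddress.ip_network("10.68.16.0/21")`, which matches the route chasemp installs at 19:40, returns an empty list. The same arithmetic shows that 10.68.21.0/24, which hashar had not yet checked, falls inside the /21 but outside the /22 and would have been unreachable too.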
[19:05:01] chasemp: We may be finding some fallout from our migration to labnet1002, can you join us and check?
[19:05:14] The docs you wrote about routing say /22 for a subnet that I think should be /21
[19:05:16] reading
[19:05:36] I would expect the consequences to be larger than they are, but… I’m nonetheless concerned
[19:06:00] 10.64.20.13 was the old labnet1001 right?
[19:06:41] I think so
[19:07:08] the old rouee was 10.68.16.0/22 to that and the new was 10.68.16.0/22 next-hop 10.64.20.25
[19:07:14] so teh address block didn't change I think
[19:07:18] what makes you think /21?
[19:07:40] hm
[19:07:46] We definitely have VMs that are in /21
[19:07:58] what subnet
[19:08:00] and it’s /21 on the docs that I wrote here: https://wikitech.wikimedia.org/wiki/IP_addresses
[19:08:05] and it’s /21 in the nova config
[19:08:20] ok I'm checking the device now
[19:08:27] itself those were the notes I had made at the time
[19:08:58] chasemp: https://dpaste.de/ZfjL
[19:09:13] I backed up teh config before working on it
[19:09:15] and it was
[19:09:17] set routing-options static route 10.68.16.0/22 next-hop 10.64.20.13
[19:09:18] set routing-options static route 10.68.16.0/22 readvertise
[19:09:18] set routing-options static route 10.68.16.0/22 no-resolve
[19:09:23] before I touched it
[19:09:35] I believe
[19:10:10] Yeah, no doubt it was wrong before as well
[19:10:18] OR it was right and my config of nova is wrong
[19:11:00] I honestly don’t know where the canonical list of what range is dedicated to what is
[19:11:08] Maybe I should file a bug and we should ask mark tomorrow?
[19:11:25] well
[19:11:26] manifests/network.pp: 'ipv4' => '10.68.16.0/21',
[19:11:26] manifests/role/nova.pp: 'production' => '10.68.16.0/21',
[19:11:27] manifests/role/nova.pp: 'production' => '10.68.16.0/21',
[19:11:29] is in puppet
[19:12:17] that’s convincing
[19:12:24] I'm wondering if it wasn't wrong to begin w/ in the router
[19:12:38] doh sorry I wasn't looking at this chan
[19:12:49] I don’t understand why other instances (besides hashar’s) are working outside of that subnet though?
[19:12:53] Let me try one
[19:13:21] maybe it is specific to labnodepool1001.eqiad.wmnet and whatever router it is connected to ?
[19:13:36] * andrewbogott creates an instance at 10.68.20.126
[19:14:38] so I'm pretty sure what's there now as there before, that doesn't mean it's correct
[19:14:40] I didn't change
[19:14:42] set routing-options static route 10.68.16.0/22 readvertise
[19:14:42] set routing-options static route 10.68.16.0/22 no-resolve
[19:14:50] and they match the mask
[19:14:50] and I have no idea if it has always been the case or if it suddenly happened
[19:15:53] but I'm feeling like /21 was //allocated// and set to /22 accidentally in the router whenever
[19:16:20] my new instance doesn’t work
[19:16:23] ok yeah the fw rule looks like it allows /21
[19:16:30] so I suspect that we just overflowed out of /22 and now everything new is busted
[19:16:31] \O/ problem reproduced
[19:17:17] andrewbogott: give me a moment here to kind of sanity check this befoer I go leaping
[19:17:27] yep
[19:17:43] no-new-instances isn’t the emergency that existing-instances-broken would be
[19:26:48] yeah, built a second new instance and it is also doa
[19:27:23] ok now try
[19:28:55] andrewbogott: ^
[19:29:21] chasemp: booting a new one, it’ll take a few
[19:29:27] k
[19:31:51] chasemp: new instance at 10.68.20.129 works fine
[19:31:57] ok so yeah
[19:32:00] so… you fixed it. Or fixed something at least :)
[19:32:09] this wasn't our last maint but you were right to think it
[19:32:14] I looked back in the rollback history
[19:32:19] and couldn't see where it had changed so
[19:32:27] I think it was the case and we just had headroom pre-nodepool?
[19:32:52] anyway, nice job documenting the switchover, that’s the only place there was a searchable reference to /22
[19:33:18] yeah, I think we just didn’t get there until now.
[19:33:24] thank you!
[19:33:27] np man
[19:33:33] hashar, want to see if nodepool is happier now?
[19:33:48] sure
[19:34:22] chasemp: andrewbogott seems to be working now :-}
[19:34:59] at least labnodepool1001 manage to reach an ip in 10.68.20.x
[19:35:34] cool
[19:35:39] maybe since nodepool spawns a lot of instances, that fills the DHCP pool quickly ?
[19:36:17] I don’t know how the recycling process works. It clearly does recycle them but maybe there’s a grace period or something.
[19:36:23] I made a bug for myself to learn how that works
[19:36:42] you can get https://phabricator.wikimedia.org/T113623 closed :}
[19:36:49] !sal
[19:36:49] https://tools.wmflabs.org/sal/releng
[19:37:44] Continuous-Integration-Scaling, Labs, Labs-Infrastructure: labnodepool1001.eqiad.wmnet can't reach IP in 10.68.20.0/24 - https://phabricator.wikimedia.org/T113623#1672262 (Andrew) seems to be fixed by Chase's updating of the router config from /22 to /21.
[19:37:48] !log nodepool: openstack image set --name ci-jessie-wikimedia_old_20150924 ci-jessie-wikimedia
[19:37:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[19:38:33] chasemp: shall I s/22/21/g on https://wikitech.wikimedia.org/wiki/Labs_troubleshooting ?
[19:38:48] yeah
[19:39:11] !log nodepool new snapshot image ci-jessie-wikimedia-1443123518
[19:39:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[19:39:20] chasemp: ok, done — that’s all you changed, right?
[19:39:55] 5Continuous-Integration-Scaling, 6Labs, 10Labs-Infrastructure: labnodepool1001.eqiad.wmnet can't reach IP in 10.68.20.0/24 - https://phabricator.wikimedia.org/T113623#1672267 (10Andrew) 5Open>3Resolved a:3Andrew [19:40:21] UserWarning: Unknown ssh-rsa host key for 10.68.20.130: !!! [19:40:28] definitely works thanks a lot ! [19:40:32] andrewbogott: not quite as there were other /22 refs there already [19:40:34] I had to change now [19:40:52] set routing-options static route 10.68.16.0/21 next-hop 10.64.20.25 [19:40:52] set routing-options static route 10.68.16.0/21 readvertise [19:40:52] set routing-options static route 10.68.16.0/21 no-resolve [19:48:30] PROBLEM - Puppet failure on deployment-logstash2 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [19:54:43] (03PS3) 10Hashar: Migrate tox*jessie jobs to Nodepool instances [integration/config] - 10https://gerrit.wikimedia.org/r/240705 [19:54:48] (03PS4) 10Hashar: Migrate tox*jessie jobs to Nodepool instances [integration/config] - 10https://gerrit.wikimedia.org/r/240705 [19:55:37] ah [20:02:35] (03CR) 10Hashar: "Zeljko lets do it #together on friday morning :-}" [integration/config] - 10https://gerrit.wikimedia.org/r/240705 (owner: 10Hashar) [20:02:47] that is safer [20:06:30] 10Beta-Cluster, 10pywikibot-core, 5Patch-For-Review: Link.langlinkUnsafe does not work on Beta-Cluster wikis - https://phabricator.wikimedia.org/T112006#1672342 (10demon) Underlying bug is fixed. Interwiki map probably needs regeneration? [20:10:25] (03PS9) 10Hashar: [OpenStackManager] Update Jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/239538 (owner: 10Paladox) [20:11:21] (03CR) 10Hashar: "So the reason was the python-lint-checkonly template that introduced a definition for the check-only template. 
That was for the python j" [integration/config] - 10https://gerrit.wikimedia.org/r/239538 (owner: 10Paladox) [20:12:51] (03CR) 10Hashar: [C: 032] [OpenStackManager] Update Jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/239538 (owner: 10Paladox) [20:19:50] (03Merged) 10jenkins-bot: [OpenStackManager] Update Jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/239538 (owner: 10Paladox) [20:25:59] (03CR) 10Paladox: "Ok thankyou." [integration/config] - 10https://gerrit.wikimedia.org/r/239538 (owner: 10Paladox) [20:26:36] (03CR) 10Hashar: "That works now thank you!" [integration/config] - 10https://gerrit.wikimedia.org/r/239538 (owner: 10Paladox) [20:26:58] andrewbogott: OpenStackManager extension now support 'check experimental' command, that triggers the mediawiki tests [20:27:42] andrewbogott: example usage at bottom of https://gerrit.wikimedia.org/r/#/c/240713/ [20:27:45] test result https://integration.wikimedia.org/ci/job/mwext-testextension-zend/9526/testReport/ :D [20:29:30] (03CR) 10Paladox: "Thankyou for running that. I have filed the task at https://phabricator.wikimedia.org/T113655" [integration/config] - 10https://gerrit.wikimedia.org/r/239538 (owner: 10Paladox) [20:29:46] 10Continuous-Integration-Infrastructure, 10MediaWiki-extensions-OpenStackManager: OpenStackManager fails the mwext-testextension-zend test - https://phabricator.wikimedia.org/T113655#1672467 (10Paladox) [20:35:09] (03CR) 10Paladox: "I think I fixed it here https://gerrit.wikimedia.org/r/#/c/240876/" [integration/config] - 10https://gerrit.wikimedia.org/r/239538 (owner: 10Paladox) [20:35:36] PROBLEM - Puppet failure on deployment-conf03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [20:37:06] hashar: cool, I will try it! [20:37:28] thcipriani: Ping [20:37:44] ostriches: pong. What's up? [20:37:53] andrewbogott: the only failure are related to some files that can no more be found in mediawiki core. 
They have been moved to /mw-config/ which is unreachable afaik
[20:38:06] andrewbogott: we might want to use some new icons from ooui :-}
[20:38:29] thcipriani: I wanna land mukunda's logging decorator. I'm mostly happy with it. However when I try to combine it with another decorator we already have, it breaks. It's beyond my python-foo.
[20:38:35] https://phabricator.wikimedia.org/P2076 describes the patch and result.
[20:38:45] * thcipriani looks
[20:38:47] greg-g, twentyafterfour: around? wanted to ask some questions about https://phabricator.wikimedia.org/T110070 ?
[20:39:01] SMalyshev: what's up?
[20:39:12] thcipriani: It assumes you have https://gerrit.wikimedia.org/r/#/c/239028/11 applied, obviously.
[20:39:15] ostriches: what's breaks
[20:39:34] ahh reading the paste
[20:39:46] (PS1) Paladox: [OpenStackManager] Remove experiment: [integration/config] - https://gerrit.wikimedia.org/r/240879
[20:40:00] PROBLEM - Host deployment-conf02 is DOWN: CRITICAL - Host Unreachable (10.68.20.124)
[20:40:06] twentyafterfour: hey! so basically I'm trying to figure out how to do deployment for T110070. I have a git repo, and I want it to deployed somewhere on every host that has mediawiki::web::sites
[20:40:22] (CR) Paladox: "This can be merged one https://gerrit.wikimedia.org/r/#/c/240876/ is merged." [integration/config] - https://gerrit.wikimedia.org/r/240879 (owner: Paladox)
[20:40:41] twentyafterfour: and unlike git deploy I'd like not to have .git etc. there since I need to put these files on the http
[20:41:06] twentyafterfour: so the question is how to do this, I was told you may be able to think of something
[20:41:47] ostriches: I thought we were going to abandon the logger argument and just use the global function to get a logger when needed
[20:42:32] SMalyshev: just for future reference (you did fine ;) ): https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team#Help.21
[20:42:39] that is a strange exception though.
huh, looks like I'm catching indexerror when I maybe should be catching valueerror. still looking.
[20:42:58] SMalyshev: could we just combine it with the regular mediawiki deployment?
[20:43:17] twentyafterfour: not sure what you mean by "combine"
[20:43:25] it's a separate git repo
[20:44:05] Beta-Cluster, pywikibot-core, Patch-For-Review: Link.langlinkUnsafe does not work on Beta-Cluster wikis - https://phabricator.wikimedia.org/T112006#1672555 (Krenair) ```krenair@deployment-bastion:~$ sudo -u jenkins-deploy updateinterwikicache Updating interwiki cache... ___ ____ ⎛...
[20:44:06] SMalyshev: does it need to be deployed more often than once a week?
[20:44:35] Continuous-Integration-Infrastructure, MediaWiki-extensions-OpenStackManager, Patch-For-Review: OpenStackManager fails the mwext-testextension-zend test - https://phabricator.wikimedia.org/T113655#1672556 (hashar) They have been renamed in MediaWiki core since MediaWiki 1.24.0 by https://gerrit.wikime...
[20:44:53] the mediawiki deployment already handles pushing out arbitrary php code and it already removes the .git directory from the deployment, and it already knows which servers are web servers
[20:45:19] twentyafterfour: possibly, but that is negotiable I guess. It's for www.wikipedia.org so there's not much there that requires urgent deployment I imagine
[20:45:21] so if it's as simple as adding one more git checkout to the deployment then that seems fairly easy to manage
[20:45:23] and it already handles submodules
[20:46:11] twentyafterfour: so how one would go about adding one more git checkout?
[20:46:34] SMalyshev: if you're comfortable with it following the same release cadence (push out from master on Tuesday, with always-up-to-date-master on Beta Cluster) then easy-peasy
[20:46:42] SMalyshev: I'm thinking that through...
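Back at 19:40, the fix for T113623 (labnodepool1001 unable to reach 10.68.20.0/24) was a static route for 10.68.16.0/21. The remark about "other /22 refs" suggests the prefix size was the issue. A quick check with Python's standard `ipaddress` module shows why. A /22 based at 10.68.16.0 stops at 10.68.19.255, so it cannot contain any 10.68.20.x address. The /21 extends to 10.68.23.255. That the earlier route was exactly 10.68.16.0/22 is an assumption made for illustration; the log does not quote it.

```python
import ipaddress

# Static route from the 19:40 Juniper commands.
NEW_ROUTE = ipaddress.ip_network("10.68.16.0/21")
# Assumption for illustration only: the earlier route was a /22 on the same base.
OLD_ROUTE = ipaddress.ip_network("10.68.16.0/22")

# Range from the T113623 title and a host from the 19:40:21 ssh warning.
UNREACHABLE = ipaddress.ip_network("10.68.20.0/24")
HOST = ipaddress.ip_address("10.68.20.130")


def describe(route):
    """Summarize a route's address span and whether it covers the lost range."""
    return (f"{route}: {route.network_address} to {route.broadcast_address}, "
            f"covers {UNREACHABLE}: {UNREACHABLE.subnet_of(route)}, "
            f"contains {HOST}: {HOST in route}")


for route in (OLD_ROUTE, NEW_ROUTE):
    print(describe(route))
```

Running it prints one line per prefix. The /22 line reports `covers 10.68.20.0/24: False`, and the /21 line reports `True`. `ip_network.subnet_of` needs Python 3.7 or later.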
[20:47:25] greg-g: Well, we'll need to ask Mxn I guess but currently I don't see a reason why it won't work
[20:48:08] if we can cause apache to look in the right place for the docroot then I think it can just be a subdirectory in /srv/mediawiki and will get deployed with everything else
[20:48:34] twentyafterfour: well, we definitely can do Alias in apache I think
[20:48:54] twentyafterfour: like: Alias /portals/ /srv/mediawiki/portals/ then RewriteRule ^/$ /portals/www.wikipedia.org.html [L]
[20:48:59] something like that
[20:49:15] the question is how we get the git stuff into /srv/mediawiki/portals/
[20:49:31] Continuous-Integration-Infrastructure, MediaWiki-extensions-OpenStackManager, Patch-For-Review: OpenStackManager fails the mwext-testextension-zend test - https://phabricator.wikimedia.org/T113655#1672572 (Paladox) Oh I have copied the tree images that are required and added them to the extension dire...
[20:50:17] SMalyshev: one-of "oh crap" updates of course welcome as needed :) but it'll auto-deploy for you at least that often, is the main point.
[20:50:42] one-off
[20:50:58] greg-g: yeah that sounds good
[20:52:14] SMalyshev: to get the files into /srv/mediawiki/portals, just tell me which git repo you want me to check out and I'll clone it before the next deployment
[20:53:07] twentyafterfour: does it have to be done just once or each time?
[20:53:36] twentyafterfour: the repo is this: https://git.wikimedia.org/summary/wikimedia%2Fportals
[20:53:41] Beta-Cluster, pywikibot-core, Patch-For-Review: Link.langlinkUnsafe does not work on Beta-Cluster wikis - https://phabricator.wikimedia.org/T112006#1672583 (XZise) Nope I still get the same result.
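The Apache idea at 20:48 pairs an `Alias` with a `RewriteRule`. The Python function below is a deliberately simplified model of how a request for `/` would resolve under that proposal. It is not Apache's real mapping logic. The alias and rewrite target are the ones quoted in the log, and `DOCUMENT_ROOT` is a hypothetical placeholder.

Apache's `RewriteRule` flag documentation has a detail worth checking. In server or virtual-host context, a rewrite whose result should then be mapped by an `Alias` generally needs the `[PT]` (pass-through) flag. Without it, the rewritten path is not handed back to mod_alias. The model captures that difference with a `passthrough` switch.

```python
import re

# From the 20:48:54 message: Alias /portals/ /srv/mediawiki/portals/
ALIAS_PREFIX = "/portals/"
ALIAS_TARGET = "/srv/mediawiki/portals/"
# From the same message: RewriteRule ^/$ /portals/www.wikipedia.org.html
REWRITE_PATTERN = re.compile(r"^/$")
REWRITE_TARGET = "/portals/www.wikipedia.org.html"
# Hypothetical DocumentRoot; the log does not give the real one.
DOCUMENT_ROOT = "/srv/mediawiki/docroot/example"


def resolve(url_path: str, passthrough: bool) -> str:
    """Return the file a request maps to in this simplified model."""
    if REWRITE_PATTERN.match(url_path):
        url_path = REWRITE_TARGET
        if not passthrough:
            # Without [PT] the rewritten path skips mod_alias; it is treated
            # as DocumentRoot-relative here (assumes /portals does not exist
            # at the filesystem root).
            return DOCUMENT_ROOT + url_path
    if url_path.startswith(ALIAS_PREFIX):
        # mod_alias: swap the URL prefix for the filesystem directory.
        return ALIAS_TARGET + url_path[len(ALIAS_PREFIX):]
    return DOCUMENT_ROOT + url_path


for flag in (False, True):
    print(f"passthrough={flag}: / -> {resolve('/', flag)}")
```

In this model, only the pass-through variant lands on `/srv/mediawiki/portals/www.wikipedia.org.html`. Direct requests under `/portals/` hit the alias either way.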
[20:54:59] twentyafterfour: or for gerrit I guess https://gerrit.wikimedia.org/r/wikimedia/portals
[20:55:39] RECOVERY - Puppet failure on deployment-conf03 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:55:42] SMalyshev: and I guess the MW bits to not look at the MW-hosted version are handled by you already (or you have the changes ready to do it?)?
[20:56:33] greg-g: yes, that would be in puppet patch once the repo is set up
[20:56:45] actually, I can put it to gerrit right now...
[20:57:17] Beta-Cluster, pywikibot-core, Patch-For-Review: Link.langlinkUnsafe does not work on Beta-Cluster wikis - https://phabricator.wikimedia.org/T112006#1672597 (Krenair) Resolved>Open Looks like these extra languages are all values of $languageAliases in WikimediaMaintenance's dumpInterwiki.php
[20:57:40] Continuous-Integration-Infrastructure, MediaWiki-extensions-ZeroBanner, Zero, Patch-For-Review: Stop using PhantomJS for ZeroBanner browser test - https://phabricator.wikimedia.org/T113280#1672600 (hashar) a:zeljkofilipin @zeljkofilipin got rid of it. The ZeroBanner browser test is gone {T113463}
[20:57:47] greg-g: https://gerrit.wikimedia.org/r/#/c/240888/ - this is for beta only, but production would look similar
[20:57:54] Actually, question.
[20:58:04] I think Daniel brought it up earlier.
[20:58:15] Why can't we put this in ./docroot/ in mw-config again?
[20:59:32] Project UploadWizard-api-commons.wikimedia.beta.wmflabs.org build #2647: FAILURE in 30 sec: https://integration.wikimedia.org/ci/job/UploadWizard-api-commons.wikimedia.beta.wmflabs.org/2647/
[21:01:19] ostriches: do you mean portals? Permissions mostly I assume.
[21:01:23] Beta-Cluster, pywikibot-core, Patch-For-Review: Link.langlinkUnsafe does not work on Beta-Cluster wikis - https://phabricator.wikimedia.org/T112006#1672607 (Krenair) (eo, ja, and zh are unaffected because we have beta wikis for all of those. No idea about cs and da)
[21:01:29] SMalyshev: Permissions to what?
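SMalyshev asked at 20:40 for the portals files to reach the web servers without the `.git` directory. twentyafterfour noted at 20:44 that the MediaWiki deployment already strips it and already handles submodules. The log does not say how scap does this.

The Python sketch below is only a stand-in for the idea. It copies a checkout into a target directory while skipping version-control metadata, using `shutil.copytree` with `shutil.ignore_patterns`. The function name `stage_checkout` and the directory names in the demo are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

# Names skipped at every level. ".git" matches both the top-level directory
# and the ".git" *files* that submodule checkouts contain.
VCS_METADATA = shutil.ignore_patterns(".git", ".gitmodules", ".gitignore")


def stage_checkout(checkout: Path, destination: Path) -> None:
    """Copy a git checkout to `destination`, leaving out VCS metadata."""
    # dirs_exist_ok (Python 3.8+) lets a repeated sync overwrite in place.
    shutil.copytree(checkout, destination, ignore=VCS_METADATA, dirs_exist_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "portals")  # hypothetical stand-in for a clone of wikimedia/portals
        (src / ".git").mkdir(parents=True)
        (src / ".git" / "HEAD").write_text("ref: refs/heads/master\n")
        (src / "www.wikipedia.org.html").write_text("<!DOCTYPE html>\n")
        dst = Path(tmp, "staged")  # hypothetical target directory
        stage_checkout(src, dst)
        print(sorted(p.name for p in dst.iterdir()))
```

Unlike an `rsync --delete`, this copy never removes files that were deleted upstream. A real sync step would need to handle removals as well.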
[21:01:37] Continuous-Integration-Infrastructure, MediaWiki-extensions-ZeroBanner, Zero, Patch-For-Review: Stop using PhantomJS for ZeroBanner browser test - https://phabricator.wikimedia.org/T113280#1672609 (hashar) Open>Resolved
[21:01:39] Continuous-Integration-Infrastructure, Patch-For-Review: Remove PhantomJS from the CI infrastructure - https://phabricator.wikimedia.org/T113279#1672610 (hashar)
[21:01:46] Continuous-Integration-Infrastructure, Patch-For-Review: Remove PhantomJS from the CI infrastructure - https://phabricator.wikimedia.org/T113279#1660602 (hashar) Yup so we wanted to {T113280} Zeljko found out the tests are broken beyond repair/available resources and got rid of them {T113463} after @jhob...
[21:01:49] ostriches: there are some people that have access to editing HTML for portals but not mw-config
[21:02:01] to put up on gerrit and merge I think, so the same sysop type ppl can contribute here that do it on the portal
[21:02:29] Continuous-Integration-Infrastructure, Patch-For-Review: Remove PhantomJS from the CI infrastructure - https://phabricator.wikimedia.org/T113279#1672618 (hashar) So now we can cleanup references to PhantomJS in the browser tests. I found a bunch of references to it in integration/jenkins.git :-(
[21:02:37] Continuous-Integration-Infrastructure: Remove PhantomJS from the CI infrastructure - https://phabricator.wikimedia.org/T113279#1672619 (hashar)
[21:03:13] ostriches: and I understand mw-config has some stuff that people would be reluctant to give access to to everybody that can edit HTML on portal. If I'm wrong of course then it can go into docroot, but so far that was my impression
[21:04:07] to merge mediawiki-config changes you need deployment access to actually carry them out after they're in gerrit
[21:05:00] But if we're talking about syncing the portals with the deploy cycle which is weekly, we're already talking about needing deploy access.
[21:05:22] ostriches: well, not exactly if it's automatic
[21:06:03] ostriches: and I think "asking deployer to deploy new stuff" and "having +2 to mw-config" is not the same?
[21:06:39] even "having +2 to some small part deployed under docroot" and "having +2 to whole mw-config"
[21:07:00] what I was suggesting is to make it a submodule in docroot
[21:07:03] I mean, I don't care too much, for me I'm fine with anything but that's what I was told :)
[21:07:15] twentyafterfour: submodules suck
[21:07:16] * ostriches smacks
[21:07:36] I know the suck but submodules work for something like this just fine
[21:08:05] Wait, we're talking about syncing the portals with the deployment cycle?
[21:08:26] twentyafterfour, you still need a way to get the submodule update deployed
[21:08:29] If it's a submodule in docroot we're already talking about needing a deployer so why have the level of abstraction?
[21:08:50] last time we talked Krenair was against submodule
[21:09:07] (In which case, it won't be automagic either, submodule + docroot)
[21:09:16] the main thing is reusing the deployment architecture we already have
[21:09:25] I guess I just don't see what's special snowflake about the portals that needs it to be automagic.
[21:09:48] I don't think it needs automagic
[21:10:01] ostriches: well, so what is the process that you propose? i.e. after I +2 the change into portal, what happens?
[21:10:17] I was just assuming that it could get pushed out with the train without much special effort at all
[21:10:24] SMalyshev: The deployer deploys it like any other docroot change.
[21:11:29] ostriches: you mean one has to put it on SWAT board and go through all the process or it just automatically goes into production on Tuesday or whatever is the day?
[21:11:44] * SMalyshev is not familiar with "any other" procedure too much :)
[21:12:35] I was suggesting automatic tuesday
[21:12:41] ostriches, we're moving from a system where it is basically instant if you ignore the caching
[21:12:59] how's it currently deployed?
[21:13:06] and what's wrong with the current system then?
[21:13:07] wiki page, see extract2.php
[21:13:24] (PS2) Niedzielski: Run adb-setup before Android emulator tests [integration/config] - https://gerrit.wikimedia.org/r/239902
[21:14:59] twentyafterfour: what's wrong is that it's very ad-hoc and if more than one person needs to work on it it becomes messy. Basically like shared drive instead of VCS :)
[21:15:20] yeah ...
[21:15:42] * twentyafterfour didn't realize it was a wikipage
[21:16:09] Yippee, build fixed!
[21:16:10] Project browsertests-QuickSurveys-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #12: FIXED in 4 min 38 sec: https://integration.wikimedia.org/ci/job/browsertests-QuickSurveys-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/12/
[21:16:10] there's a mediawiki backend for git.
[21:16:17] yes, it is. Which also makes it a bit hard to see what the change does too, since it's "raw html in wiki page"
[21:16:22] * twentyafterfour is just joking, in case that wasn't obvious
[21:16:56] there was a time when I thought wiki pages and version control were pretty much the same thing ;)
[21:17:34] twentyafterfour: for some definition of version control :)
[21:18:27] twentyafterfour: anyway, if we could have the same deploy cycle as mediawiki has (i.e.
push to master in git, have it on beta at once, have it on production once weekly sync is done) then I think it would be ok
[21:18:52] it won't be instant but in most cases one doesn't need instant, and if there's an exception one can ping a deployer
[21:19:09] (PS1) Paladox: [AccessibilitySimulation] Add jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/240898
[21:19:38] Once it's merged it will go out automatically.
[21:19:41] (PS2) Paladox: [AccessibilitySimulation] Add jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/240898
[21:19:50] Weekly
[21:19:52] somehow it seems like wiki pages should be a lot closer to git than they are currently.
[21:20:30] * greg-g +1's SMalyshev last statement :)
[21:20:32] SMalyshev: well the way I was suggesting is with a submodule which seems to have been overruled
[21:20:42] Project browsertests-QuickSurveys-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #13: FAILURE in 4 min 32 sec: https://integration.wikimedia.org/ci/job/browsertests-QuickSurveys-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/13/
[21:20:50] Unless you needed a one off deploy
[21:20:56] Why don't we just treat it like an extension?
[21:21:18] greg-g: that's what I was thinking essentially
[21:21:28] exactly, I'm curious why there's objections to that
[21:21:36] though I don't know about the whole branching thing, I'd rather just deploy the latest commit on master at any given time
[21:21:46] +1 to that as well ;)
[21:21:52] Mw overhead just for the docroot seems overkill
[21:21:53] twentyafterfour: latest commit sounds fine
[21:22:20] ostriches: which overhead?
[21:22:27] ostriches: yeah, but...
other option is puppet, which has no "just deploy from master automatically every week" option :)
[21:22:42] overhead of simple association, I guess (is how I read his statement)
[21:23:03] greg-g: well, we could easily make a "just deploy master every 20 minutes" puppet rule
[21:23:15] Meh
[21:23:18] ostriches: correction "from master that I was able to merge to, without waiting for someone from ops to merge it"
[21:23:35] twentyafterfour: see same clarification :) ^
[21:23:50] Driving. Shouldn't be on irc lol
[21:23:56] what?!
[21:24:11] omg dude
[21:24:19] Multitasking!
[21:24:20] that's pretty hard core
[21:24:42] alright, ostriches, you quit irc'ing and just drive, comment on the task when you're at a real keyboard
[21:24:45] :)
[21:24:59] ostriches: I have a new level of respect (and concern) for your dedication to your job.
[21:25:27] "Picard Management Tip: Tell your crew to they shouldn't multitask when their life is on the line."
[21:25:30] kind of horrified but also impressed at the quality of responses
[21:25:58] chasemp: but what you can't see is the quality of his driving ;)
[21:26:03] :)
[21:27:44] thcipriani: you ever figure out what was going on with that paste from ostriches? it didn't quite make sense to me
[21:28:15] ಠ_ಠ
[21:29:27] twentyafterfour: yeah, I'm playing with it now. So after the log context decorator, the argspec on the function passed to inside_git_dir is [].
[21:38:23] PROBLEM - Puppet staleness on mira is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [43200.0]
[21:42:07] thcipriani: there is @wraps ...
I thought I used it
[21:42:12] functools.wraps
[21:42:31] so what's happening is the function returned by log_context decorator is stripped of its argpsec, then the inside_git_dir decorator fails because it looks at that argspec
[21:43:08] thcipriani: right, I thought that @wraps was supposed to fix that problem
[21:43:09] wraps doesn't preserve that info, evidently: https://bugs.python.org/issue23764
[21:43:53] there is a 3rd party library that does: https://pypi.python.org/pypi/decorator/
[21:47:45] Deployment-Systems, Release-Engineering-Epics, ReleaseTaggerBot, Epic: EPIC: Code Deploy Dashboard - https://phabricator.wikimedia.org/T280#1672801 (greg)
[21:48:02] Release-Engineering-Epics, Epic, Patch-For-Review, Tracking: Fix or delete browsertests* Jenkins jobs that are failing for more than a week - https://phabricator.wikimedia.org/T94150#1672803 (greg)
[21:50:30] Deployment-Systems, Release-Engineering-Epics, Epic: Merge to deployed branches instead of cutting a new deployment branch every week. - https://phabricator.wikimedia.org/T89945#1672808 (greg)
[21:53:19] Continuous-Integration-Infrastructure: Remove PhantomJS from the CI infrastructure - https://phabricator.wikimedia.org/T113279#1672815 (greg) >>! In T113279#1672610, @hashar wrote: > And no. We are not going to setup a proxy anytime soon. I guess if we ever want to do such feature switch, we would probabl...
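thcipriani's diagnosis (21:29 to 21:43) can be reproduced in a few lines of Python 3. The decorators below are hypothetical stand-ins named after the ones in the log. The real scap code is in the linked paste and Gerrit change, not here.

`functools.wraps` copies the wrapped function's name and docstring and sets `__wrapped__`. However, `inspect.getfullargspec` does not follow `__wrapped__`, so for the wrapper it reports an empty `args` list. A decorator that looks up a parameter's position with `list.index` then fails with `ValueError` rather than `IndexError`, which matches the 20:42 remark.

`inspect.signature` does follow `__wrapped__`, so binding the call through it is one way around the problem. The change posted at 22:24 took a different route ("De-decorate inside_git_dir"), and the third-party `decorator` package linked at 21:43 is another option. Neither is shown here.

```python
import functools
import inspect


def log_context(func):
    """Hypothetical stand-in for the logging decorator (uses functools.wraps)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def find_location_argspec(func, args, kwargs):
    """Fragile lookup: find `location` by its position in getfullargspec().args."""
    if "location" in kwargs:
        return kwargs["location"]
    # getfullargspec ignores __wrapped__, so on a wrapper this list is []
    # and list.index raises ValueError, not IndexError.
    position = inspect.getfullargspec(func).args.index("location")
    return args[position]


def find_location_signature(func, args, kwargs):
    """Robust lookup: bind the call; inspect.signature follows __wrapped__."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()
    return bound.arguments["location"]


def inside_git_dir(finder):
    """Hypothetical decorator factory that reports the directory a call targets."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            location = finder(func, args, kwargs)
            # A real implementation would check or chdir into `location`.
            return location, func(*args, **kwargs)
        return wrapper
    return decorator


def sync(location, branch="master"):
    """Hypothetical decorated function with a `location` parameter."""
    return "synced " + branch


fragile = inside_git_dir(find_location_argspec)(log_context(sync))
robust = inside_git_dir(find_location_signature)(log_context(sync))

try:
    fragile("/srv/example-repo")  # hypothetical path
except ValueError as exc:
    print("argspec lookup failed:", exc)
print(robust("/srv/example-repo"))
```

The argspec-based decorator raises `ValueError`. The signature-based one returns `('/srv/example-repo', 'synced master')`.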
[21:56:42] RECOVERY - Puppet failure on mira is OK: OK: Less than 1.00% above the threshold [0.0]
[21:58:25] RECOVERY - Puppet staleness on mira is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:58:44] MediaWiki-Releasing, ReleaseTaggerBot: Change how ReleaseTaggerBot handles major MW releases - https://phabricator.wikimedia.org/T113628#1672842 (greg) See discussion on wikitech-l, specifically my response to @tgr here: https://lists.wikimedia.org/pipermail/wikitech-l/2015-September/083372.html
[22:06:51] MediaWiki-Releasing, ReleaseTaggerBot: Change how ReleaseTaggerBot handles major MW releases - https://phabricator.wikimedia.org/T113628#1672890 (greg)
[22:24:33] (PS1) Thcipriani: De-decorate inside_git_dir [tools/scap] - https://gerrit.wikimedia.org/r/240912
[22:26:33] PROBLEM - Puppet failure on deployment-conf03 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[22:30:24] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<55.56%)
[22:41:38] RECOVERY - Puppet failure on deployment-conf03 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:50:31] RECOVERY - Free space - all mounts on deployment-bastion is OK: OK: All targets OK
[22:53:17] MediaWiki-Releasing, ReleaseTaggerBot: Change how ReleaseTaggerBot handles major MW releases - https://phabricator.wikimedia.org/T113628#1673068 (greg)
[22:56:45] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #809: FAILURE in 45 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/809/
[22:57:36] PROBLEM - Puppet failure on deployment-conf03 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[23:04:45] Differential, Gerrit-Migration, User-greg: Write draft "business case" for migration from Gerrit to Differential - 
https://phabricator.wikimedia.org/T111250#1673106 (greg) Open>Resolved Calling this done. I just sent it to @Deskana to answer a question from him. That's a good indication it's in...
[23:12:25] Gerrit-Migration, RelEng-Admin: Outline work (outcomes and outputs) of RelEng's Q2 Gerrit migration work - https://phabricator.wikimedia.org/T110623#1673128 (greg)
[23:17:27] Gerrit-Migration, RelEng-Admin: Outline work (outcomes and outputs) of RelEng's Q2 Gerrit migration work - https://phabricator.wikimedia.org/T110623#1673139 (greg) @mmodell, @demon, @hashar: See the task description for context/current plan of record. Should our plan of action actually be: **Q2:** * de...
[23:27:40] RECOVERY - Puppet failure on deployment-conf03 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:34:54] Gerrit-Migration, User-greg: Landing a patch with arc currently will sometimes strip author information - https://phabricator.wikimedia.org/T612#1673156 (greg)
[23:35:16] Gerrit-Migration, User-greg: Revise phabricator.wikimedia.org service level - https://phabricator.wikimedia.org/T76446#1673176 (greg)
[23:37:20] Deployment-Systems, User-greg: Trebuchet blockers for MediaWiki (tracking) - https://phabricator.wikimedia.org/T45338#1673203 (greg) a:greg>None
[23:37:30] Deployment-Systems: Trebuchet blockers for MediaWiki (tracking) - https://phabricator.wikimedia.org/T45338#457574 (greg)
[23:37:36] Beta-Cluster, Deployment-Systems: beta-update-databases-eqiad failed with exception in mwscript update.php - https://phabricator.wikimedia.org/T113065#1653919 (greg)
[23:37:54] Continuous-Integration-Scaling, releng-201516-q1: [keyresult] subset of jobs run in disposable instances - https://phabricator.wikimedia.org/T109914#1673214 (greg) a:greg>None
[23:54:44] Beta-Cluster, Collaboration-Team-Backlog, MediaWiki-Recent-changes: [enwiki-betalabs] Special: Recent changes page is not displayed correctly - 
https://phabricator.wikimedia.org/T113660#1673256 (Mattflaschen) It looks some content has invalid interwiki links, so they're being converted to regular red...
[23:58:02] Browser-Tests, Reading-Web, Patch-For-Review: Failed Jenkins job sets Sauce Labs job to passed - https://phabricator.wikimedia.org/T105589#1673290 (Jdlrobson) Any updates on this. We're getting spammed by emails and have no idea how to fix them as we don't know what's going on.
[23:58:55] (PS1) Krinkle: Remove 'jshint' job from EventLogging (alreayd has 'npm') [integration/config] - https://gerrit.wikimedia.org/r/240934
[23:59:06] (CR) Krinkle: [C: 2] Remove 'jshint' job from EventLogging (alreayd has 'npm') [integration/config] - https://gerrit.wikimedia.org/r/240934 (owner: Krinkle)
[23:59:39] (Merged) jenkins-bot: Remove 'jshint' job from EventLogging (alreayd has 'npm') [integration/config] - https://gerrit.wikimedia.org/r/240934 (owner: Krinkle)