[02:40:42] Release-Engineering-Team (Watching / External), Operations, Release Pipeline: Update Debian package for Blubber - https://phabricator.wikimedia.org/T179984#3787703 (thcipriani) >>! In T179984#3784256, @akosiaris wrote: > I am guessing we can resolve this, but if you have any info about the version pu...
[02:57:24] Release-Engineering-Team (Watching / External), Operations, Release Pipeline: Update Debian package for Blubber - https://phabricator.wikimedia.org/T179984#3787716 (thcipriani)
[04:33:32] legoktm: is it a huge pain in the butt to create a new docker based test? I was going to try and figure out if there is a combo of gems that will work with the jessie ruby version
[04:34:41] not really, it's usually just copying the jjb stuff into a dockerfile
[04:35:28] and we have to do this anyways so :)
[04:36:03] https://www.mediawiki.org/wiki/Continuous_integration/Docker
[04:36:07] we will need stretch sooner or later
[04:36:19] * bd808 reads the fine manual
[04:48:53] I may have figured out the right gem versions to keep working with the current ruby...
[04:52:50] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [10.0]
[05:00:19] legoktm: mischief managed. I bashed my head against the gem versions until the current job is happy
[05:00:43] heh :D
[05:07:48] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[06:46:30] Project selenium-Wikibase » chrome,beta,Linux,BrowserTests build #557: FAILURE in 2 hr 6 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/557/
[06:58:02] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:34:57] PROBLEM - Free space - all mounts on integration-slave-jessie-1001 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1001.diskspace._mnt.byte_percentfree (No valid datapoints found) integration.integration-slave-jessie-1001.diskspace._srv.byte_percentfree (<10.00%)
[07:52:12] Krinkle (or anyone): any idea what could be done with https://travis-ci.org/wikimedia/jquery.uls/builds/302938757?utm_source=github_status&utm_medium=notification ?
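The graphite-based alerts in this log report what share of recent datapoints crossed a threshold (for example "40.00% of data above the critical threshold [10.0]"). A minimal Python sketch of that style of check, assuming the check skips null datapoints, compares each remaining point to the threshold, and alerts once the share of offending points reaches a configured percentage. The `percentage` parameter, the WARNING wording, and the UNKNOWN result for an all-null series are assumptions, not taken from the actual Icinga plugin.

```python
from typing import Optional, Sequence


def percent_above(datapoints: Sequence[Optional[float]], threshold: float) -> float:
    """Percentage of non-null datapoints strictly above `threshold`."""
    # Graphite's render API reports gaps as null, i.e. None in Python.
    valid = [v for v in datapoints if v is not None]
    if not valid:
        raise ValueError("No valid datapoints found")
    above = sum(1 for v in valid if v > threshold)
    return 100.0 * above / len(valid)


def check_threshold(datapoints: Sequence[Optional[float]], warning: float,
                    critical: float, percentage: float = 1.0) -> str:
    """Icinga-style status line (hypothetical sketch, not the real plugin)."""
    try:
        crit_pct = percent_above(datapoints, critical)
        warn_pct = percent_above(datapoints, warning)
    except ValueError as exc:
        # Assumption: an all-null series is reported as UNKNOWN.
        return f"UNKNOWN: {exc}"
    if crit_pct >= percentage:
        return f"CRITICAL: {crit_pct:.2f}% of data above the critical threshold [{critical}]"
    if warn_pct >= percentage:
        return f"WARNING: {warn_pct:.2f}% of data above the warning threshold [{warning}]"
    return f"OK: Less than {percentage:.2f}% above the threshold [{warning}]"


if __name__ == "__main__":
    # Two of five points exceed 10.0, so 40% of the data is over the critical threshold.
    print(check_threshold([12.0, 15.0, 3.0, 2.0, 1.0], warning=1.0, critical=10.0))
    # → CRITICAL: 40.00% of data above the critical threshold [10.0]
```

Counting a fraction of datapoints, rather than alerting on any single spike, is what lets the same alert flap between CRITICAL and OK as the window slides, as the Mediawiki Error Rate alert does over the morning.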
[08:28:17] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), Operations, Release Pipeline, monitoring: Icinga disk space alert when a Docker container is running on an host - https://phabricator.wikimedia.org/T178454#3787851 (hashar)
[08:29:03] Continuous-Integration-Infrastructure (shipyard), Cloud-Services, Operations, monitoring, and 3 others: Grafana reports ALL docker mounts in a spammy way - https://phabricator.wikimedia.org/T177052#3645705 (hashar)
[08:38:00] !log reactivating https://phabricator.wikimedia.org/source/iegreview/, it is still being developed: https://phabricator.wikimedia.org/D894
[08:38:04] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[08:44:36] (PS2) Hashar: tox job for pywikibot/bots/CommonsDelinker [integration/config] - https://gerrit.wikimedia.org/r/393118
[08:45:43] (CR) Hashar: [C: 2] tox job for pywikibot/bots/CommonsDelinker [integration/config] - https://gerrit.wikimedia.org/r/393118 (owner: Hashar)
[08:46:46] (Merged) jenkins-bot: tox job for pywikibot/bots/CommonsDelinker [integration/config] - https://gerrit.wikimedia.org/r/393118 (owner: Hashar)
[08:54:12] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[09:48:49] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[10:12:54] Release-Engineering-Team (Watching / External), Operations, Release Pipeline: Update Debian package for Blubber - https://phabricator.wikimedia.org/T179984#3788071 (akosiaris) The `DH_GOLANG_EXCLUDES` seems to have worked. I was successfully able to build the package on stretch as well.
[11:01:26] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), Operations, Release Pipeline, and 2 others: Icinga disk space alert when a Docker container is running on an host - https://phabricator.wikimedia.org/T178454#3788187 (akosiaris) Open>Resolved a:akosiar...
[11:17:52] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), Operations, Release Pipeline, and 2 others: Icinga disk space alert when a Docker container is running on an host - https://phabricator.wikimedia.org/T178454#3788200 (hashar) /var/lib/docker sounds good enough fo...
[11:24:59] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), Operations, monitoring: Icinga "configured eth" warning when a Docker container is running - https://phabricator.wikimedia.org/T181384#3788234 (hashar)
[11:30:50] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), Operations, monitoring: Icinga "configured eth" warning when a Docker container is running - https://phabricator.wikimedia.org/T181384#3788258 (hashar) a:akosiaris Fixed by @akosiaris with https://gerrit.wiki...
[11:34:23] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), Operations, monitoring: Icinga "configured eth" warning when a Docker container is running - https://phabricator.wikimedia.org/T181384#3788261 (hashar) Open>Resolved I ran puppet on contint1001, got a con...
[11:44:09] Release-Engineering-Team (Kanban), MediaWiki-Vagrant, Patch-For-Review, User-zeljkofilipin: MediaWiki core Selenium tests fail when targeting Vagrant - https://phabricator.wikimedia.org/T180035#3788300 (zeljkofilipin) To reproduce: ``` $ composer update && npm i && npx wdio tests/selenium/wdio.c...
[11:47:18] Release-Engineering-Team (Kanban), MediaWiki-Vagrant, Patch-For-Review, User-zeljkofilipin: MediaWiki core Selenium tests fail when targeting Vagrant - https://phabricator.wikimedia.org/T180035#3788305 (zeljkofilipin) Last known good commit (d1439a3e6746): ``` $ git checkout d1439a3e6746 $ comp...
[11:51:02] Release-Engineering-Team (Kanban), MediaWiki-Vagrant, Patch-For-Review, User-zeljkofilipin: MediaWiki core Selenium tests fail when targeting Vagrant - https://phabricator.wikimedia.org/T180035#3788323 (zeljkofilipin) ``` $ git bisect start $ git bisect bad master $ git bisect good d1439a3e6746 B...
[11:54:56] Continuous-Integration-Config, Pywikibot-core, Patch-For-Review: Jenkins output for pywikibot job is hard to read - https://phabricator.wikimedia.org/T117570#1777974 (fgiunchedi) I came across console output that for some reason hasn't the ascii escape sequences interpreted, making the output even ha...
[12:51:48] Release-Engineering-Team (Kanban), MediaWiki-Vagrant, Patch-For-Review, User-zeljkofilipin: MediaWiki core Selenium tests fail when targeting Vagrant - https://phabricator.wikimedia.org/T180035#3788513 (zeljkofilipin) 31f87d19e56a is just before the problem was introduced. ``` $ git checkout 31f...
[12:55:32] Release-Engineering-Team (Kanban), MediaWiki-Vagrant, Patch-For-Review, User-zeljkofilipin: MediaWiki core Selenium tests fail when targeting Vagrant - https://phabricator.wikimedia.org/T180035#3788529 (zeljkofilipin) ``` $ git checkout 6b2f13b055c8 Previous HEAD position was 31f87d19e5... object...
[12:59:37] Release-Engineering-Team (Kanban), MediaWiki-Cache, MediaWiki-Vagrant, User-zeljkofilipin: MediaWiki core Selenium tests fail when targeting Vagrant - https://phabricator.wikimedia.org/T180035#3788531 (zeljkofilipin) @aaron @Krinkle looks like 6b2f13b055c8 ([[ https://gerrit.wikimedia.org/r/#/c/3...
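The bisect steps above narrow the Selenium regression to a single commit by binary search between a known-good commit (d1439a3e6746) and a known-bad one (master). A minimal Python sketch of the same search, assuming a linear history and a hypothetical `is_bad` predicate that stands in for running the tests at a given commit (with real git, `git bisect run` drives this loop):

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit, git-bisect style.

    Assumes a linear history where commits[0] is known good, commits[-1]
    is known bad, and every commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    # Invariant: commits[good] is good and commits[bad] is bad.
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):  # hypothetical: run the test suite at this commit
            bad = mid
        else:
            good = mid
    return commits[bad]


if __name__ == "__main__":
    # Hypothetical history; "c1", "c2" and "c5" are placeholder commit IDs.
    history = ["d1439a3e6746", "c1", "c2", "31f87d19e56a", "6b2f13b055c8", "c5", "master"]
    regression = history.index("6b2f13b055c8")
    print(first_bad_commit(history, lambda c: history.index(c) >= regression))
    # → 6b2f13b055c8
```

Each step halves the remaining range, so a history of n commits needs only about log2(n) test runs, and the search ends where the log does: 31f87d19e56a good, 6b2f13b055c8 bad.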
[12:59:39] Release-Engineering-Team (Kanban), MediaWiki-Cache, MediaWiki-Vagrant, User-zeljkofilipin: MediaWiki core Selenium tests fail when targeting Vagrant - https://phabricator.wikimedia.org/T180035#3788533 (zeljkofilipin)
[12:59:42] Release-Engineering-Team (Kanban), MediaWiki-Cache, MediaWiki-Vagrant, User-zeljkofilipin: MediaWiki core Selenium tests fail when targeting Vagrant - https://phabricator.wikimedia.org/T180035#3744406 (zeljkofilipin) a:zeljkofilipin>None
[15:44:12] Gerrit, Release-Engineering-Team (Someday), Patch-For-Review: Update gerrit to 2.14.6 - https://phabricator.wikimedia.org/T156120#3788970 (Paladox)
[16:29:57] Release-Engineering-Team (Kanban), User-zeljkofilipin: Use createAccount() in Selenium tests - https://phabricator.wikimedia.org/T180379#3789217 (zeljkofilipin) a:zeljkofilipin
[16:44:21] Continuous-Integration-Config, MinervaNeue, RelatedArticles, Readers-Web-Backlog (Tracking): "Minerva NotificationBadge" JS test causing unrelated CI failures on other repositories - https://phabricator.wikimedia.org/T181348#3789253 (Jdlrobson) I suspect there's something wrong with the config se...
[16:44:24] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), Operations, Release Pipeline, and 2 others: Icinga disk space alert when a Docker container is running on an host - https://phabricator.wikimedia.org/T178454#3789255 (Dzahn) Is https://gerrit.wikimedia.org/r/#/c/...
[16:47:29] Release-Engineering-Team (Kanban), Collaboration-Team-Triage, Notifications, Patch-For-Review, User-zeljkofilipin: Mocha tests for Echo notifications - https://phabricator.wikimedia.org/T177412#3789272 (zeljkofilipin) a:Etonkovidova
[16:48:21] Release-Engineering-Team (Kanban), Discovery-Portal-Sprint: Create a dedicated deployment window for portal deployments - https://phabricator.wikimedia.org/T180401#3789273 (debt) Sweet! Per @greg's note above, the (mostly) automated process for the Wikimedia Portals Update will run every Monday at 11:0...
[16:50:00] Release-Engineering-Team (Kanban), Discovery, Discovery-Search (Current work), MW-1.31-release-notes (WMF-deploy-2017-10-24 (1.31.0-wmf.5)), Patch-For-Review: [Epic] Port Selenium tests from Ruby to Node.js for the Search Platform - https://phabricator.wikimedia.org/T174103#3789285 (debt) Ha...
[16:52:19] MediaWiki-Releasing, Security: Release MediaWiki 1.27.4/1.28.3/1.29.2 - https://phabricator.wikimedia.org/T180272#3789291 (Reedy)
[16:52:21] MediaWiki-Releasing, Patch-For-Review, Security: Update HISTORY in master after 1.27.4/1.28.3/1.29.2 - https://phabricator.wikimedia.org/T180276#3789289 (Reedy) Open>Resolved a:Reedy
[17:52:26] Release-Engineering-Team (Kanban), Scap (Tech Debt Sprint FY201718-Q2), scap2: scap3 should repack / pack-refs git repos under /srv/deployment - https://phabricator.wikimedia.org/T112509#3789493 (mmodell) a:mmodell this is pretty easy I will try to get this in before the end of quarter.
[17:52:57] Gerrit-Migration, Differential, Phabricator: Phabricator does not provide an API to get Differential transaction data, similar to maniphest.gettasktransactions - https://phabricator.wikimedia.org/T123416#3789499 (mmodell)
[17:53:03] Gerrit-Migration, Release-Engineering-Team (Kanban), Differential, Phabricator, and 2 others: Create conduit method to query the feed and return records with relevant details populated instead of just a bunch of phids - https://phabricator.wikimedia.org/T123417#3789496 (mmodell) Open>stall...
[17:55:30] Release-Engineering-Team (Kanban), Scap (Tech Debt Sprint FY201718-Q2): Scap failing to rewrite submodule urls in beta - https://phabricator.wikimedia.org/T179013#3789509 (mmodell) Still waiting on confirmation but I believe this is fixed.
[17:56:40] Release-Engineering-Team (Kanban), Scap (Tech Debt Sprint FY201718-Q2): Scap failing to rewrite submodule urls in beta - https://phabricator.wikimedia.org/T179013#3710215 (greg) >>! In T179013#3738075, @mmodell wrote: > @awight: Can you confirm whether this is resolved now that rMSCA1c8be017 is on beta?...
[17:57:18] Release-Engineering-Team (Kanban), Phabricator, Patch-For-Review: Switch phabricator production to codfw - https://phabricator.wikimedia.org/T164810#3789512 (mmodell) p:Normal>Low This is probably going on the back burner while I work on #scap-techdebt-2017-q2 for the remainder of the quarter.
[17:58:03] Release-Engineering-Team (Kanban), Scap (Tech Debt Sprint FY201718-Q2), Deployments, WorkType-NewFunctionality: Scap3 submodule space issues - https://phabricator.wikimedia.org/T137124#3789514 (mmodell) No update, this should go out to production with the next scap build.
[18:00:48] Release-Engineering-Team (Kanban), User-greg: Get sunsetting doc feedback from CE - https://phabricator.wikimedia.org/T180620#3789537 (greg) Open>Resolved
[18:03:20] twentyafterfour hi, how do i make differential changes without it being a draft please?
[18:03:21] https://phabricator.wikimedia.org/D897
[18:10:26] (PS17) Paladox: Phabricator/harbormaster job templates [integration/config] - https://gerrit.wikimedia.org/r/295396 (https://phabricator.wikimedia.org/T130950) (owner: 20after4)
[18:28:04] Release-Engineering-Team (Kanban), MediaWiki-Cache, MediaWiki-Vagrant, Performance-Team, User-zeljkofilipin: MediaWiki core Selenium tests fail when targeting Vagrant - https://phabricator.wikimedia.org/T180035#3789703 (Krinkle)
[18:29:22] twentyafterfour i found if there is no elasticsearch index then the stats page throws an error
[18:29:26] ie https://phab.wmflabs.org/config/cluster/search/
[18:47:43] PROBLEM - Puppet errors on deployment-zotero01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[18:53:48] PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[18:55:43] PROBLEM - Puppet errors on deployment-redis01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[19:02:43] PROBLEM - Puppet errors on deployment-redis02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[19:02:59] PROBLEM - Puppet errors on deployment-tmh01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[19:06:19] !log Update beta ORES to latest, e58bfbf
[19:06:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[19:17:55] https://phabricator.wikimedia.org/P6380
[19:22:43] RECOVERY - Puppet errors on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:33:49] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0]
[19:35:44] RECOVERY - Puppet errors on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:37:44] RECOVERY - Puppet errors on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:42:59] RECOVERY - Puppet errors on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:44:32] Continuous-Integration-Config: QUnit tests are not running on Minerva skin - https://phabricator.wikimedia.org/T181429#3789942 (Jdlrobson)
[19:46:36] (PS1) Jdlrobson: Run QUnit tests on Minerva skin non-experimental [integration/config] - https://gerrit.wikimedia.org/r/393642 (https://phabricator.wikimedia.org/T181429)
[19:47:03] Continuous-Integration-Config, Patch-For-Review, User-Jdlrobson: QUnit tests are not running on Minerva skin - https://phabricator.wikimedia.org/T181429#3789960 (Jdlrobson)
[20:10:52] [20:09:32] holychrist docker
[20:10:52] [20:10:12] every time i use a new version, i encounter almost immediately some wacko bug that a bunch of other people have encountered that is solved by installing some random near-but-different version
[20:56:33] Yippee, build fixed!
[20:56:34] Project selenium-MinervaNeue » chrome,beta,Linux,BrowserTests build #217: FIXED in 18 min: https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/217/
[20:56:35] twentyafterfour i think i found a bug with setting the elasticsearch version.
[20:56:54] it was not noticeable as we were on elasticsearch 5 which we set as the default.
[21:05:00] ah never mind
[21:05:08] we have been setting the version field wrong
[21:07:05] Yippee, build fixed!
[21:07:06] Project selenium-MinervaNeue » firefox,beta,Linux,BrowserTests build #217: FIXED in 28 min: https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/217/
[21:12:38] Hmm
[21:13:02] codfw should not be working with phabricator, as we set the port outside of hosts, though the docs say you have to set it inside hosts
[21:14:16] and eqiad
[21:20:37] Release-Engineering-Team (Kanban), MediaWiki-Cache, MediaWiki-Vagrant, Performance-Team, User-zeljkofilipin: MediaWiki core Selenium tests fail when targeting Vagrant - https://phabricator.wikimedia.org/T180035#3744406 (Imarlier) a:aaron
[21:29:02] Can anyone confirm that Special:Version still shows incorrect versions? Is there a way to purge the cache?
[21:29:30] Yes for the first
[21:29:34] Full scap... I think for the second
[21:29:50] Ah ty for the info!
[21:30:24] scap does some mw git cache info thing
[21:31:45] the git hashes are only updated by a full scap. SWAT patches do not change them
[21:32:12] there's an old feature request for this somewhere in phab
[21:45:23] Release-Engineering-Team (Kanban), Scap, Scoring-platform-team: Need to make the number of cached revisions configurable - https://phabricator.wikimedia.org/T181176#3790492 (mmodell) p:Triage>High a:mmodell
[21:45:37] Release-Engineering-Team (Kanban), Scap (Tech Debt Sprint FY201718-Q2), Scoring-platform-team: Need to make the number of cached revisions configurable - https://phabricator.wikimedia.org/T181176#3790495 (mmodell)
[21:45:50] greg-g: Regarding ORES re-enabling after the bugfix, it seems it isn't covered by tests. I haven't checked if other changes were made since then in other commits, but it seems like we might want to reconsider re-enabling, given similar issues can cause the same downtime (which happened several weeks in a row), and even the current fix doesn't seem to be covered by tests.
[21:45:58] Just 2c, but letting you know :)
[21:47:06] awight: ^^
[21:47:34] Krinkle: I agree about tests, except “several weeks in a row” is not correct, this was an isolated incident. https://wikitech.wikimedia.org/wiki/Incident_documentation/20171120-Ext:ORES
[21:48:53] Krinkle: Thank you for noticing, and sorry for the terseness, I’m just mid-deploy. FYI the key patch to re-enable is https://gerrit.wikimedia.org/r/393667 and all are welcome to revert that if we have more trouble.
[21:49:29] awight: I see, I haven't yet re-read the expanded incident reports, will do :) - from the outside perspective, however, I did not see a difference between the two issues, but I'll look at it, thanks.
[21:51:08] Krinkle: The Oct 24-ish issue you’re thinking of is still unsolved, but it’s pretty certain that ORES happened to be showcased but wasn’t a cause, see https://phabricator.wikimedia.org/T179156#3782508 for example
[21:54:36] Release-Engineering-Team (Kanban), Scap (Tech Debt Sprint FY201718-Q2), Scoring-platform-team: Need to make the number of cached revisions configurable - https://phabricator.wikimedia.org/T181176#3790568 (mmodell) Are you really seeing 10 old versions cached? As far as I can tell scap should only be...
[22:00:01] awight: Good to know. Thanks. For today (re-enabling) I was just wondering how much it makes sense to defend against before re-enabling. I don't think we should block it on a major refactor, but there were quite a few actionables that came out of last week. Wondering if any of those are fair to have in production, with regard to continuous risk going forward. Makes everyone else feel a bit more comfortable and safe.
[22:00:10] I see most of the actionables have been started on so that's pretty cool to see :)
[22:00:40] Krinkle: I’m going ahead with the reenablement but take your concerns seriously. One moment please...
[22:03:20] Krinkle: k, reenabled. The precautions I took today were to stage the components like: * disable ORES on risky wikis by config, * deploy Ext:ORES error-handling fixes, * deploy the ORES service, then * reenable ORES on risky wikis
[22:03:46] By my reasoning, it would have been possible to re-disable by just the <60s config revert.
[22:04:13] Release-Engineering-Team (Kanban), Scap (Tech Debt Sprint FY201718-Q2), Scoring-platform-team: Need to make the number of cached revisions configurable - https://phabricator.wikimedia.org/T181176#3790606 (mmodell)
[22:07:19] awight: Okay
[22:07:27] awight: The ability to disable by config is new?
[22:07:36] No, it’s been there all along.
[22:07:42] Right, okay.
[22:07:50] Definitely the sort of thing we try not to do unless there’s a really big fire, though.
[22:08:45] Last week I rolled back, but screwed it up a few times (explained in the incident report), plus wasn’t aware of the issue for about 1h30m because I hadn’t thought to monitor the client side...
[22:09:55] All around pretty embarrassing for me, but I’m glad to learn. And resigned to learning the hard way ;-)
[22:33:04] Gerrit, Operations, Ops-Access-Requests: Requesting access to the ldap nda group - https://phabricator.wikimedia.org/T181446#3790698 (Dzahn)
[22:36:58] Gerrit, Operations, Ops-Access-Requests: Requesting access to the ldap nda group - https://phabricator.wikimedia.org/T181446#3790708 (Paladox)
[22:38:26] Gerrit, Operations, Ops-Access-Requests: Access to logstash (LDAP group 'nda') for Paladox - https://phabricator.wikimedia.org/T181446#3790711 (Dzahn)
[22:39:55] Release-Engineering-Team (Kanban), MediaWiki-Cache, MediaWiki-Vagrant, Performance-Team, User-zeljkofilipin: MediaWiki core Selenium tests fail when targeting Vagrant - https://phabricator.wikimedia.org/T180035#3790712 (hashar) When saving the user preferences, the return page is generated wi...
[22:55:02] Gerrit, Operations, Ops-Access-Requests: Access to logstash (LDAP group 'nda') for Paladox - https://phabricator.wikimedia.org/T181446#3790664 (faidon) LDAP NDA access effectively means getting access to private and sensitive information, on multiple servers and services, across the board. As such, i...