[07:25:00] (CR) Hashar: [C: 2] Whitelist Foxy brown [integration/config] - https://gerrit.wikimedia.org/r/355862 (owner: Mattflaschen)
[07:28:17] (Merged) jenkins-bot: Whitelist Foxy brown [integration/config] - https://gerrit.wikimedia.org/r/355862 (owner: Mattflaschen)
[07:38:02] Continuous-Integration-Config, MediaWiki-extensions-LabeledSectionTransclusion: Load /tests/parser/ParserTestParserHook.php when testing extensions - https://phabricator.wikimedia.org/T166480#3297539 (hashar)
[07:53:36] Continuous-Integration-Config, MediaWiki-extensions-LabeledSectionTransclusion: Load /tests/parser/ParserTestParserHook.php when testing extensions - https://phabricator.wikimedia.org/T166480#3297549 (hashar) I can not reproduce locally :-( ``` $ php tests/phpunit/phpunit.php --testsuite extensions --fi...
[08:42:56] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[08:42:59] Continuous-Integration-Config, MediaWiki-extensions-LabeledSectionTransclusion, Patch-For-Review: Load /tests/parser/ParserTestParserHook.php when testing extensions - https://phabricator.wikimedia.org/T166480#3297616 (hashar) I have found the issue! The patch relies on the **DOM** parser preprocess...
[08:55:57] PROBLEM - Puppet errors on deployment-ores-redis-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[09:18:39] (PS2) Hashar: [WikimediaMaintenance] Add npm job [integration/config] - https://gerrit.wikimedia.org/r/355803 (owner: Umherirrender)
[09:18:57] (CR) Hashar: [C: 2] [WikimediaMaintenance] Add npm job [integration/config] - https://gerrit.wikimedia.org/r/355803 (owner: Umherirrender)
[09:20:45] (Merged) jenkins-bot: [WikimediaMaintenance] Add npm job [integration/config] - https://gerrit.wikimedia.org/r/355803 (owner: Umherirrender)
[09:22:54] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[09:29:38] Continuous-Integration-Config, Release-Engineering-Team (Kanban), Operations, Patch-For-Review, and 2 others: Create a basic RSpec unit test for operations/puppet - https://phabricator.wikimedia.org/T78342#3297715 (zeljkofilipin)
[09:30:57] RECOVERY - Puppet errors on deployment-ores-redis-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:50:57] Beta-Cluster-Infrastructure, Release-Engineering-Team: deployment-tin has disk space issues - https://phabricator.wikimedia.org/T166492#3297760 (hashar)
[09:51:44] !log deployment-tin: rm /var/lib/l10nupdate/caches/cache-master/*.json T166492
[09:51:48] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:51:48] T166492: deployment-tin has disk space issues - https://phabricator.wikimedia.org/T166492
[09:59:48] Beta-Cluster-Infrastructure, Release-Engineering-Team: deployment-tin has disk space issues - https://phabricator.wikimedia.org/T166492#3297774 (hashar) @thcipriani you have clones of mediawiki repositories on deployment-tin in `/home/thcipriani/mwclonetest`. I guess that can be safely deleted.
[10:06:41] !log deployment-tin rm -fR /usr/src/hhvm T166492
[10:06:43] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:06:43] T166492: deployment-tin has disk space issues - https://phabricator.wikimedia.org/T166492
[10:11:44] the etckeeper phase after a puppet run in labs takes forever, sigh
[10:11:49] Notice: Finished catalog run in 11.15 seconds
[10:11:53] and then wait
[10:14:33] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog), Documentation: write up "making sense of Jenkins browser test results" - https://phabricator.wikimedia.org/T87225#3297827 (zeljkofilipin) Open>declined p:Normal>Low No activity in a couple of years. Please reopen if you...
[10:18:00] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog), Documentation: Document how to debug Selenium tests - https://phabricator.wikimedia.org/T50216#3297834 (zeljkofilipin) Open>declined No activity in years. These days, anybody debugging a failed test already knows what is going on.
[10:20:47] Release-Engineering-Team (Backlog), Testing-Initiative, Tracking: Follow up workshop & brown bag ideas from Testing: Where does it hurt? (tracking) - https://phabricator.wikimedia.org/T108122#3297843 (zeljkofilipin)
[10:20:49] Browser-Tests-Infrastructure, Release-Engineering-Team (Kanban), Documentation, Testing-Initiative: Improve browser testing page with templates : Emphasize testing documentation on mediawiki.org - https://phabricator.wikimedia.org/T108110#3297840 (zeljkofilipin) Open>Resolved a:zeljkof...
[10:22:20] !log force refreshed Nodepool Trusty images. Was stuck somehow
[10:22:24] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:23:31] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog): selenium fails to connect to firefox (headless not sauce) - https://phabricator.wikimedia.org/T117561#1777674 (zeljkofilipin) Is this still a problem? Can this be resolved?
[10:25:07] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog): Could not find link to Sauce Labs job URL - https://phabricator.wikimedia.org/T165487#3297852 (zeljkofilipin) p:Triage>Low
[10:25:52] Browser-Tests-Infrastructure, Release-Engineering-Team, Reading-Web-Backlog, Ruby, User-zeljkofilipin: Run subset of MobileFrontend browser tests on merges in core - https://phabricator.wikimedia.org/T165940#3297853 (zeljkofilipin) p:Triage>Low
[10:25:57] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog), User-zeljkofilipin: Migration of browsertests* Jenkins jobs to selenium* jobs cleanup and optional task - https://phabricator.wikimedia.org/T140235#3297854 (zeljkofilipin) p:Triage>Low
[10:26:01] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog): Implement a smoke + parallel strategy for running end-to-end tests - https://phabricator.wikimedia.org/T130037#3297855 (zeljkofilipin) p:Triage>Low
[10:27:13] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog), Documentation: Improve mediawiki_api documentation with inline yard - https://phabricator.wikimedia.org/T102726#3297871 (zeljkofilipin) p:Normal>Low
[10:27:24] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog), Patch-For-Review, Ruby, User-zeljkofilipin: Upgrade Cucumber from version 1 to version 3 - https://phabricator.wikimedia.org/T160086#3297872 (zeljkofilipin) p:Normal>Low
[10:27:33] Browser-Tests-Infrastructure, Release-Engineering-Team (Kanban), Upstream, WorkType-NewFunctionality: JJB should support YAML axis - https://phabricator.wikimedia.org/T128462#3297873 (zeljkofilipin) p:Normal>Low
[10:28:40] Browser-Tests-Infrastructure, Continuous-Integration-Config, Release-Engineering-Team (Backlog): Browser test jobs should use xUnit publisher instead of Junit - https://phabricator.wikimedia.org/T94684#3297876 (zeljkofilipin) p:Normal>Low
[10:28:54] Browser-Tests-Infrastructure, Release-Engineering-Team, Epic: Make browser tests voting for all repos of WMF deployed code - https://phabricator.wikimedia.org/T91669#3297877 (zeljkofilipin) p:Normal>Low
[10:29:06] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog), Patch-For-Review, Ruby, and 2 others: Auto retry failed browser tests to reduce false negatives - https://phabricator.wikimedia.org/T67773#3297878 (zeljkofilipin) p:Normal>Low
[10:29:16] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog): Investigate distribution of browser test run time - https://phabricator.wikimedia.org/T104396#3297879 (zeljkofilipin) p:Normal>Low
[10:29:23] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog): Run browser tests against the nightly build version on Beta Cluster - https://phabricator.wikimedia.org/T67128#3297880 (zeljkofilipin) p:Normal>Low
[10:29:35] Browser-Tests-Infrastructure, Continuous-Integration-Config, Release-Engineering-Team (Backlog), CirrusSearch, Discovery-Search: Make browsertests for CirrusSearch run on every submitted patch with proper CI infrastructure rather than a bot - https://phabricator.wikimedia.org/T98374#3297881 (z...
[10:29:55] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog): jenkins doesn't show the real failed tests, but raita does - https://phabricator.wikimedia.org/T116162#3297882 (zeljkofilipin) p:Normal>Low
[10:30:06] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog): browsertest failure reports don't show the failing tests saucelabs link, but a different one - https://phabricator.wikimedia.org/T115500#3297883 (zeljkofilipin) p:Normal>Low
[10:30:17] Browser-Tests-Infrastructure, Continuous-Integration-Config, Release-Engineering-Team (Backlog), Wikidata: Add email notification for aborted wikidata browser tests jobs - https://phabricator.wikimedia.org/T128067#3297884 (zeljkofilipin) p:Normal>Low
[10:30:24] Browser-Tests-Infrastructure, Continuous-Integration-Config, Release-Engineering-Team (Backlog), Wikidata, User-zeljkofilipin: Trigger run of special set of browsertests on gerrit with a keyword - https://phabricator.wikimedia.org/T145190#3297885 (zeljkofilipin) p:Normal>Low
[10:30:34] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog), Reading-Web-Backlog, Jenkins, and 2 others: MEDIAWIKI_URL may be set to incorrect value in mwext-mw-selenium job - https://phabricator.wikimedia.org/T144912#3297886 (zeljkofilipin) p:Normal>Low
[10:30:43] Browser-Tests-Infrastructure, Continuous-Integration-Infrastructure, Release-Engineering-Team (Backlog), MediaWiki-extensions-GettingStarted, and 3 others: Missing XML files cause "Publish Performance test result report" - https://phabricator.wikimedia.org/T164296#3297887 (zeljkofilipin) p:Nor...
[10:30:50] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog), User-zeljkofilipin: There should be a way to run custom Rake task in selenium* jobs - https://phabricator.wikimedia.org/T133542#3297888 (zeljkofilipin) p:Normal>Low
[10:30:58] Browser-Tests-Infrastructure, Deployment-Systems, Release-Engineering-Team (Backlog): Display and/or announce build status of wmf branch cut tests (including @integration tests) - https://phabricator.wikimedia.org/T111823#3297889 (zeljkofilipin) p:Normal>Low
[10:31:10] Browser-Tests-Infrastructure, Deployment-Systems, Release-Engineering-Team (Backlog): Run @integration tests on new deploy branch creation - https://phabricator.wikimedia.org/T111545#3297890 (zeljkofilipin) p:Normal>Low
[10:31:18] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog): When beta cluster is down Jenkins jobs should be aborted and not trigger e-mail notifications - https://phabricator.wikimedia.org/T101563#3297892 (zeljkofilipin) p:Normal>Low
[10:33:27] Browser-Tests-Infrastructure, Continuous-Integration-Config, Release-Engineering-Team (Backlog): Browser test jobs should use xUnit publisher instead of Junit - https://phabricator.wikimedia.org/T94684#1170069 (zeljkofilipin) @hashar do you still think this is something that needs to be done, or can...
[10:39:47] Browser-Tests-Infrastructure, Release-Engineering-Team, Epic: Make browser tests voting for all repos of WMF deployed code - https://phabricator.wikimedia.org/T91669#3297915 (zeljkofilipin)
[10:39:49] Browser-Tests-Infrastructure, Release-Engineering-Team (Kanban), Epic, Patch-For-Review, Tracking: [EPIC] trigger browser tests from Gerrit (tracking) - https://phabricator.wikimedia.org/T55697#3297912 (zeljkofilipin) Open>Resolved a:zeljkofilipin As far as I know this is resolved...
[10:41:45] zeljkof: I am merging the patch that adds saucelabs support to webdriver.io :) https://gerrit.wikimedia.org/r/#/c/345824/
[10:42:05] hashar: great!
[10:42:25] that was the other thing I wanted to talk to you about, but I guess I have missed it in gerrit
[10:42:27] :)
[10:43:24] Browser-Tests-Infrastructure, Release-Engineering-Team (Kanban), Epic, Patch-For-Review, Tracking: [EPIC] trigger browser tests from Gerrit (tracking) - https://phabricator.wikimedia.org/T55697#3297925 (zeljkofilipin)
[10:43:26] Browser-Tests-Infrastructure, Continuous-Integration-Config, Release-Engineering-Team (Backlog), CirrusSearch, Discovery-Search: Make browsertests for CirrusSearch run on every submitted patch with proper CI infrastructure rather than a bot - https://phabricator.wikimedia.org/T98374#3297921 (z...
[10:44:13] hashar: what is the name for "containers all the way" project?
[10:44:22] can not remember...
[10:44:31] is there a project in phab?
[10:44:34] * zeljkof is looking
[10:46:05] can not find it
[10:47:39] ah
[10:47:40] SSD / Pipeline Planning
[10:47:56] https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Offsites/2017-05-Vienna#SSD_.2F_Pipeline_Planning
[10:51:23] zeljkof: https://phabricator.wikimedia.org/tag/release_pipeline/
[10:51:41] containers being the envisioned tech to implement it
[10:52:59] hashar: thanks!
[11:02:07] Browser-Tests-Infrastructure, Release-Engineering-Team, Epic: Make browser tests voting for all repos of WMF deployed code - https://phabricator.wikimedia.org/T91669#3297978 (zeljkofilipin) @greg do we still need this task or is now replaced by [[ https://phabricator.wikimedia.org/tag/release_pipelin...
[11:04:09] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog): Accommodate flaky tests flapping - https://phabricator.wikimedia.org/T94212#3297980 (zeljkofilipin) @greg no activity in years, should this be closed?
[11:06:41] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog), Patch-For-Review, Ruby, and 2 others: Auto retry failed browser tests to reduce false negatives - https://phabricator.wikimedia.org/T67773#3297986 (zeljkofilipin)
[11:06:44] Browser-Tests-Infrastructure, Release-Engineering-Team (Next), Wikidata, Patch-For-Review, and 3 others: Increase in failures caused by Saucelabs - https://phabricator.wikimedia.org/T152963#3297989 (zeljkofilipin)
[11:08:07] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog): Investigate distribution of browser test run time - https://phabricator.wikimedia.org/T104396#3297990 (zeljkofilipin) @greg can this be resolved? Is this something that you still want done?
[11:09:00] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog): Run browser tests against the nightly build version on Beta Cluster - https://phabricator.wikimedia.org/T67128#694146 (zeljkofilipin) @greg replaced by [[ https://phabricator.wikimedia.org/tag/release_pipeline/ | release pipeline ]]?
[11:10:15] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog): Implement a smoke + parallel strategy for running end-to-end tests - https://phabricator.wikimedia.org/T130037#3297993 (zeljkofilipin) @dduvall is this task replaced by [[ https://phabricator.wikimedia.org/tag/release_pipeline/ | release p...
[11:11:27] Browser-Tests-Infrastructure, Release-Engineering-Team (Kanban): jenkins doesn't show the real failed tests, but raita does - https://phabricator.wikimedia.org/T116162#3298012 (zeljkofilipin) Open>Resolved a:zeljkofilipin I do not remember seeing this problem in years. Please reopen if this i...
[11:12:19] Browser-Tests-Infrastructure, Release-Engineering-Team (Kanban): browsertest failure reports don't show the failing tests saucelabs link, but a different one - https://phabricator.wikimedia.org/T115500#3298017 (zeljkofilipin) Open>Resolved a:zeljkofilipin Probably fixed years ago. Please reop...
[11:15:32] Browser-Tests-Infrastructure, Continuous-Integration-Config, Release-Engineering-Team (Backlog), Wikidata: Add email notification for aborted wikidata browser tests jobs - https://phabricator.wikimedia.org/T128067#3298036 (zeljkofilipin) Open>declined Unlikely to ever happen because of {T...
[11:16:22] Browser-Tests-Infrastructure, Continuous-Integration-Config, Release-Engineering-Team (Backlog), Wikidata, User-zeljkofilipin: Trigger run of special set of browsertests on gerrit with a keyword - https://phabricator.wikimedia.org/T145190#3298045 (zeljkofilipin) @hashar can this be done? Or s...
[11:18:44] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog), Jenkins, User-zeljkofilipin: Browser test Jenkins videos do not always play in-browser - https://phabricator.wikimedia.org/T155794#3298066 (zeljkofilipin) @hashar should we just decline this, since there is an easy workaround (down...
[11:22:13] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog): Could not find link to Sauce Labs job URL - https://phabricator.wikimedia.org/T165487#3267315 (zeljkofilipin) I am working on re-running the tests as part of {T152963}.
[11:22:46] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog): Could not find link to Sauce Labs job URL - https://phabricator.wikimedia.org/T165487#3298074 (zeljkofilipin)
[11:22:50] Browser-Tests-Infrastructure, Release-Engineering-Team (Next), Wikidata, Patch-For-Review, and 3 others: Increase in failures caused by Saucelabs - https://phabricator.wikimedia.org/T152963#2864860 (zeljkofilipin)
[11:23:52] Browser-Tests-Infrastructure, Continuous-Integration-Infrastructure, Release-Engineering-Team (Backlog), MediaWiki-extensions-GettingStarted, and 3 others: Missing XML files cause "Publish Performance test result report" - https://phabricator.wikimedia.org/T164296#3228948 (zeljkofilipin) @hashar...
[11:24:46] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog), User-zeljkofilipin: Migration of browsertests* Jenkins jobs to selenium* jobs cleanup and optional task - https://phabricator.wikimedia.org/T140235#3298083 (zeljkofilipin)
[11:24:48] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog), User-zeljkofilipin: There should be a way to run custom Rake task in selenium* jobs - https://phabricator.wikimedia.org/T133542#3298079 (zeljkofilipin) Open>declined Unlikely to ever happen because of {T139740}.
[11:25:40] Browser-Tests-Infrastructure, Deployment-Systems, Release-Engineering-Team (Backlog): Display and/or announce build status of wmf branch cut tests (including @integration tests) - https://phabricator.wikimedia.org/T111823#1617125 (zeljkofilipin) @greg replaced by [[ https://phabricator.wikimedia.org/...
[11:26:28] Browser-Tests-Infrastructure, Deployment-Systems, Release-Engineering-Team (Backlog): Run @integration tests on new deploy branch creation - https://phabricator.wikimedia.org/T111545#1607160 (zeljkofilipin) @greg replaced by [[ https://phabricator.wikimedia.org/tag/release_pipeline/ | release pipelin...
[11:29:30] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog): Extend cucumber pretty formatter to include links to sauce labs jobs - https://phabricator.wikimedia.org/T72608#3298118 (zeljkofilipin)
[11:29:32] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog): Passed Jenkins jobs should have links to Sauce Labs jobs - https://phabricator.wikimedia.org/T48890#3298120 (zeljkofilipin)
[11:30:53] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog): Extend cucumber pretty formatter to include links to sauce labs jobs - https://phabricator.wikimedia.org/T72608#749457 (zeljkofilipin) Open>declined Unlikely to ever happen because of {T139740}.
[11:31:38] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog), Testing-Initiative: Run Selenium tests in parallel - https://phabricator.wikimedia.org/T57867#3298133 (zeljkofilipin) Open>declined Unlikely to ever happen because of {T139740}.
[11:32:29] Release-Engineering-Team (Kanban), Performance-Team, Phabricator: Give performance team members right privileges to write posts on performance phame blog - https://phabricator.wikimedia.org/T166443#3298139 (Gilles) a:mmodell
[11:35:11] Release-Engineering-Team (Kanban), Performance-Team, Phabricator: Give performance team members right privileges to write posts on performance phame blog - https://phabricator.wikimedia.org/T166443#3298141 (Gilles) The permission error happens despite the fact that the blog is configured to be editab...
[11:42:45] Release-Engineering-Team (Kanban), Performance-Team, Phabricator: Give performance team members right privileges to write posts on performance phame blog - https://phabricator.wikimedia.org/T166443#3298153 (Aklapper) Side note: The "Editable By" restriction currently does not make any sense as anyone...
[11:44:51] Release-Engineering-Team (Kanban), Wikimedia-Hackathon-2017: Building Better Software (Hack-a-thon session) - https://phabricator.wikimedia.org/T165729#3298172 (zeljkofilipin) Notes from the session. About 10-15 people total, as far as I could tell about half of them from WMF. jr: a general overview o...
[12:04:41] (CR) Hashar: [C: 1] Update for CodeSniffer 3.0 [tools/codesniffer] - https://gerrit.wikimedia.org/r/355067 (https://phabricator.wikimedia.org/T142474) (owner: Legoktm)
[12:08:33] Release-Engineering-Team (Kanban), Performance-Team, Phabricator: Give performance team members right privileges to write posts on performance phame blog - https://phabricator.wikimedia.org/T166443#3298197 (Gilles) We can "lock" our group, though, can't we?
[12:10:35] gilles: hello. For Phabricator ACL I think we usually go with a specific group. Eg acl*releng https://phabricator.wikimedia.org/project/view/1615/
[12:13:29] hashar: is that different than a regular group, or is it just a naming convention?
[12:25:35] gilles: I would say it is a convention
[12:25:41] with the regular groups being joinable by anyone
[12:36:42] Continuous-Integration-Infrastructure, Zuul: Find a way to deduplicate post-merge builds like mediawiki-core-doxygen-publish - https://phabricator.wikimedia.org/T94715#3298243 (hashar)
[12:36:44] Continuous-Integration-Config, Continuous-Integration-Infrastructure (Little Steps Sprint), Patch-For-Review: Rewrite mediawiki-core-doxygen-publish Jenkins job to poll scm instead of being triggered by Zuul - https://phabricator.wikimedia.org/T115755#3298242 (hashar)
[12:44:07] now that you can also just watch a project instead of becoming a member, setting up an ACL project might be moot. But a topic for #wikimedia-devtools
[12:50:22] Release-Engineering-Team, Performance-Team, Phabricator: Custom domain/URL for phame performance blog - https://phabricator.wikimedia.org/T166374#3298254 (Gilles) p:Triage>Low
[13:07:16] Continuous-Integration-Infrastructure (Little Steps Sprint), Release-Engineering-Team (Kanban), Patch-For-Review: For MediaWiki extensions, merge composer test into mwext-textextension / mediawiki-extensions jobs - https://phabricator.wikimedia.org/T161895#3298262 (hashar) Resolved>Open I hav...
[13:10:11] PROBLEM - Puppet errors on deployment-aqs02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:12:13] PROBLEM - Puppet errors on deployment-aqs03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:15:33] PROBLEM - Puppet errors on deployment-restbase01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:19:08] Continuous-Integration-Config, MediaWiki-extensions-LabeledSectionTransclusion, Patch-For-Review: Load /tests/parser/ParserTestParserHook.php when testing extensions - https://phabricator.wikimedia.org/T166480#3298284 (Sophivorus) That piece of code is rather delicate (it took me quite a while to cra...
[13:27:30] hashar: around for puppet question?
[13:27:37] (copy pasting my terminal output)
[13:27:38] Beta-Cluster-Infrastructure, Release-Engineering-Team: deployment-tin has disk space issues - https://phabricator.wikimedia.org/T166492#3298304 (thcipriani) >>! In T166492#3297774, @hashar wrote: > @thcipriani you have clones of mediawiki repositories on deployment-tin in `/home/thcipriani/mwclonetest`....
[13:27:51] zeljkof: sure
[13:28:13] hashar: https://phabricator.wikimedia.org/P5497
[13:28:22] trouble installing gems in ops/puppet
[13:29:01] hm, log
[13:29:05] oops
[13:29:40] well, removing Gemfile.lock fixed the problem :D
[13:30:05] looks like I had it in my repo from before, but it is no longer in git
[13:32:23] PROBLEM - Puppet errors on deployment-restbase02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:32:27] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:32:28] Beta-Cluster-Infrastructure, Release-Engineering-Team: deployment-tin has disk space issues - https://phabricator.wikimedia.org/T166492#3298310 (hashar) `/` looks better now, thank you! ``` $ df -h / /srv Filesystem Size Used Avail Use% Mounted on /dev/vda3...
[13:32:31] hashar: nevermind, works now, some specs are failing
[13:41:14] hashar: o/
[13:43:30] elukey: hello :)
[13:44:04] hello :) I was wondering if you had the chance to look at my ramblings for the Redis connect timeouts
[13:51:40] elukey: in a hurry yes.
[13:51:58] ah okok no worries then
[13:53:52] elukey: I am still confused as to what is the cause of the socket timeout :D
[13:56:07] Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog), Documentation, Easy, and 3 others: Ruby gem documentation should state license - https://phabricator.wikimedia.org/T94001#3298320 (Rammanojpotla) Is there any issue with the ruby code in adding the license to it?
[13:57:46] hashar: I am not sure either, didn't find any indication about the root cause. I am trying to reduce the noise and see what will happen. My current theory is that jobrunners periodically send a lot of RunJobs.php requests to HHVM, and each one of them opens TCP sockets to the Redis shards.
[13:57:51] This causes two issues
[13:58:03] 1) on the jobrunners, tons of connections that are not reused and end up in TIME-WAIT
[13:58:21] 2) bursts of connection requests to Redis that happen periodically
[13:58:43] I am confident that this issue is a big part of the connect timeout problem
[14:00:55] elukey: the time wait I think it is just a consequence and is not a big problem
[14:01:06] the burst of tcp connections to the redis server might be the issue though
[14:01:27] I have no clue how many new tcp connections / sec linux/redis can handle
[14:01:57] the time-waits are a problem for the jobrunners, we avoided running out of local ports only because the kernel recycles TCP sockets before the 2 min of time-wait
[14:02:41] I think it is more how many Redis can accept under load without hitting the connect timeout that we set on jobrunners
[14:02:47] (PS1) Hashar: Run composer test from mediawiki-extensions jobs [integration/config] - https://gerrit.wikimedia.org/r/356047 (https://phabricator.wikimedia.org/T161895)
[14:03:40] (CR) Hashar: "Half complete. The macro composer-test-mwextension would still trigger for mediawiki/core or mediawiki/vendor and cd $EXT_NAME when it is " [integration/config] - https://gerrit.wikimedia.org/r/356047 (https://phabricator.wikimedia.org/T161895) (owner: Hashar)
[14:03:45] (CR) Hashar: [C: -1] Run composer test from mediawiki-extensions jobs [integration/config] - https://gerrit.wikimedia.org/r/356047 (https://phabricator.wikimedia.org/T161895) (owner: Hashar)
[14:04:16] elukey: so potentially the issue would be on the redis servers themselves?
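Editorial note on the TIME-WAIT point raised at 13:58:03 and 14:01:57: a rough way to see why unreused connections can exhaust local ports is to divide the ephemeral port range by the time a closed socket lingers. The sketch below is only back-of-the-envelope arithmetic, not a measurement from the jobrunners. The function name is hypothetical, and the port range and linger time in the example call are assumed values rather than figures from this log. It also assumes one destination (a single Redis ip:port) and no early socket recycling by the kernel.

```python
def max_new_connections_per_second(port_low, port_high, time_wait_seconds):
    """Rough upper bound on sustained new outbound connections to ONE
    destination (ip, port), assuming every closed connection keeps its
    local port busy for time_wait_seconds and nothing is recycled early."""
    usable_ports = port_high - port_low + 1
    return usable_ports / time_wait_seconds


# Assumed example values (not taken from the log): a 32768-60999 ephemeral
# port range and sockets lingering for 60 seconds.
rate = max_new_connections_per_second(32768, 60999, 60)
print(f"about {rate:.0f} new connections per second per destination")
```

With connection reuse the bound stops mattering, because a long-lived connection occupies one local port instead of a fresh one per request.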
[14:04:56] hashar: I think so but I haven't found any good trace to support my thesis
[14:09:31] hashar: now what I am trying to figure out is if hhvm is able to re-use TCP connections among multiple threads and calls to RunJobs.php
[14:09:46] but I am not that confident
[14:12:08] elukey: have you confirmed the redis server has a large enough maxclients and high enough file descriptors limit ?
[14:12:24] sorry if I lost track of what has been done so far
[14:16:42] not a problem of file descriptors
[14:16:51] and I think maxclients is fine IIRC
[14:17:39] re-checking the settings
[14:23:03] elukey: well reaching maxclients would probably raise a "human" friendly error
[14:23:10] some docs suggest 'max number of clients reached'
[14:23:11] bah
[14:23:28] theoretically yes
[14:23:55] also found: modules/redis/files/redis-jessie.conf:tcp-backlog 511
[14:24:17] no clue what it is, but /proc/sys/net/core/somaxconn should be at least that value
[14:24:46] who knows what TCP backlog can be :D
[14:24:50] the tcp backlog is how many "queued" tcp requests can be held before returning errors
[14:25:12] could it be that sometimes the queue is filled?
[14:25:27] https://httpd.apache.org/docs/2.4/mod/mpm_common.html#listenbacklog
[14:25:29] for example
[14:25:40] I checked the TCP metrics related to that, didn't find anything
[14:25:51] (listenoverflowdrops etc..)
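Editorial note on the backlog question at 14:23:55 to 14:24:50: on Linux the backlog a server passes to listen() is capped by net.core.somaxconn, so a `tcp-backlog 511` setting only takes full effect when somaxconn is at least 511. A minimal sketch of that check follows. The helper names are hypothetical, and the value 128 in the example is an assumption for illustration, not a reading from the Redis hosts.

```python
from pathlib import Path


def effective_backlog(requested, somaxconn):
    """Linux caps the listen() backlog at net.core.somaxconn."""
    return min(requested, somaxconn)


def read_somaxconn(path="/proc/sys/net/core/somaxconn"):
    """Read the current kernel limit (Linux only)."""
    return int(Path(path).read_text().strip())


# 511 is the tcp-backlog value quoted in the log; 128 is an assumed
# somaxconn used only to show the capping behaviour.
print(effective_backlog(511, 128))
```

On a real host, `effective_backlog(511, read_somaxconn())` would show whether the configured backlog is being truncated.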
[14:26:01] it is also available in netstat
[14:26:09] ah yeah the diamond servers.xxx.tcp.* metrics
[14:29:37] PROBLEM - Host deployment-phab02 is DOWN: CRITICAL - Host Unreachable (10.68.19.232)
[14:36:36] !log set redis-cli -a "$(sudo grep -Po '(?<=masterauth ).*' /etc/redis/tcp_6379.conf)" -p 6381 config set tcp-keepalive 300 on redis01 as test (rollback: redis-cli -a "$(sudo grep -Po '(?<=masterauth ).*' /etc/redis/tcp_6379.conf)" -p 6381 config set tcp-keepalive 0)
[14:36:39] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:37:10] then I am not sure how to even reproduce it on beta :(
[14:37:38] well I am pretty sure that if the jobrunners reuse tcp conns the problem will go away
[14:37:52] :]
[14:38:00] but atm I can't manage to keep a conn open after RunJobs.php finishes
[14:38:11] the jobrunner sends a FIN to redis
[14:44:52] !log reverted previous config on redis01
[14:44:56] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:19:38] elukey: well I have dug a bit in the hhvm code / linux socket etc
[16:20:01] elukey: and I found a lead in mediawiki to reduce the number of connections made to the redis db :]
[16:20:20] will craft some patch tonight
[16:37:33] ahahhaha
[16:37:50] can I have a preview now? Super curious
[16:38:11] my next step was to figure out if we explicitly close() the redis conn somewhere
[16:38:44] because in the RedisConnectionPool()'s destructor we do the close
[16:39:20] so I think that when RunJobs.php finishes, then RedisConnectionPool()'s destructor kicks in and the connection is torn down
[16:43:06] mmm this might really be why persistent is not going to work
[16:43:12] sigh
[16:43:50] I mean the RedisConnectionPool code is doing the right thing, it is RunJobs.php that is not meant to reuse anything
[16:48:34] hashar: --^
[17:34:41] Gerrit, MediaWiki-Vagrant, Patch-For-Review: "index-pack failed" when installing new MediaWiki-Vagrant box - https://phabricator.wikimedia.org/T152801#3298796 (Tgr) I have seen this with other repos as well (`mediawiki/extensions/Collection/OfflineContentGenerator/latex_renderer` to be specific) alth...
[21:19:50] Gerrit, Developer-Relations, GitHub-Mirrors, Repository-Admins, Patch-For-Review: Add CODE_OF_CONDUCT.md to Wikimedia projects - https://phabricator.wikimedia.org/T165540#3299044 (Mattflaschen-WMF) >>! In T165540#3296957, @Legoktm wrote: >>>! In T165540#3293800, @Mattflaschen-WMF wrote: >> Ex...