[00:24:00] (CR) Reedy: "Making it non voting might make some sense if we're going to merge your patch; but we shouldn't be enabling (voting) tests that don't pass" [integration/config] - https://gerrit.wikimedia.org/r/258910 (owner: Reedy)
[00:25:45] (CR) Paladox: "Yes but it will allow other user patches to be merged without fail until the problem is fixed." [integration/config] - https://gerrit.wikimedia.org/r/258910 (owner: Reedy)
[00:26:45] (CR) Reedy: "So we've got a stupid catch 22 in that repo, that can't be fixed inside it, without force pushing" [integration/config] - https://gerrit.wikimedia.org/r/258910 (owner: Reedy)
[00:27:35] (CR) Reedy: "Composer shouldn't have been made voting!" [integration/config] - https://gerrit.wikimedia.org/r/245928 (owner: Paladox)
[00:28:14] * Reedy grumbles
[00:30:17] (CR) Paladox: "Yes." [integration/config] - https://gerrit.wikimedia.org/r/258910 (owner: Reedy)
[00:40:55] Continuous-Integration-Config, MediaWiki-extensions-MsUpload: jshint:all fails on MsUpload - https://phabricator.wikimedia.org/T121367#1876838 (Reedy) Warnings above should be fixed. Or at minimum, make the test non voting until it is
[02:15:54] Gerrit-Migration, Gitblit-Deprecate, Release-Engineering-Team, Diffusion: Import all gerrit.wikimedia.org repositories with Diffusion - https://phabricator.wikimedia.org/T616#12558 (Aklapper)
[02:16:26] Gerrit-Migration, Gitblit-Deprecate, Release-Engineering-Team, Diffusion: Import all gerrit.wikimedia.org repositories with Diffusion - https://phabricator.wikimedia.org/T616#12600 (Aklapper)
[02:56:25] PROBLEM - Puppet failure on wmfbranch is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]
[03:31:34] RECOVERY - Puppet failure on wmfbranch is OK: OK: Less than 1.00% above the threshold [0.0]
[07:12:44] (PS1) Legoktm: Update composer to 1.0.0-alpha11 [integration/composer] - https://gerrit.wikimedia.org/r/258933
[07:16:19] +44806, -40298
[07:51:27] PROBLEM - Host deployment-cache-parsoid04 is DOWN: CRITICAL - Host Unreachable (10.68.19.197)
[09:20:21] whois hashar
[09:20:25] oops :)
[09:52:33] zeljkof: hashar is some sort of a wikimedia pokemon
[09:53:52] petan: :D what do you mean by that?
[09:54:16] you asked :P
[09:54:17] PROBLEM - Puppet failure on deployment-restbase01 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [0.0]
[10:00:09] known ^
[10:00:39] (about the puppet failure, not hashar being a pokemon)
[10:00:40] :P
[10:15:43] PROBLEM - Puppet failure on deployment-conf03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[10:25:40] RECOVERY - Puppet failure on deployment-conf03 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:01:23] RECOVERY - Host deployment-cache-parsoid04 is UP: PING OK - Packet loss = 0%, RTA = 1.23 ms
[11:13:02] (PS2) Thiemo Mättig (WMDE): Utility script for trimming i18n files. [tools/code-utils] - https://gerrit.wikimedia.org/r/190825 (owner: Daniel Kinzler)
[11:14:19] (PS1) Thiemo Mättig (WMDE): Fix broken/incomplete PHPDoc tags in two utilities [tools/code-utils] - https://gerrit.wikimedia.org/r/258965
[11:15:02] (PS1) Thiemo Mättig (WMDE): Remove unused variables from two utilities [tools/code-utils] - https://gerrit.wikimedia.org/r/258966
[11:17:13] PROBLEM - Host deployment-cache-parsoid04 is DOWN: CRITICAL - Host Unreachable (10.68.19.197)
[12:11:04] hashar is not around today?
[12:59:24] !log dist-upgrade of all CI slaves
[12:59:27] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:03:05] PROBLEM - Puppet failure on integration-slave-trusty-1023 is CRITICAL: CRITICAL: 42.86% of data above the critical threshold [0.0]
[13:05:57] PROBLEM - Puppet failure on integration-slave-trusty-1014 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]
[13:07:01] PROBLEM - Puppet failure on integration-slave-trusty-1016 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[13:10:14] PROBLEM - Puppet failure on integration-slave-trusty-1011 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [0.0]
[13:13:24] PROBLEM - Puppet failure on integration-slave-trusty-1012 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[13:14:00] PROBLEM - Puppet failure on integration-slave-trusty-1013 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[13:23:07] RECOVERY - Puppet failure on integration-slave-trusty-1023 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:23:29] RECOVERY - Puppet failure on integration-slave-trusty-1012 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:23:59] RECOVERY - Puppet failure on integration-slave-trusty-1013 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:25:57] RECOVERY - Puppet failure on integration-slave-trusty-1014 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:27:05] PROBLEM - Puppet failure on integration-slave-precise-1011 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:30:19] RECOVERY - Puppet failure on integration-slave-trusty-1011 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:32:06] RECOVERY - Puppet failure on integration-slave-trusty-1016 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:52:01] RECOVERY - Puppet failure on integration-slave-precise-1011 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:52:54] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, operations, Availability, and 2 others: puppet keep trying to restart redis because upstart track wrong PID - https://phabricator.wikimedia.org/T121396#1877607 (hashar) NEW a:yuvipanda
[13:55:46] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, operations, Availability, and 3 others: puppet keep trying to restart redis because upstart track wrong PID - https://phabricator.wikimedia.org/T121396#1877620 (hashar) a:yuvipanda>hashar
[14:09:12] !log beta and integration: killing redis-servers on Ubuntu instances so they are properly tracked by upstart/puppet ( https://phabricator.wikimedia.org/T121396 )
[14:09:15] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:11:44] PROBLEM - Disk space on scandium is CRITICAL: DISK CRITICAL - /srv/ssd is not accessible: No such file or directory
[14:12:24] bah
[14:17:11] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, operations, Availability, and 3 others: puppet keep trying to restart redis because upstart track wrong PID - https://phabricator.wikimedia.org/T121396#1877637 (hashar) a:hashar>None Fixed on Beta cluster / CI. Not sure of h...
[14:22:31] with your permission, I would like to rewrite https://wikitech.wikimedia.org/wiki/How_to_do_a_schema_change
[14:27:40] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, operations, Availability, and 3 others: puppet keep trying to restart redis because upstart track wrong PID - https://phabricator.wikimedia.org/T121396#1877649 (scfc) Actually, @Joe's 16794d5a56bae609d7a6fe85382cfe5e475063cb //remo...
[14:34:04] RECOVERY - Disk space on scandium is OK: DISK OK
[14:34:46] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, operations, Availability, and 3 others: puppet keep trying to restart redis because upstart track wrong PID - https://phabricator.wikimedia.org/T121396#1877662 (hashar) Oh good finding @scfc , right now the instances do have `daemo...
[14:50:43] PROBLEM - zuul_merger_service_running on scandium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-merger
[14:53:44] Continuous-Integration-Infrastructure, operations: scandium lost /srv - https://phabricator.wikimedia.org/T121400#1877688 (hashar) NEW
[15:02:01] Continuous-Integration-Infrastructure, operations: scandium lost /srv - https://phabricator.wikimedia.org/T121400#1877725 (hashar) scandium has: ``` lang=ruby mount { '/srv/ssd': ensure => mounted, device => '/dev/md2', fstype => 'xfs', options => 'noatime,nodiratime...
[15:02:56] Continuous-Integration-Infrastructure, operations: scandium lost /srv - https://phabricator.wikimedia.org/T121400#1877738 (hashar) So I think we need to drop in `/etc/fstab` the `/srv` mount: ``` UUID=d588649c-4a40-4853-8d33-a82ed028fb1e /srv xfs defaults 0 2 ``` Then unmount both point and remount `/srv...
[15:08:00] https://github.com/wikimedia/mediawiki/blob/master/README#L25-L26
[15:08:09] Why bugs.mediawiki.org?
[15:10:46] jynus: Also, JFDI :)
[15:12:51] !log Stopping zuul-merger daemon on scandium. It lost its disk somehow earlier "DISK CRITICAL - /srv/ssd is not accessible: No such file or directory" https://phabricator.wikimedia.org/T121400#1877725
[15:12:54] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[15:27:24] (PS2) Hashar: [LiquidThreads] Update Jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/257955 (owner: Paladox)
[15:27:37] (CR) Hashar: [C: 2] "Thanks!" [integration/config] - https://gerrit.wikimedia.org/r/257955 (owner: Paladox)
[15:32:42] (Merged) jenkins-bot: [LiquidThreads] Update Jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/257955 (owner: Paladox)
[15:48:48] (CR) Paladox: "Thanks." [integration/config] - https://gerrit.wikimedia.org/r/257955 (owner: Paladox)
[15:57:27] (PS2) Hashar: Disable composer-test for Offline extension [integration/config] - https://gerrit.wikimedia.org/r/258910 (owner: Reedy)
[15:58:08] (CR) Hashar: [C: 2] "Oops. Thanks Reedy. Not bothering adding back phplint, it is not worth it." [integration/config] - https://gerrit.wikimedia.org/r/258910 (owner: Reedy)
[15:58:42] (Abandoned) Hashar: Exhaust nodepool 1 [integration/config] - https://gerrit.wikimedia.org/r/256579 (owner: Hashar)
[15:58:45] (Abandoned) Hashar: Exhaust nodepool 2 [integration/config] - https://gerrit.wikimedia.org/r/256580 (owner: Hashar)
[15:58:49] (Abandoned) Hashar: Exhaust nodepool 3 [integration/config] - https://gerrit.wikimedia.org/r/256581 (owner: Hashar)
[15:58:53] (Abandoned) Hashar: Exhaust nodepool 4 [integration/config] - https://gerrit.wikimedia.org/r/256582 (owner: Hashar)
[15:58:56] (Abandoned) Hashar: Exhaust nodepool 5 [integration/config] - https://gerrit.wikimedia.org/r/256583 (owner: Hashar)
[15:58:59] (Abandoned) Hashar: Exhaust nodepool 6 [integration/config] - https://gerrit.wikimedia.org/r/256584 (owner: Hashar)
[15:59:01] (Abandoned) Hashar: Exhaust nodepool 7 [integration/config] - https://gerrit.wikimedia.org/r/256585 (owner: Hashar)
[15:59:03] (Abandoned) Hashar: Exhaust nodepool 8 [integration/config] - https://gerrit.wikimedia.org/r/256586 (owner: Hashar)
[15:59:05] (Abandoned) Hashar: Exhaust nodepool 9 [integration/config] - https://gerrit.wikimedia.org/r/256587 (owner: Hashar)
[15:59:08] (Abandoned) Hashar: Exhaust nodepool 10 [integration/config] - https://gerrit.wikimedia.org/r/256588 (owner: Hashar)
[15:59:15] (Merged) jenkins-bot: Disable composer-test for Offline extension [integration/config] - https://gerrit.wikimedia.org/r/258910 (owner: Reedy)
[16:04:40] (CR) Hashar: [C: -1] "The composer.json is missing a "test" script. See https://www.mediawiki.org/wiki/Continuous_integration/Entry_points#PHP :-}" [integration/config] - https://gerrit.wikimedia.org/r/258911 (owner: Reedy)
[16:06:05] (Abandoned) Reedy: Enable composer-test on OAuth extension [integration/config] - https://gerrit.wikimedia.org/r/258911 (owner: Reedy)
[16:08:19] (PS1) Paladox: [Offline] Make test extension vote false [integration/config] - https://gerrit.wikimedia.org/r/258993
[16:09:58] Continuous-Integration-Config, MediaWiki-extensions-Newsletter: Migrate mediawiki/extensions/Newsletter tests to use npm and grunt jsonlint/jshin - https://phabricator.wikimedia.org/T121352#1877864 (hashar) That should be replaced by `npm test`. https://www.mediawiki.org/wiki/Continuous_integration/Entr...
[16:15:03] Continuous-Integration-Config, Wikidata: increase timeout of composer for php-composer-test - https://phabricator.wikimedia.org/T121291#1877871 (hashar) Yup apparently composer comes with its own timeout https://getcomposer.org/doc/06-config.md#process-timeout mentioned in T112280. To run phpcs for core...
[16:24:54] PROBLEM - Puppet failure on deployment-puppetmaster is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[16:34:53] RECOVERY - Puppet failure on deployment-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0]
[16:35:43] Deployment-Systems, Release-Engineering-Team, Patch-For-Review, user-notice: Move the train deployment from Thursday to Wednesday for some Wikipedia sites - https://phabricator.wikimedia.org/T115002#1877919 (Amire80) Looks like this is done and nothing exploded :) Can this be closed?
[17:36:37] (PS1) Legoktm: Fix Matt and Stephane's email addresses [integration/config] - https://gerrit.wikimedia.org/r/259025
[17:36:59] (PS2) Legoktm: Fix Matt and Stephane's email addresses [integration/config] - https://gerrit.wikimedia.org/r/259025
[18:17:30] (CR) Legoktm: [C: 2] "Deployed" [integration/config] - https://gerrit.wikimedia.org/r/259025 (owner: Legoktm)
[18:17:48] Deployment-Systems, Scap3, Analytics-Backlog, Services, operations: Deploy AQS with scap3 - https://phabricator.wikimedia.org/T114999#1878367 (Milimetric)
[18:18:43] (Merged) jenkins-bot: Fix Matt and Stephane's email addresses [integration/config] - https://gerrit.wikimedia.org/r/259025 (owner: Legoktm)
[18:20:06] Yippee, build fixed!
[18:20:07] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #721: FIXED in 2 min 26 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/721/
[18:21:04] wtffff
[18:21:21] it's been failing for 2 weeks straight, but I trigger it manually and it passes?
[18:23:07] lol
[18:28:43] legoktm: It hates you.
[18:29:11] legoktm: Or, more likely, the breakage is resource contention; when run on its own it works fine, but when run in parallel with a dozen other browser tests it dies.
[18:32:42] ugh.
[18:32:50] we don't stagger them?
[18:44:19] PROBLEM - Puppet failure on deployment-cache-mobile04 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[18:45:09] Project beta-scap-eqiad build #82402: FAILURE in 0.58 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/82402/
[18:50:37] PROBLEM - Puppet failure on deployment-cache-upload04 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]
[19:01:52] Yippee, build fixed!
[19:01:52] Project beta-scap-eqiad build #82403: FIXED in 6 min 54 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/82403/
[19:04:40] heh, https://phabricator.wikimedia.org/D79
[19:19:02] Continuous-Integration-Config, operations, Puppet: translatewiki-puppetlint-strict does not honor puppet-lint.rc file in /puppet - https://phabricator.wikimedia.org/T116552#1878674 (Dzahn) >>! In T116552#1752138, @yuvipanda wrote: > I already see `--no-80chars-check` in .puppet-lint.rc in operations/pu...
[19:40:27] marxarelli: So I was looking at scap.cfg vis à vis debian packaging. I wonder if we need to move more of that stuff outside the package.
[19:40:37] Sourced from puppet to like /etc/scap.cfg
[19:41:03] that would make sense (and was part of why I made the config loading so flexible)
[19:42:44] seems like a good idea to me
[19:46:19] * ostriches files task
[19:47:42] Scap3, Puppet: Move scap.cfg things out of scap and into puppet - https://phabricator.wikimedia.org/T121435#1878828 (demon) NEW
[19:48:46] marxarelli: howdy :-}
[19:49:07] marxarelli: doesn't mediawiki_selenium have shared logic to easily click the edit tab?
[19:49:27] hashar: don't think so
[19:49:50] also have a hard time finding the doc for PageObject a() selectors
[19:50:01] my use case is that I have a(:create_source, text: 'Create source')
[19:50:12] and what it really should do is look for an element with id #ca-edit
[19:50:18] i believe they abstract that as `link(...`
[19:50:23] then click the child element
[19:52:17] hashar: https://github.com/cheezy/page-object/wiki/Nested-Elements
[19:52:32] or i can use Xpath :D
[19:52:35] Continuous-Integration-Infrastructure, operations: scandium lost /srv - https://phabricator.wikimedia.org/T121400#1878865 (Ottomata) a:Ottomata
[19:52:40] or that, yes
[19:52:52] or css
[19:52:57] well css
[19:53:12] I am always wondering how bad my css is going to be perf wise
[19:53:19] cause #someid A { }
[19:53:28] ends up really looking for all elements
[19:53:43] then figure out one that is a descendent of an element having the id "someid"
[19:53:51] (don't quote me)
[19:54:14] lol. https://gerrit.wikimedia.org/r/#/c/259060/
[19:55:30] i'm fairly certain the css selector is evaluated by the webdriver server
[19:59:42] Continuous-Integration-Infrastructure, operations: scandium lost /srv - https://phabricator.wikimedia.org/T121400#1878882 (Ottomata) Ok, I did this. `/srv/ssd` is now mounted, but `/srv` is not. However, due to some previous job run, it looks like zuul was cloned at `/srv/ssd/zuul` when `/srv` was mount...
[20:15:09] Continuous-Integration-Infrastructure, operations: scandium lost /srv - https://phabricator.wikimedia.org/T121400#1878983 (hashar) Thank you! I restarted the zuul-merger instance since /srv/ssd/zuul/git is now fine. `/srv/ssd/ssd` can be nuked entirely. I lack root access to do so.
[20:15:50] !log scandium restarted zuul-merger
[20:15:53] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:16:26] RECOVERY - zuul_merger_service_running on scandium is OK: PROCS OK: 1 process with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-merger
[20:16:38] Continuous-Integration-Infrastructure, operations: scandium lost /srv - https://phabricator.wikimedia.org/T121400#1878985 (Ottomata) Open>Resolved
[20:16:51] Continuous-Integration-Infrastructure, operations, WorkType-Maintenance: scandium lost /srv - https://phabricator.wikimedia.org/T121400#1878986 (hashar)
[20:20:37] oh my
[20:22:18] I fixed a browser test!!!!!!!!!!!!!!!!!!!!!!
[20:29:06] https://gerrit.wikimedia.org/r/259075
[21:02:41] Deployment-Systems, operations: Make l10nupdate user a system user - https://phabricator.wikimedia.org/T120585#1879129 (Dzahn) a:Dzahn
[21:03:29] Deployment-Systems, operations: Make l10nupdate user a system user - https://phabricator.wikimedia.org/T120585#1856865 (Dzahn) Let's start by agreeing on a specific (lower) UID for this and then update https://wikitech.wikimedia.org/wiki/UID So what number do we pick?
[21:05:03] newbie logstash question: how do i find everything that the Math extension logged?
[21:05:09] bd808 perhaps? ^
[21:08:08] mobrovac: do you know what logging channel it writes to?
[21:09:11] bd808: $logger = LoggerFactory::getInstance( 'Math' )
[21:09:27] does that answer your q?
[21:09:28] :P
[21:09:53] mobrovac: You should be able to go to https://logstash.wikimedia.org/#/dashboard/elasticsearch/mediawiki and put "channel:Math" in the query field then
[21:10:05] ah kk
[21:10:05] assuming that we have that channel enabled
[21:10:09] thnx bd808!
[21:10:13] hm
[21:10:17] * mobrovac looking
[21:11:17] hm, 0 logs for the last 6h
[21:11:34] i guess not then
[21:11:35] :(
[21:12:38] mobrovac: I don't see "Math" in $wmgMonologChannels -- https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings.php#L4353
[21:13:01] which would mean that the logs all go to /dev/null
[21:13:15] it's webscale
[21:13:16] * mobrovac is tempted to go search /dev/null
[21:13:16] :D
[21:21:06] Project browsertests-QuickSurveys-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #99: FAILURE in 5 min 5 sec: https://integration.wikimedia.org/ci/job/browsertests-QuickSurveys-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/99/
[21:22:12] (PS1) Gilles: Add thumbor/svg-engine [integration/config] - https://gerrit.wikimedia.org/r/259146
[22:39:42] Continuous-Integration-Infrastructure, Patch-For-Review: Dozens of jobs failing on integration-slave-trusty-1012 because chmod fails for /tmp/jenkins-2 - https://phabricator.wikimedia.org/T120824#1879407 (thcipriani) FWIW, just happened again on integration-slave-trusty-1016, removing the `tmpfs/*.lesscac...
[23:40:10] thcipriani: twentyafterfour: i have 2 or 3 subsequent patches for the math ext i'd like to schedule for swat so that they are deployed in one go, is that acceptable?
[23:42:50] mobrovac: sure, a few patches to be deployed at the same time is fine, the only problem is that the current deploy method is not super atomic. That is, I can sync one file before the other, but syncing all the files at once (like a sync-dir) for the math extension might not result in all of the files getting updated at the _exact same moment_
[23:43:25] (uses rsync --delay-updates)
[23:44:09] I'd make sure you make a note of it on the deployments page if that's the plan.
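
Note on the closing sync discussion: the "not super atomic" caveat can be illustrated with a small sketch of the stage-then-rename pattern that `rsync --delay-updates` is documented to follow (new copies are held in a `.~tmp~` directory and renamed into place at the end of the transfer). This is not the deploy tooling discussed in the channel; `staged_sync` and the file names are hypothetical, and the sketch assumes flat file names with no subdirectories. Each rename is atomic for a single file, but a reader can still see a mix of old and new files between two renames.

```python
import os
from pathlib import Path
from typing import Dict, List, Optional


def staged_sync(updates: Dict[str, str], dest: Path) -> List[Dict[str, Optional[str]]]:
    """Stage all new file contents, then rename them into place one by one.

    Returns a snapshot of the destination taken after each rename, so the
    intermediate (mixed old/new) states are visible to the caller.
    """
    # Holding directory; the name mirrors the one rsync documents for
    # --delay-updates, but nothing here calls rsync.
    holding = dest / ".~tmp~"
    holding.mkdir(parents=True, exist_ok=True)

    # Phase 1 (slow): write every new file into the holding directory.
    # Nothing visible in dest changes yet.
    for name, content in updates.items():
        (holding / name).write_text(content)

    # Phase 2 (fast): swap files in. os.replace is atomic per file,
    # but the loop as a whole is not a single atomic step.
    snapshots: List[Dict[str, Optional[str]]] = []
    for name in updates:
        os.replace(holding / name, dest / name)
        snapshots.append({
            n: (dest / n).read_text() if (dest / n).exists() else None
            for n in updates
        })

    holding.rmdir()
    return snapshots
```

With two files updated together, the first snapshot holds the new version of the first file next to the old version of the second, which is the window the log warns about when several dependent patches go out in one sync.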