[00:05:56] Beta-Cluster: BounceHandler extension surprisingly missing from beta wiki - https://phabricator.wikimedia.org/T87624#995498 (01tonythomas) Originally added into beta by https://gerrit.wikimedia.org/r/#/c/154342/
[00:51:20] Ops-Access-Requests, Continuous-Integration: Make sure relevant RelEng people have access to gallium (Chris M, Dan, Mukunda, Zeljko) - https://phabricator.wikimedia.org/T85936#995585 (Jgreen)
[03:29:14] (PS2) Sn1per: WIP: Enable jshint voting for TwoFactorAuthentication [integration/config] - https://gerrit.wikimedia.org/r/186141 (https://phabricator.wikimedia.org/T63641)
[05:16:08] Beta-Cluster, MediaWiki-Special-pages: MediaWiki version url broken on beta - https://phabricator.wikimedia.org/T87636#995694 (Reedy) NEW
[05:23:01] Phabricator: request for deletion: 'shell' project - https://phabricator.wikimedia.org/T87623#995709 (Krenair) operations, deployers, etc.?
[05:51:22] Phabricator: request for deletion: 'shell' project - https://phabricator.wikimedia.org/T87623#995722 (Jdforrester-WMF) +1.
[06:09:52] (PS1) Krinkle: [WIP] Add qunit-karma macro [integration/config] - https://gerrit.wikimedia.org/r/186934
[06:12:10] (CR) Jforrester: "Gosh." [integration/config] - https://gerrit.wikimedia.org/r/186934 (owner: Krinkle)
[06:13:11] (CR) jenkins-bot: [V: -1] [WIP] Add qunit-karma macro [integration/config] - https://gerrit.wikimedia.org/r/186934 (owner: Krinkle)
[06:14:27] (PS2) Krinkle: [WIP] Add qunit-karma macro [integration/config] - https://gerrit.wikimedia.org/r/186934
[06:15:32] (CR) Krinkle: [C: -1] "The jshint job isn't passing yet." [integration/config] - https://gerrit.wikimedia.org/r/186141 (https://phabricator.wikimedia.org/T63641) (owner: Sn1per)
[06:25:11] Beta-Cluster: BounceHandler extension surprisingly missing from beta wiki - https://phabricator.wikimedia.org/T87624#995746 (01tonythomas) Open>Resolved a:01tonythomas http://deployment.wikimedia.beta.wmflabs.org/wiki/Special:EmailUser produces VERPed emails :) Thanks @Reedy
[06:29:43] Triagers, Phabricator, operations, Project-Creators: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#995762 (Qgil) Done! @Jdlrobson, please remember to follow https://www.mediawiki.org/wiki/Phabricator/Creating_and_renaming_projects when creating new...
[06:38:49] (PS3) Krinkle: [WIP] Add qunit-karma macro [integration/config] - https://gerrit.wikimedia.org/r/186934
[06:41:41] (PS4) Krinkle: [WIP] Add qunit-karma macro [integration/config] - https://gerrit.wikimedia.org/r/186934
[07:00:40] Release-Engineering, MediaWiki-Developer-Summit-2015, Continuous-Integration: 2015 MediaWiki Developer Summit - State of continuous integration (CI), what we did in 2014 - https://phabricator.wikimedia.org/T86750#995798 (Qgil) Please update the description with the achievements of this session. Thank you in a...
[07:01:13] Release-Engineering, MediaWiki-Developer-Summit-2015, Continuous-Integration: 2015 MediaWiki Developer Summit - State of continuous integration (CI), what we want to do in 2015 - https://phabricator.wikimedia.org/T86752#995805 (Qgil) Please update the description with the achievements of this session. Thank y...
[07:01:25] Scrum-of-Scrums, operations, Deployment-Systems: Update wikitech wiki with deployment train - https://phabricator.wikimedia.org/T70751#995808 (Dzahn)
[07:02:52] (PS5) Krinkle: [WIP] Add qunit-karma macro [integration/config] - https://gerrit.wikimedia.org/r/186934
[07:05:47] Release-Engineering, MediaWiki-Developer-Summit-2015, Quality-Assurance: Advanced Topics in Browser Test Automation - https://phabricator.wikimedia.org/T86070#995820 (Qgil) Please update the description with the achievements of this session. Thank you in advance.
[09:30:00] (PS3) Adrian Lang: Add npm job to wikibase [integration/config] - https://gerrit.wikimedia.org/r/184592
[09:30:44] (CR) Adrian Lang: "@Krinkle: Thanks. Like this?" [integration/config] - https://gerrit.wikimedia.org/r/184592 (owner: Adrian Lang)
[12:31:39] PROBLEM - Puppet failure on deployment-mx is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:57:40] Beta-Cluster, MediaWiki-extensions-MathSearch: Broken submodule - https://phabricator.wikimedia.org/T87643#996027 (Physikerwelt)
[16:18:18] Phabricator: [Upstream] Feature request: search for duplicates after a bug title is typed in - https://phabricator.wikimedia.org/T87650#996102 (liangent) NEW
[16:25:41] Phabricator: [Upstream] Feature request: search for duplicates after a bug title is typed in - https://phabricator.wikimedia.org/T87650#996112 (Glaisher)
[16:46:22] PROBLEM - Puppet failure on deployment-pdf01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[17:00:33] Phabricator, operations: Add Addshore to phabricator WMF-NDA group - https://phabricator.wikimedia.org/T87651#996136 (Addshore) NEW
[17:12:01] <^d> greg-g: next time let's not do swat on days like yesterday and today. Nobody's available @ 8 and 4 and we have to scramble.
[17:13:36] ^d: fair, should have just said "minor swat-like deploys ok" but not set a time
[17:14:15] <^d> The more you know :)
[17:14:56] :)
[17:39:55] Hello aharoni ! I am planning to run this job now to test why is it failing: https://integration.wikimedia.org/ci/view/BrowserTests/view/VisualEditor/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/build
[17:54:45] (CR) Krinkle: [C: -1] "Assuming your npm pipeline does at the very least jshint, mwext-Wikibase-jslint should no longer be run from your 'test' and 'gate' pipeli" [integration/config] - https://gerrit.wikimedia.org/r/184592 (owner: Adrian Lang)
[17:54:56] (CR) Krinkle: "@Adrian: Yep :)" [integration/config] - https://gerrit.wikimedia.org/r/184592 (owner: Adrian Lang)
[18:56:53] PROBLEM - App Server bits response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:56:55] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:56:55] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:56:56] PROBLEM - App Server bits response on deployment-mediawiki03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:56:57] PROBLEM - App Server bits response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:02:14] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:10:12] greg-g: ^ re all these alerts, there was a serious CVN just released, so we’re doing clusterwide security updates
[19:10:22] (PS2) Krinkle: [WIP] log clean up [integration/config] - https://gerrit.wikimedia.org/r/186980
[19:10:57] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[19:10:58] YuviPanda: the alerts in here are childs play -operations is the real mess :p
[19:11:11] JohnLewis: heh :)
[19:11:56] (PS2) Krinkle: De-duplicate LOG_DIR logic [integration/jenkins] - https://gerrit.wikimedia.org/r/186976
[19:13:17] (PS3) Krinkle: Clean up archive-log-dir [integration/config] - https://gerrit.wikimedia.org/r/186980
[19:18:31] YuviPanda: yep, ty
[19:18:54] greg-g: so expect instabilities for today :)
[19:19:40] PROBLEM - Puppet failure on deployment-sca01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:19:41] weee
[19:26:44] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:37:05] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[19:39:47] RECOVERY - Puppet failure on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:40:24] PROBLEM - Puppet failure on deployment-mx is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:40:24] PROBLEM - Puppet failure on deployment-pdf01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:41:00] RECOVERY - Puppet failure on deployment-pdf02 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:41:44] RECOVERY - Puppet failure on deployment-videoscaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:42:16] RECOVERY - Puppet failure on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:02:06] RECOVERY - Puppet failure on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:39:51] what software is wikimedia using for the image scalers ?
[20:41:46] PROBLEM - Host deployment-apertium01 is DOWN: CRITICAL - Host Unreachable (10.68.16.79)
[20:43:05] PROBLEM - Host Generic Beta Cluster is DOWN: CRITICAL - Host Unreachable (en.wikipedia.beta.wmflabs.org)
[20:43:25] PROBLEM - Host deployment-sca-cache01 is DOWN: CRITICAL - Host Unreachable (10.68.16.6)
[20:45:58] PROBLEM - Host deployment-memc02 is DOWN: CRITICAL - Host Unreachable (10.68.16.14)
[20:46:24] PROBLEM - Host deployment-cache-text02 is DOWN: CRITICAL - Host Unreachable (10.68.16.16)
[20:46:28] PROBLEM - Host deployment-lucid-salt is DOWN: CRITICAL - Host Unreachable (10.68.17.49)
[20:46:41] PROBLEM - Host deployment-fluoride is DOWN: CRITICAL - Host Unreachable (10.68.16.190)
[20:47:00] PROBLEM - Host deployment-salt is DOWN: CRITICAL - Host Unreachable (10.68.16.99)
[20:50:42] PROBLEM - Puppet failure on deployment-sca01 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[20:51:44] PROBLEM - Puppet failure on deployment-eventlogging02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[20:52:28] PROBLEM - Content Translation Server on deployment-cxserver03 is CRITICAL: Connection refused
[20:52:42] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[20:52:44] PROBLEM - Puppet failure on deployment-upload is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[20:52:55] PROBLEM - Puppet failure on deployment-restbase02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:53:13] PROBLEM - Puppet failure on deployment-elastic07 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[20:53:21] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[20:53:44] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[20:55:07] RECOVERY - Host deployment-sca-cache01 is UP: PING OK - Packet loss = 0%, RTA = 0.90 ms
[20:56:26] RECOVERY - Host deployment-lucid-salt is UP: PING OK - Packet loss = 0%, RTA = 0.65 ms
[20:56:42] RECOVERY - Host deployment-memc02 is UP: PING OK - Packet loss = 0%, RTA = 1.06 ms
[20:56:46] RECOVERY - Host deployment-apertium01 is UP: PING OK - Packet loss = 0%, RTA = 0.92 ms
[20:56:48] PROBLEM - Puppet failure on deployment-cache-bits01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[20:56:56] RECOVERY - Host deployment-salt is UP: PING OK - Packet loss = 0%, RTA = 0.81 ms
[20:56:58] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:57:24] PROBLEM - Puppet failure on deployment-mediawiki02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[20:57:28] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:57:41] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[20:58:03] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[20:58:05] RECOVERY - Host Generic Beta Cluster is UP: PING OK - Packet loss = 0%, RTA = 0.55 ms
[20:58:24] PROBLEM - Puppet failure on deployment-stream is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[20:58:36] PROBLEM - Puppet failure on deployment-restbase01 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [0.0]
[20:58:58] RECOVERY - Host deployment-cache-text02 is UP: PING OK - Packet loss = 0%, RTA = 0.59 ms
[21:00:09] RECOVERY - Host deployment-fluoride is UP: PING OK - Packet loss = 0%, RTA = 1.43 ms
[21:00:21] PROBLEM - Host deployment-cache-upload02 is DOWN: CRITICAL - Host Unreachable (10.68.17.51)
[23:41:40] ok greg says things are broken, where do I start? :)
[23:41:48] Reedy and ^d: feel free to bogart some of twentyafterfour ... yeah what he said
[23:42:01] greg-g: We've got opsen on it
[23:42:08] ohh ahhh
[23:42:10] Reedy: I -2’d that patch :)
[23:42:11] ;)
[23:42:24] YuviPanda: alex live hacked it :P
[23:45:38] PROBLEM - Puppet failure on deployment-mathoid is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[23:48:47] PROBLEM - Puppet failure on deployment-restbase02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[23:53:53] uhm
[23:54:25] (/Stage[main]/Base/File[/etc/default/acct]) Could not evaluate: Connection refused - connect(2) Could not retrieve file metadata for puppet:///modules/base/labs-acct.default: Connection refused - connect(2)
[23:54:36] is that puppet unable to connect to puppet master?
[23:55:11] that's from deployment-mathoid
[23:56:13] PROBLEM - Puppet failure on deployment-cache-bits01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]