[00:02:24] Beta-Cluster-Infrastructure, Browser-Tests-Infrastructure, WorkType-NewFunctionality: Make selenium users use botflags at beta-cluster - https://phabricator.wikimedia.org/T116027#1738691 (Krenair) It looks like they use the UI rather than the API, so probably can't set the bot flag on edits?
[00:15:36] Beta-Cluster-Infrastructure, Browser-Tests-Infrastructure, WorkType-NewFunctionality: Make selenium users use botflags at beta-cluster - https://phabricator.wikimedia.org/T116027#2193226 (Krenair) Ah, it looks like you can add an input with name="bot" and set it to value 1. I know in JS you can do it...
[00:28:14] Beta-Cluster-Infrastructure: Set 'cluster' salt grain appropriately for all instances in beta cluster - https://phabricator.wikimedia.org/T87199#2193229 (Krenair) a:yuvipanda Needs information from @yuvipanda
[00:33:39] Beta-Cluster-Infrastructure, Operations, Patch-For-Review: /mnt/upload7 does not exist anywhere, yet it is referenced in multiple places in wmf-config - https://phabricator.wikimedia.org/T129586#2193231 (Krenair)
[00:37:29] Beta-Cluster-Infrastructure, Monitoring, Varnish: Monitor Varnish caches on beta cluster have two varnishd process running - https://phabricator.wikimedia.org/T75944#786646 (Krenair) What sort of monitoring do you have in mind?
[00:41:33] Beta-Cluster-Infrastructure, Patch-For-Review, User-greg: Make Beta Cluster update jobs (eg: beta-scap-eqiad, beta-mediawiki-config-update-eqiad etc) complain in IRC on every failure - https://phabricator.wikimedia.org/T129374#2193236 (Krenair) Open>Resolved a:greg
[00:44:32] Beta-Cluster-Infrastructure, Security-Team: Install Ex:OATH to beta - https://phabricator.wikimedia.org/T131420#2166800 (Krenair) -> https://gerrit.wikimedia.org/r/#/c/282198/
[00:47:18] Beta-Cluster-Infrastructure, Operations, Services, Tracking: Move Node.JS services to Jessie and Node 4 (tracking) - https://phabricator.wikimedia.org/T124989#2193246 (Krenair)
[00:48:48] Beta-Cluster-Infrastructure: On beta metawiki, a mix of the beta enwiki and the production metawiki logos show - https://phabricator.wikimedia.org/T125942#2193248 (Krenair) (it shows beta enwiki's logo on restricted-css pages like Special:UserLogin and production metawiki's logo elsewhere)
[00:49:15] Beta-Cluster-Infrastructure, Wikimedia-Site-requests: On beta metawiki, a mix of the beta enwiki and the production metawiki logos show - https://phabricator.wikimedia.org/T125942#2193249 (Krenair) This should be possible to fix in mediawiki-config
[00:52:21] Beta-Cluster-Infrastructure, MediaWiki-extensions-CentralAuth, Operations, Wikimedia-Apache-configuration: Special:CentralAutoLogin/checkLoggedIn redirects to wikimediafoundation.org on Beta Cluster - https://phabricator.wikimedia.org/T126697#2193267 (Krenair) Open>Invalid Likely a cached is...
[00:55:06] Beta-Cluster-Infrastructure, MediaWiki-Authentication-and-authorization: [betacluster] "Cross-Origin Request Blocked" and "the content must be served over HTTPS" console errors - https://phabricator.wikimedia.org/T128207#2067356 (Krenair) Probably fail due to the untrusted certificate. See {T97593} and {...
[00:56:32] Beta-Cluster-Infrastructure, VisualEditor, VisualEditor-MediaWiki: Beta Cluster threw a 'internal_api_error_DBConnectionError' at me - https://phabricator.wikimedia.org/T129192#2193274 (Krenair) a:Jdforrester-WMF
[00:58:01] Beta-Cluster-Infrastructure, Flow, Collab-Archive-2015-2016: Set up second External Store cluster on Beta - https://phabricator.wikimedia.org/T128417#2073927 (Krenair) We have a first external store?
[01:39:25] Beta-Cluster-Infrastructure: Investigate whether deployment-memc04 instance is still needed - https://phabricator.wikimedia.org/T128178#2066261 (Krenair) Could shut it down and see if anything breaks.
[01:42:20] Beta-Cluster-Infrastructure: Investigate whether deployment-memc04 instance is still needed - https://phabricator.wikimedia.org/T128178#2193285 (Krenair) Who is likely to have created it? Maybe we should assign this task to them.
[01:57:31] Beta-Cluster-Infrastructure, Labs, Labs-Infrastructure, Operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#2193289 (Krenair)
[01:57:33] Beta-Cluster-Infrastructure, MediaWiki-Authentication-and-authorization: [betacluster] "Cross-Origin Request Blocked" and "the content must be served over HTTPS" console errors - https://phabricator.wikimedia.org/T128207#2193288 (Krenair)
[02:06:54] Beta-Cluster-Infrastructure: Creating wiki at beta cluster for the Dutch Wikipedia - https://phabricator.wikimedia.org/T118005#1789123 (Krenair) @Natuur12: Are you still interested in doing this?
[02:50:04] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #938: FAILURE in 3.6 sec: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/938/
[03:00:26] Beta-Cluster-Infrastructure, Sentry, Wikimedia-Logstash, Patch-For-Review: Channel PHP errors from Logstash to Sentry on the beta cluster - https://phabricator.wikimedia.org/T85239#2193324 (Krenair) a:Tgr
[03:09:02] Beta-Cluster-Infrastructure, Monitoring: Shinken warnings about free space on beta cluster Varnish instances - https://phabricator.wikimedia.org/T76417#799841 (Krenair) Shinken is not currently warning and /srv/vdb on all three deployment-cache-* hosts is only 59% used. Is there still something we need t...
[03:16:36] Beta-Cluster-Infrastructure: Upgrade varnish automatically via puppet in Beta Cluster - https://phabricator.wikimedia.org/T75564#760343 (Krenair) Is this done
[03:25:41] Beta-Cluster-Infrastructure, Labs, Labs-Infrastructure, Operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#2193331 (Krenair)
[03:25:44] Beta-Cluster-Infrastructure: Reenable $wgMWOAuthSecureTokenTransfer=true; on the beta cluster - https://phabricator.wikimedia.org/T67421#2193330 (Krenair)
[03:30:13] Beta-Cluster-Infrastructure, Flow, Collab-Archive-2015-2016: Set up second External Store cluster on Beta - https://phabricator.wikimedia.org/T128417#2193332 (Mattflaschen) We do now (though it's not being used yet, pending https://gerrit.wikimedia.org/r/#/c/282440/ ). However, there is a missing blo...
[03:30:30] RECOVERY - Puppet run on integration-slave-trusty-1014 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:30:48] Beta-Cluster-Infrastructure, Staging, DBA, Collab-Archive-2015-2016, and 2 others: Use External Store on Beta Cluster - https://phabricator.wikimedia.org/T95871#2193334 (Mattflaschen)
[03:30:50] Beta-Cluster-Infrastructure, Flow, Collab-Archive-2015-2016: Set up second External Store cluster on Beta - https://phabricator.wikimedia.org/T128417#2193333 (Mattflaschen)
[03:32:33] Beta-Cluster-Infrastructure, Operations: Mails through deployment-mx SPF & DKIM fails - https://phabricator.wikimedia.org/T87338#2193335 (Krenair)
[03:35:58] Beta-Cluster-Infrastructure: Investigate whether deployment-memc04 instance is still needed - https://phabricator.wikimedia.org/T128178#2066261 (scfc) IIRC the Horizon UI has an "action log" which says who created an instance.
[03:43:07] Beta-Cluster-Infrastructure: Investigate whether deployment-memc04 instance is still needed - https://phabricator.wikimedia.org/T128178#2193373 (Krenair) a:ori Good idea @scfc @ori created this April 7, 2014, 6:02 a.m.
[03:48:58] RECOVERY - Puppet run on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0]
[04:03:59] Beta-Cluster-Infrastructure, MediaWiki-API: mw: interwiki prefix missing on beta cluster, so API's "complete documentation" is a 404. - https://phabricator.wikimedia.org/T104504#2193379 (Krenair) Open>Resolved This appears to have been fixed at some point, maybe while I was mucking around with inte...
[04:05:51] Beta-Cluster-Infrastructure, Wikimedia-Site-requests: Check status of beta interwiki json/cdb files - https://phabricator.wikimedia.org/T120427#2193382 (Krenair) should now be a php file
[04:15:34] Beta-Cluster-Infrastructure, JavaScript: Sync pages (gadgets, in CSS and/or JS) from production wikis - https://phabricator.wikimedia.org/T51779#2193385 (Krenair) That script is not actually being run at the moment - no instance has class beta::syncsiteresources or the class that includes it, role::beta::...
[04:21:08] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce build #777: FAILURE in 29 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce/777/
[04:46:03] PROBLEM - Keyholder status on mira is CRITICAL: [Errno 2] No such file or directory
[04:47:24] PROBLEM - Keyholder status on deployment-tin is CRITICAL: [Errno 2] No such file or directory
[04:48:27] ^ that's me fiddling about, ignore
[05:03:00] Beta-Cluster-Infrastructure, Monitoring, Shinken: Monitor keyholder on deployment-bastion - https://phabricator.wikimedia.org/T111064#2193392 (Krenair) But can we even have shinken trigger that monitoring plugin remotely?
[05:41:28] Beta-Cluster-Infrastructure: Setup puppet exported resources to collect ssh host keys for beta - https://phabricator.wikimedia.org/T72792#2193401 (Krenair) Why does the approach production uses not work for us? I'm vaguely aware of exported resources only being available in production - where can I find more...
[05:41:47] Beta-Cluster-Infrastructure, Puppet: Setup puppet exported resources to collect ssh host keys for beta - https://phabricator.wikimedia.org/T72792#2193402 (Krenair)
[05:49:14] Project beta-scap-eqiad build #97562: FAILURE in 14 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/97562/
[05:53:38] Project beta-scap-eqiad build #97563: STILL FAILING in 2 min 30 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/97563/
[05:57:57] Project beta-scap-eqiad build #97564: STILL FAILING in 2 min 26 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/97564/
[06:02:51] /var/log/nutcracker/ is too large on -mediawiki01
[06:09:56] Yippee, build fixed!
[06:09:57] Project beta-scap-eqiad build #97565: FIXED in 5 min 2 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/97565/
[06:18:02] freed even more space by running autoremove
[06:26:41] RECOVERY - Puppet run on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:33:08] Yippee, build fixed!
[08:33:08] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #936: FIXED in 23 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/936/
[09:20:06] Project beta-update-databases-eqiad build #7749: FAILURE in 5.9 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/7749/
[10:20:33] Yippee, build fixed!
[10:20:34] Project beta-update-databases-eqiad build #7750: FIXED in 32 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/7750/
[11:34:54] PROBLEM - Host integration-trusty-1026 is DOWN: CRITICAL - Host Unreachable (10.68.17.98)
[12:19:34] RECOVERY - Puppet run on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:43:00] PROBLEM - Puppet run on deployment-mediawiki01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[16:23:00] RECOVERY - Puppet run on deployment-mediawiki01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:35:23] Beta-Cluster-Infrastructure, Monitoring, Shinken: Monitor keyholder on deployment-bastion - https://phabricator.wikimedia.org/T111064#1593275 (scfc) Not in the same way as Icinga. AFAICT, with Shinken and Labs the pattern (for example for Puppet) is "instance reports regularly success to Graphite", "...
[17:58:30] Yippee, build fixed!
[17:58:31] Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #875: FIXED in 29 sec: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/875/
[18:27:20] RECOVERY - Puppet run on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:20:45] Beta-Cluster-Infrastructure, Puppet, Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#2193858 (mmodell)
[19:20:47] Beta-Cluster-Infrastructure, Puppet: deployment-puppetmaster puppet failures due to apache trying to start on same port as nginx - https://phabricator.wikimedia.org/T132269#2193856 (mmodell) Open>Resolved @krenair: Thanks for getting to the bottom of this. I just did the following: * removed the a...
[19:27:16] RECOVERY - Puppet run on deployment-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0]
[19:34:30] PROBLEM - Puppet run on deployment-db1 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[19:34:38] PROBLEM - Puppet run on deployment-sentry2 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[19:35:50] PROBLEM - Puppet run on deployment-db2 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[19:35:54] PROBLEM - Puppet run on deployment-urldownloader is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]
[19:35:58] PROBLEM - Puppet run on deployment-poolcounter01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[19:36:00] PROBLEM - Puppet run on deployment-eventlogging03 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[19:36:08] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[19:36:36] PROBLEM - Puppet run on deployment-redis02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[19:37:04] PROBLEM - Puppet run on deployment-kafka04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[19:37:48] PROBLEM - Puppet run on deployment-mediawiki02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[19:37:49] PROBLEM - Puppet run on deployment-mediawiki03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[19:38:03] PROBLEM - Puppet run on deployment-memc04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[19:38:17] PROBLEM - Puppet run on deployment-cache-upload04 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[19:39:51] PROBLEM - Puppet run on deployment-mx is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [0.0]
[19:41:33] PROBLEM - Puppet run on deployment-memc03 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[19:43:27] RECOVERY - Puppet run on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:44:39] RECOVERY - Puppet run on deployment-jobrunner01 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:51:10] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:52:06] RECOVERY - Puppet run on deployment-kafka04 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:54:16] PROBLEM - Puppet run on deployment-memc02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[19:55:50] Beta-Cluster-Infrastructure: deployment-prep puppet failures due to "Could not find class" or "Puppet::Parser::AST::Resource failed with error ArgumentError: Invalid resource type" - https://phabricator.wikimedia.org/T131946#2193894 (Krenair)
[19:56:01] Beta-Cluster-Infrastructure, Puppet: deployment-prep puppet failures due to "Could not find class" or "Puppet::Parser::AST::Resource failed with error ArgumentError: Invalid resource type" - https://phabricator.wikimedia.org/T131946#2183851 (Krenair)
[19:56:38] RECOVERY - Puppet run on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:02:55] RECOVERY - Puppet run on deployment-mediawiki02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:04:55] RECOVERY - Puppet run on deployment-mx is OK: OK: Less than 1.00% above the threshold [0.0]
[20:05:55] RECOVERY - Puppet run on deployment-db2 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:05:59] RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:09:21] RECOVERY - Puppet run on deployment-memc02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:11:34] RECOVERY - Puppet run on deployment-memc03 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:13:00] RECOVERY - Puppet run on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:14:36] RECOVERY - Puppet run on deployment-sentry2 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:42:26] Project selenium-Echo » chrome,beta,Linux,,contintLabsSlave && UbuntuTrusty build #21: FAILURE in 1 min 25 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,VERSION=,label=contintLabsSlave%20&&%20UbuntuTrusty/21/
[20:43:28] Project selenium-Echo » firefox,beta,Linux,,contintLabsSlave && UbuntuTrusty build #21: FAILURE in 2 min 28 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,VERSION=,label=contintLabsSlave%20&&%20UbuntuTrusty/21/
[20:48:12] (CR) JanZerebecki: [C: 1] Run mw-apply-settings for parsoidsvc-php-parsertests [integration/config] - https://gerrit.wikimedia.org/r/274863 (owner: Arlolra)
[21:20:49] Gerrit-Migration: Identify features Gerrit users would miss in Phabricator - https://phabricator.wikimedia.org/T23#2193962 (JanZerebecki)
[21:20:51] Gerrit-Migration, Differential: git commit hash remains the same from submitting to after merge - https://phabricator.wikimedia.org/T91420#2193961 (JanZerebecki) declined>Open
[21:20:58] Project browsertests-QuickSurveys-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #233: FAILURE in 4 min 57 sec: https://integration.wikimedia.org/ci/job/browsertests-QuickSurveys-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/233/
[21:35:43] (PS1) Paladox: [Loops] Add npm test and composer-test [integration/config] - https://gerrit.wikimedia.org/r/282622
[21:42:52] (PS1) Paladox: [MagicNoCache] Add npm entry point and composer-test [integration/config] - https://gerrit.wikimedia.org/r/282624
[21:50:07] Beta-Cluster-Infrastructure: Investigate whether deployment-memc04 instance is still needed - https://phabricator.wikimedia.org/T128178#2193976 (ori) Open>Resolved It's not needed.
[22:06:39] (PS1) Paladox: [Maps] Add npm test and composer-test [integration/config] - https://gerrit.wikimedia.org/r/282627
[22:10:39] (PS1) Paladox: [MassEditRegex] Add npm test and composer-test [integration/config] - https://gerrit.wikimedia.org/r/282629
[22:14:51] (PS1) Paladox: [MediaFunctions] Add npm test and composer-test [integration/config] - https://gerrit.wikimedia.org/r/282631
[22:16:23] PROBLEM - Host deployment-memc04 is DOWN: CRITICAL - Host Unreachable (10.68.17.69)
[22:21:11] (PS1) Paladox: [MediaWikiAuth] Add npm test and composer-test [integration/config] - https://gerrit.wikimedia.org/r/282633
[22:24:29] (PS1) Paladox: [MediaWikiChat] Add npm test and composer-test [integration/config] - https://gerrit.wikimedia.org/r/282635
[22:24:40] PROBLEM - Host cache-rsync is DOWN: CRITICAL - Host Unreachable (10.68.23.165)
[22:34:55] (PS1) Paladox: [Minifier] Add npm test and composer-test [integration/config] - https://gerrit.wikimedia.org/r/282637
[22:35:43] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #1015: FAILURE in 24 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/1015/
[22:38:54] (PS1) Paladox: [Model] Add npm test and composer-test [integration/config] - https://gerrit.wikimedia.org/r/282639
[23:23:00] (PS1) Prtksxna: [ImageTweaks] Add npm test and jsonlint [integration/config] - https://gerrit.wikimedia.org/r/282641
[23:23:43] (CR) jenkins-bot: [V: -1] [ImageTweaks] Add npm test and jsonlint [integration/config] - https://gerrit.wikimedia.org/r/282641 (owner: Prtksxna)