[00:00:55] PROBLEM - Puppet failure on deployment-kafka02 is CRITICAL 50.00% of data above the critical threshold [0.0]
[00:02:07] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL 66.67% of data above the critical threshold [0.0]
[00:02:35] RECOVERY - Puppet failure on deployment-memc04 is OK Less than 1.00% above the threshold [0.0]
[00:08:18] PROBLEM - Puppet failure on deployment-db1 is CRITICAL 60.00% of data above the critical threshold [0.0]
[00:13:06] RECOVERY - Puppet failure on deployment-logstash2 is OK Less than 1.00% above the threshold [0.0]
[00:20:40] PROBLEM - Puppet failure on deployment-zookeeper01 is CRITICAL 30.00% of data above the critical threshold [0.0]
[00:21:55] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL 40.00% of data above the critical threshold [0.0]
[00:22:25] PROBLEM - Puppet failure on deployment-pdf01 is CRITICAL 50.00% of data above the critical threshold [0.0]
[00:22:48] Deployment-Systems, Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1436497 (thcipriani)
[00:22:51] Deployment-Systems, RESTBase: Setup staging for testing RESTBase deploys - https://phabricator.wikimedia.org/T104276#1436494 (thcipriani) Open>Resolved a:thcipriani Added the file labs-staging to the root of the ansible deploy repo with the contents: [eqiad:children] eqiad-restbase...
[00:24:46] RECOVERY - Puppet failure on deployment-eventlogging02 is OK Less than 1.00% above the threshold [0.0]
[00:25:54] RECOVERY - Puppet failure on deployment-redis01 is OK Less than 1.00% above the threshold [0.0]
[00:25:54] RECOVERY - Puppet failure on deployment-kafka02 is OK Less than 1.00% above the threshold [0.0]
[00:26:20] RECOVERY - Puppet failure on deployment-elastic07 is OK Less than 1.00% above the threshold [0.0]
[00:26:28] RECOVERY - Puppet failure on deployment-mathoid is OK Less than 1.00% above the threshold [0.0]
[00:27:08] RECOVERY - Puppet failure on deployment-sentry2 is OK Less than 1.00% above the threshold [0.0]
[00:38:12] RECOVERY - Puppet failure on deployment-db1 is OK Less than 1.00% above the threshold [0.0]
[00:45:39] RECOVERY - Puppet failure on deployment-zookeeper01 is OK Less than 1.00% above the threshold [0.0]
[00:46:55] RECOVERY - Puppet failure on deployment-videoscaler01 is OK Less than 1.00% above the threshold [0.0]
[00:47:25] RECOVERY - Puppet failure on deployment-pdf01 is OK Less than 1.00% above the threshold [0.0]
[00:53:34] Deployment-Systems, Graphite, Patch-For-Review: Record sync wikiversions event in graphite - https://phabricator.wikimedia.org/T104635#1436513 (mmodell) Open>Resolved
[00:55:45] PROBLEM - Puppet failure on deployment-eventlogging02 is CRITICAL 20.00% of data above the critical threshold [0.0]
[00:56:14] Deployment-Systems, operations, Patch-For-Review: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1436515 (Krenair) We also need to make other changes to scap - to be able to deploy at all from mira, we need it to be able to tell each host which master to pull...
[00:56:51] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL 30.00% of data above the critical threshold [0.0]
[00:57:17] PROBLEM - Puppet failure on deployment-elastic07 is CRITICAL 22.22% of data above the critical threshold [0.0]
[00:58:08] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL 22.22% of data above the critical threshold [0.0]
[00:59:08] PROBLEM - Puppet failure on deployment-logstash2 is CRITICAL 44.44% of data above the critical threshold [0.0]
[01:01:18] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL 33.33% of data above the critical threshold [0.0]
[01:01:40] PROBLEM - Puppet failure on deployment-zookeeper01 is CRITICAL 50.00% of data above the critical threshold [0.0]
[01:01:52] PROBLEM - Puppet failure on deployment-kafka02 is CRITICAL 50.00% of data above the critical threshold [0.0]
[01:01:52] PROBLEM - Puppet failure on deployment-apertium01 is CRITICAL 40.00% of data above the critical threshold [0.0]
[01:01:56] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL 50.00% of data above the critical threshold [0.0]
[01:02:54] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL 50.00% of data above the critical threshold [0.0]
[01:03:26] PROBLEM - Puppet failure on deployment-pdf01 is CRITICAL 60.00% of data above the critical threshold [0.0]
[01:04:43] PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL 60.00% of data above the critical threshold [0.0]
[01:08:33] PROBLEM - Puppet failure on deployment-urldownloader is CRITICAL 20.00% of data above the critical threshold [0.0]
[01:09:11] PROBLEM - Puppet failure on deployment-sca02 is CRITICAL 22.22% of data above the critical threshold [0.0]
[01:13:36] PROBLEM - Puppet failure on deployment-cxserver03 is CRITICAL 20.00% of data above the critical threshold [0.0]
[01:13:49] Deployment-Systems, operations, Patch-For-Review: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1436534 (Dzahn) ``` root@neon:~# iptables -L | grep 9200 ACCEPT tcp -- eventlog1001.eqiad.wmnet anywhere tcp dpt:9200 ACCEPT tcp -- tin.e...
[01:14:06] RECOVERY - Puppet failure on deployment-logstash2 is OK Less than 1.00% above the threshold [0.0]
[01:14:22] PROBLEM - Puppet failure on deployment-sca01 is CRITICAL 77.78% of data above the critical threshold [0.0]
[01:16:37] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL 50.00% of data above the critical threshold [0.0]
[01:21:38] PROBLEM - Puppet failure on deployment-memc02 is CRITICAL 20.00% of data above the critical threshold [0.0]
[01:23:54] PROBLEM - Puppet failure on deployment-fluorine is CRITICAL 30.00% of data above the critical threshold [0.0]
[01:25:47] RECOVERY - Puppet failure on deployment-eventlogging02 is OK Less than 1.00% above the threshold [0.0]
[01:26:41] RECOVERY - Puppet failure on deployment-zookeeper01 is OK Less than 1.00% above the threshold [0.0]
[01:26:53] RECOVERY - Puppet failure on deployment-redis01 is OK Less than 1.00% above the threshold [0.0]
[01:26:53] RECOVERY - Puppet failure on deployment-apertium01 is OK Less than 1.00% above the threshold [0.0]
[01:26:53] RECOVERY - Puppet failure on deployment-kafka02 is OK Less than 1.00% above the threshold [0.0]
[01:26:59] RECOVERY - Puppet failure on deployment-pdf02 is OK Less than 1.00% above the threshold [0.0]
[01:27:22] RECOVERY - Puppet failure on deployment-elastic07 is OK Less than 1.00% above the threshold [0.0]
[01:27:58] RECOVERY - Puppet failure on deployment-videoscaler01 is OK Less than 1.00% above the threshold [0.0]
[01:28:24] RECOVERY - Puppet failure on deployment-pdf01 is OK Less than 1.00% above the threshold [0.0]
[01:29:40] RECOVERY - Puppet failure on deployment-parsoid05 is OK Less than 1.00% above the threshold [0.0]
[01:31:16] RECOVERY - Puppet failure on deployment-jobrunner01 is OK Less than 1.00% above the threshold [0.0]
[01:38:31] RECOVERY - Puppet failure on deployment-urldownloader is OK Less than 1.00% above the threshold [0.0]
[01:39:11] RECOVERY - Puppet failure on deployment-sca02 is OK Less than 1.00% above the threshold [0.0]
[01:39:21] RECOVERY - Puppet failure on deployment-sca01 is OK Less than 1.00% above the threshold [0.0]
[01:41:35] RECOVERY - Puppet failure on deployment-memc03 is OK Less than 1.00% above the threshold [0.0]
[01:43:35] RECOVERY - Puppet failure on deployment-cxserver03 is OK Less than 1.00% above the threshold [0.0]
[01:48:08] RECOVERY - Puppet failure on deployment-sentry2 is OK Less than 1.00% above the threshold [0.0]
[01:48:54] RECOVERY - Puppet failure on deployment-fluorine is OK Less than 1.00% above the threshold [0.0]
[01:49:08] PROBLEM - Puppet failure on deployment-test is CRITICAL 44.44% of data above the critical threshold [0.0]
[01:49:26] PROBLEM - Puppet failure on deployment-stream is CRITICAL 50.00% of data above the critical threshold [0.0]
[01:52:34] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL 20.00% of data above the critical threshold [0.0]
[01:53:32] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL 30.00% of data above the critical threshold [0.0]
[01:54:37] PROBLEM - Puppet failure on deployment-cxserver03 is CRITICAL 30.00% of data above the critical threshold [0.0]
[01:59:06] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL 33.33% of data above the critical threshold [0.0]
[02:02:42] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL 30.00% of data above the critical threshold [0.0]
[02:04:55] PROBLEM - Puppet failure on deployment-fluorine is CRITICAL 50.00% of data above the critical threshold [0.0]
[02:05:39] PROBLEM - Puppet failure on deployment-upload is CRITICAL 60.00% of data above the critical threshold [0.0]
[02:08:05] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL 66.67% of data above the critical threshold [0.0]
[02:12:33] PROBLEM - Puppet failure on deployment-zotero01 is CRITICAL 50.00% of data above the critical threshold [0.0]
[02:14:09] RECOVERY - Puppet failure on deployment-test is OK Less than 1.00% above the threshold [0.0]
[02:14:29] RECOVERY - Puppet failure on deployment-stream is OK Less than 1.00% above the threshold [0.0]
[02:16:48] PROBLEM - Puppet failure on deployment-eventlogging02 is CRITICAL 30.00% of data above the critical threshold [0.0]
[02:17:33] PROBLEM - Puppet failure on deployment-mathoid is CRITICAL 50.00% of data above the critical threshold [0.0]
[02:17:53] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL 40.00% of data above the critical threshold [0.0]
[02:18:21] PROBLEM - Puppet failure on deployment-elastic07 is CRITICAL 33.33% of data above the critical threshold [0.0]
[02:19:21] PROBLEM - Puppet failure on deployment-pdf01 is CRITICAL 20.00% of data above the critical threshold [0.0]
[02:19:37] RECOVERY - Puppet failure on deployment-cxserver03 is OK Less than 1.00% above the threshold [0.0]
[02:20:09] PROBLEM - Puppet failure on deployment-logstash2 is CRITICAL 33.33% of data above the critical threshold [0.0]
[02:21:16] Deployment-Systems, operations, Patch-For-Review: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1436571 (Dzahn) after https://gerrit.wikimedia.org/r/#/c/223484/ the bot works now when using mira 19:21 < logmsgbot> !log krenair Synchronized wmf-config/testmi...
[02:22:21] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL 33.33% of data above the critical threshold [0.0]
[02:22:37] RECOVERY - Puppet failure on deployment-memc03 is OK Less than 1.00% above the threshold [0.0]
[02:22:41] PROBLEM - Puppet failure on deployment-zookeeper01 is CRITICAL 50.00% of data above the critical threshold [0.0]
[02:22:53] PROBLEM - Puppet failure on deployment-kafka02 is CRITICAL 60.00% of data above the critical threshold [0.0]
[02:22:55] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL 50.00% of data above the critical threshold [0.0]
[02:22:55] PROBLEM - Puppet failure on deployment-apertium01 is CRITICAL 50.00% of data above the critical threshold [0.0]
[02:23:37] RECOVERY - Puppet failure on deployment-memc04 is OK Less than 1.00% above the threshold [0.0]
[02:23:57] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL 60.00% of data above the critical threshold [0.0]
[02:25:33] PROBLEM - Puppet failure on deployment-elastic08 is CRITICAL 20.00% of data above the critical threshold [0.0]
[02:26:33] PROBLEM - Puppet failure on deployment-mediawiki02 is CRITICAL 30.00% of data above the critical threshold [0.0]
[02:27:06] PROBLEM - Puppet failure on deployment-redis02 is CRITICAL 33.33% of data above the critical threshold [0.0]
[02:29:14] PROBLEM - Puppet failure on deployment-db1 is CRITICAL 60.00% of data above the critical threshold [0.0]
[02:29:54] RECOVERY - Puppet failure on deployment-fluorine is OK Less than 1.00% above the threshold [0.0]
[02:30:40] RECOVERY - Puppet failure on deployment-upload is OK Less than 1.00% above the threshold [0.0]
[02:31:37] RECOVERY - Puppet failure on deployment-memc02 is OK Less than 1.00% above the threshold [0.0]
[02:32:43] RECOVERY - Puppet failure on deployment-mediawiki03 is OK Less than 1.00% above the threshold [0.0]
[02:33:03] RECOVERY - Puppet failure on deployment-mediawiki01 is OK Less than 1.00% above the threshold [0.0]
[02:34:36] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL 40.00% of data above the critical threshold [0.0]
[02:35:36] PROBLEM - Puppet failure on deployment-cxserver03 is CRITICAL 40.00% of data above the critical threshold [0.0]
[02:37:29] RECOVERY - Puppet failure on deployment-zotero01 is OK Less than 1.00% above the threshold [0.0]
[02:39:35] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL 90.00% of data above the critical threshold [0.0]
[02:42:27] RECOVERY - Puppet failure on deployment-mathoid is OK Less than 1.00% above the threshold [0.0]
[02:43:22] RECOVERY - Puppet failure on deployment-elastic07 is OK Less than 1.00% above the threshold [0.0]
[02:45:06] RECOVERY - Puppet failure on deployment-logstash2 is OK Less than 1.00% above the threshold [0.0]
[02:46:49] RECOVERY - Puppet failure on deployment-eventlogging02 is OK Less than 1.00% above the threshold [0.0]
[02:47:19] RECOVERY - Puppet failure on deployment-jobrunner01 is OK Less than 1.00% above the threshold [0.0]
[02:47:39] RECOVERY - Puppet failure on deployment-zookeeper01 is OK Less than 1.00% above the threshold [0.0]
[02:47:53] RECOVERY - Puppet failure on deployment-redis01 is OK Less than 1.00% above the threshold [0.0]
[02:47:53] RECOVERY - Puppet failure on deployment-apertium01 is OK Less than 1.00% above the threshold [0.0]
[02:47:53] RECOVERY - Puppet failure on deployment-kafka02 is OK Less than 1.00% above the threshold [0.0]
[02:47:57] RECOVERY - Puppet failure on deployment-pdf02 is OK Less than 1.00% above the threshold [0.0]
[02:48:56] RECOVERY - Puppet failure on deployment-videoscaler01 is OK Less than 1.00% above the threshold [0.0]
[02:49:07] RECOVERY - Puppet failure on deployment-sentry2 is OK Less than 1.00% above the threshold [0.0]
[02:49:23] RECOVERY - Puppet failure on deployment-pdf01 is OK Less than 1.00% above the threshold [0.0]
[02:52:04] RECOVERY - Puppet failure on deployment-redis02 is OK Less than 1.00% above the threshold [0.0]
[02:54:11] RECOVERY - Puppet failure on deployment-db1 is OK Less than 1.00% above the threshold [0.0]
[02:55:27] RECOVERY - Puppet failure on deployment-elastic08 is OK Less than 1.00% above the threshold [0.0]
[02:56:29] RECOVERY - Puppet failure on deployment-mediawiki02 is OK Less than 1.00% above the threshold [0.0]
[02:57:48] PROBLEM - Puppet failure on deployment-eventlogging02 is CRITICAL 40.00% of data above the critical threshold [0.0]
[02:58:28] PROBLEM - Puppet failure on deployment-mathoid is CRITICAL 50.00% of data above the critical threshold [0.0]
[02:59:36] RECOVERY - Puppet failure on deployment-memc03 is OK Less than 1.00% above the threshold [0.0]
[03:00:40] PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL 20.00% of data above the critical threshold [0.0]
[03:00:40] RECOVERY - Puppet failure on deployment-cxserver03 is OK Less than 1.00% above the threshold [0.0]
[03:02:34] PROBLEM - Puppet failure on deployment-memc02 is CRITICAL 30.00% of data above the critical threshold [0.0]
[03:03:20] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL 44.44% of data above the critical threshold [0.0]
[03:05:53] PROBLEM - Puppet failure on deployment-fluorine is CRITICAL 50.00% of data above the critical threshold [0.0]
[03:10:11] PROBLEM - Puppet failure on deployment-test is CRITICAL 55.56% of data above the critical threshold [0.0]
[03:23:25] RECOVERY - Puppet failure on deployment-mathoid is OK Less than 1.00% above the threshold [0.0]
[03:24:33] RECOVERY - Puppet failure on deployment-memc04 is OK Less than 1.00% above the threshold [0.0]
[03:25:15] PROBLEM - Puppet failure on deployment-db1 is CRITICAL 20.00% of data above the critical threshold [0.0]
[03:26:27] PROBLEM - Puppet failure on deployment-elastic08 is CRITICAL 40.00% of data above the critical threshold [0.0]
[03:26:55] Deployment-Systems, operations, Patch-For-Review: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1436614 (bd808) >>! In T95436#1436515, @Krenair wrote: > We also need to make other changes to scap - to be able to deploy at all from mira, we need it to be able...
[03:27:46] RECOVERY - Puppet failure on deployment-eventlogging02 is OK Less than 1.00% above the threshold [0.0]
[03:28:20] RECOVERY - Puppet failure on deployment-jobrunner01 is OK Less than 1.00% above the threshold [0.0]
[03:30:39] RECOVERY - Puppet failure on deployment-parsoid05 is OK Less than 1.00% above the threshold [0.0]
[03:30:55] RECOVERY - Puppet failure on deployment-fluorine is OK Less than 1.00% above the threshold [0.0]
[03:32:39] RECOVERY - Puppet failure on deployment-memc02 is OK Less than 1.00% above the threshold [0.0]
[03:35:07] RECOVERY - Puppet failure on deployment-test is OK Less than 1.00% above the threshold [0.0]
[03:39:55] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL 20.00% of data above the critical threshold [0.0]
[03:40:23] PROBLEM - Puppet failure on deployment-pdf01 is CRITICAL 30.00% of data above the critical threshold [0.0]
[03:44:16] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL 66.67% of data above the critical threshold [0.0]
[03:55:13] RECOVERY - Puppet failure on deployment-db1 is OK Less than 1.00% above the threshold [0.0]
[03:56:30] RECOVERY - Puppet failure on deployment-elastic08 is OK Less than 1.00% above the threshold [0.0]
[04:09:19] RECOVERY - Puppet failure on deployment-jobrunner01 is OK Less than 1.00% above the threshold [0.0]
[04:09:55] RECOVERY - Puppet failure on deployment-videoscaler01 is OK Less than 1.00% above the threshold [0.0]
[04:10:25] RECOVERY - Puppet failure on deployment-pdf01 is OK Less than 1.00% above the threshold [0.0]
[04:21:39] PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL 30.00% of data above the critical threshold [0.0]
[04:38:54] PROBLEM - Puppet failure on deployment-kafka02 is CRITICAL 30.00% of data above the critical threshold [0.0]
[04:46:07] PROBLEM - Puppet failure on deployment-test is CRITICAL 22.22% of data above the critical threshold [0.0]
[04:48:05] PROBLEM - Puppet failure on deployment-redis02 is CRITICAL 44.44% of data above the critical threshold [0.0]
[04:49:29] PROBLEM - Puppet failure on deployment-urldownloader is CRITICAL 30.00% of data above the critical threshold [0.0]
[04:50:13] PROBLEM - Puppet failure on deployment-sca02 is CRITICAL 33.33% of data above the critical threshold [0.0]
[04:50:25] PROBLEM - Puppet failure on deployment-stream is CRITICAL 50.00% of data above the critical threshold [0.0]
[04:50:33] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL 40.00% of data above the critical threshold [0.0]
[04:51:42] RECOVERY - Puppet failure on deployment-parsoid05 is OK Less than 1.00% above the threshold [0.0]
[04:52:13] PROBLEM - Puppet failure on deployment-db2 is CRITICAL 55.56% of data above the critical threshold [0.0]
[05:00:06] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL 44.44% of data above the critical threshold [0.0]
[05:01:08] PROBLEM - Puppet failure on deployment-logstash2 is CRITICAL 55.56% of data above the critical threshold [0.0]
[05:03:42] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL 30.00% of data above the critical threshold [0.0]
[05:06:53] PROBLEM - Puppet failure on deployment-fluorine is CRITICAL 60.00% of data above the critical threshold [0.0]
[05:11:09] RECOVERY - Puppet failure on deployment-logstash2 is OK Less than 1.00% above the threshold [0.0]
[05:13:29] PROBLEM - Puppet failure on deployment-zotero01 is CRITICAL 50.00% of data above the critical threshold [0.0]
[05:14:15] PROBLEM - Puppet failure on deployment-restbase02 is CRITICAL 66.67% of data above the critical threshold [0.0]
[05:15:11] RECOVERY - Puppet failure on deployment-sca02 is OK Less than 1.00% above the threshold [0.0]
[05:15:25] RECOVERY - Puppet failure on deployment-stream is OK Less than 1.00% above the threshold [0.0]
[05:15:31] RECOVERY - Puppet failure on deployment-elastic05 is OK Less than 1.00% above the threshold [0.0]
[05:15:33] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL 40.00% of data above the critical threshold [0.0]
[05:15:39] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL 50.00% of data above the critical threshold [0.0]
[05:16:07] RECOVERY - Puppet failure on deployment-test is OK Less than 1.00% above the threshold [0.0]
[05:16:37] PROBLEM - Puppet failure on deployment-cxserver03 is CRITICAL 50.00% of data above the critical threshold [0.0]
[05:17:12] RECOVERY - Puppet failure on deployment-db2 is OK Less than 1.00% above the threshold [0.0]
[05:18:06] RECOVERY - Puppet failure on deployment-redis02 is OK Less than 1.00% above the threshold [0.0]
[05:19:28] RECOVERY - Puppet failure on deployment-urldownloader is OK Less than 1.00% above the threshold [0.0]
[05:21:38] PROBLEM - Puppet failure on deployment-upload is CRITICAL 20.00% of data above the critical threshold [0.0]
[05:22:44] PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL 40.00% of data above the critical threshold [0.0]
[05:23:34] PROBLEM - Puppet failure on deployment-memc02 is CRITICAL 40.00% of data above the critical threshold [0.0]
[05:24:04] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL 22.22% of data above the critical threshold [0.0]
[05:25:06] RECOVERY - Puppet failure on deployment-sentry2 is OK Less than 1.00% above the threshold [0.0]
[05:26:15] PROBLEM - Puppet failure on deployment-db1 is CRITICAL 30.00% of data above the critical threshold [0.0]
[05:27:35] PROBLEM - Puppet failure on deployment-elastic08 is CRITICAL 40.00% of data above the critical threshold [0.0]
[05:27:35] PROBLEM - Puppet failure on deployment-mediawiki02 is CRITICAL 50.00% of data above the critical threshold [0.0]
[05:28:53] RECOVERY - Puppet failure on deployment-kafka02 is OK Less than 1.00% above the threshold [0.0]
[05:29:07] PROBLEM - Puppet failure on deployment-redis02 is CRITICAL 44.44% of data above the critical threshold [0.0]
[05:36:03] Yippee, build fixed!
[05:36:03] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce build #473: FIXED in 34 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce/473/
[05:38:32] RECOVERY - Puppet failure on deployment-zotero01 is OK Less than 1.00% above the threshold [0.0]
[05:39:14] RECOVERY - Puppet failure on deployment-restbase02 is OK Less than 1.00% above the threshold [0.0]
[05:40:34] RECOVERY - Puppet failure on deployment-memc04 is OK Less than 1.00% above the threshold [0.0]
[05:40:38] RECOVERY - Puppet failure on deployment-memc03 is OK Less than 1.00% above the threshold [0.0]
[05:41:36] RECOVERY - Puppet failure on deployment-cxserver03 is OK Less than 1.00% above the threshold [0.0]
[05:50:15] PROBLEM - Puppet failure on deployment-restbase02 is CRITICAL 22.22% of data above the critical threshold [0.0]
[05:51:41] RECOVERY - Puppet failure on deployment-upload is OK Less than 1.00% above the threshold [0.0]
[05:54:03] RECOVERY - Puppet failure on deployment-mediawiki01 is OK Less than 1.00% above the threshold [0.0]
[05:54:07] RECOVERY - Puppet failure on deployment-redis02 is OK Less than 1.00% above the threshold [0.0]
[05:54:30] PROBLEM - Puppet failure on deployment-zotero01 is CRITICAL 60.00% of data above the critical threshold [0.0]
[06:07:45] RECOVERY - Puppet failure on deployment-parsoid05 is OK Less than 1.00% above the threshold [0.0]
[06:08:37] RECOVERY - Puppet failure on deployment-memc02 is OK Less than 1.00% above the threshold [0.0]
[06:08:43] RECOVERY - Puppet failure on deployment-mediawiki03 is OK Less than 1.00% above the threshold [0.0]
[06:11:53] RECOVERY - Puppet failure on deployment-fluorine is OK Less than 1.00% above the threshold [0.0]
[06:12:33] RECOVERY - Puppet failure on deployment-elastic08 is OK Less than 1.00% above the threshold [0.0]
[06:12:33] RECOVERY - Puppet failure on deployment-mediawiki02 is OK Less than 1.00% above the threshold [0.0]
[06:16:15] RECOVERY - Puppet failure on deployment-db1 is OK Less than 1.00% above the threshold [0.0]
[06:18:41] PROBLEM - Puppet failure on deployment-zookeeper01 is CRITICAL 20.00% of data above the critical threshold [0.0]
[06:19:29] RECOVERY - Puppet failure on deployment-zotero01 is OK Less than 1.00% above the threshold [0.0]
[06:20:12] RECOVERY - Puppet failure on deployment-restbase02 is OK Less than 1.00% above the threshold [0.0]
[06:20:56] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL 30.00% of data above the critical threshold [0.0]
[06:36:33] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL 60.00% of data above the critical threshold [0.0]
[06:36:37] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL 60.00% of data above the critical threshold [0.0]
[06:42:55] PROBLEM - Puppet failure on deployment-fluorine is CRITICAL 30.00% of data above the critical threshold [0.0]
[06:46:26] PROBLEM - Puppet failure on deployment-stream is CRITICAL 20.00% of data above the critical threshold [0.0]
[06:48:14] PROBLEM - Puppet failure on deployment-db2 is CRITICAL 22.22% of data above the critical threshold [0.0]
[06:48:40] RECOVERY - Puppet failure on deployment-zookeeper01 is OK Less than 1.00% above the threshold [0.0]
[06:50:20] PROBLEM - Puppet failure on deployment-sca01 is CRITICAL 33.33% of data above the critical threshold [0.0]
[06:50:54] RECOVERY - Puppet failure on deployment-videoscaler01 is OK Less than 1.00% above the threshold [0.0]
[06:51:32] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL 60.00% of data above the critical threshold [0.0]
[06:52:37] PROBLEM - Puppet failure on deployment-cxserver03 is CRITICAL 20.00% of data above the critical threshold [0.0]
[06:59:03] PROBLEM - Puppet failure on integration-zuul-server is CRITICAL 100.00% of data above the critical threshold [0.0]
[07:01:34] RECOVERY - Puppet failure on deployment-memc04 is OK Less than 1.00% above the threshold [0.0]
[07:01:36] RECOVERY - Puppet failure on deployment-memc03 is OK Less than 1.00% above the threshold [0.0]
[07:02:30] RECOVERY - Free space - all mounts on deployment-videoscaler01 is OK All targets OK
[07:12:55] RECOVERY - Puppet failure on deployment-fluorine is OK Less than 1.00% above the threshold [0.0]
[07:13:56] Continuous-Integration-Infrastructure: Zuul: Highlight relevant change on Zuul status page when following submit pipeline url - https://phabricator.wikimedia.org/T65399#1436878 (Krinkle) a:Krinkle>None
[07:16:25] RECOVERY - Puppet failure on deployment-stream is OK Less than 1.00% above the threshold [0.0]
[07:20:19] RECOVERY - Puppet failure on deployment-sca01 is OK Less than 1.00% above the threshold [0.0]
[07:22:40] RECOVERY - Puppet failure on deployment-cxserver03 is OK Less than 1.00% above the threshold [0.0]
[07:22:40] PROBLEM - Puppet failure on deployment-upload is CRITICAL 20.00% of data above the critical threshold [0.0]
[07:23:38] PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL 40.00% of data above the critical threshold [0.0]
[07:23:56] PROBLEM - Puppet failure on deployment-fluorine is CRITICAL 30.00% of data above the critical threshold [0.0]
[07:24:38] PROBLEM - Puppet failure on deployment-memc02 is CRITICAL 50.00% of data above the critical threshold [0.0]
[07:24:42] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL 50.00% of data above the critical threshold [0.0]
[07:36:33] RECOVERY - Puppet failure on deployment-elastic05 is OK Less than 1.00% above the threshold [0.0]
[07:37:07] PROBLEM - Puppet failure on deployment-logstash2 is CRITICAL 33.33% of data above the critical threshold [0.0]
[07:38:13] RECOVERY - Puppet failure on deployment-db2 is OK Less than 1.00% above the threshold [0.0]
[07:38:53] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL 40.00% of data above the critical threshold [0.0]
[07:38:55] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL 20.00% of data above the critical threshold [0.0]
[07:39:19] PROBLEM - Puppet failure on deployment-elastic07 is CRITICAL 44.44% of data above the critical threshold [0.0]
[07:39:54] PROBLEM - Puppet failure on deployment-kafka02 is CRITICAL 40.00% of data above the critical threshold [0.0]
[07:41:06] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL 44.44% of data above the critical threshold [0.0]
[07:41:59] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL 40.00% of data above the critical threshold [0.0]
[07:42:14] Yippee, build fixed!
[07:42:14] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-10-sauce build #92: FIXED in 33 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-10-sauce/92/
[07:45:05] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL 33.33% of data above the critical threshold [0.0]
[07:47:19] PROBLEM - Puppet failure on deployment-db1 is CRITICAL 40.00% of data above the critical threshold [0.0]
[07:48:27] PROBLEM - Puppet failure on deployment-elastic08 is CRITICAL 50.00% of data above the critical threshold [0.0]
[07:48:41] RECOVERY - Puppet failure on deployment-parsoid05 is OK Less than 1.00% above the threshold [0.0]
[07:48:57] RECOVERY - Puppet failure on deployment-fluorine is OK Less than 1.00% above the threshold [0.0]
[07:49:35] RECOVERY - Puppet failure on deployment-memc02 is OK Less than 1.00% above the threshold [0.0]
[07:50:05] PROBLEM - Puppet failure on deployment-redis02 is CRITICAL 66.67% of data above the critical threshold [0.0]
[07:50:30] PROBLEM - Puppet failure on deployment-zotero01 is CRITICAL 20.00% of data above the critical threshold [0.0]
[07:52:46] Deployment-Systems, Traffic, operations: Varnish cache busting desired for /static/$VERSION/ resources which change within the lifetime of a WMF release branch - https://phabricator.wikimedia.org/T99096#1436937 (mmodell) @bblack, @dduvall and @krinkle, I think you are all suggesting essentially the s...
[08:00:10] Continuous-Integration-Isolation, Labs, Labs-Infrastructure, Labs-Sprint-103, and 3 others: Instances without a shared NFS storage suffers from a 3 minutes boot delay - https://phabricator.wikimedia.org/T102544#1436938 (hashar) Works for me with Trusty. I created a Trusty instance on the integration...
[08:03:55] RECOVERY - Puppet failure on deployment-redis01 is OK Less than 1.00% above the threshold [0.0]
[08:04:19] RECOVERY - Puppet failure on deployment-elastic07 is OK Less than 1.00% above the threshold [0.0]
[08:04:53] RECOVERY - Puppet failure on deployment-kafka02 is OK Less than 1.00% above the threshold [0.0]
[08:06:05] RECOVERY - Puppet failure on deployment-sentry2 is OK Less than 1.00% above the threshold [0.0]
[08:07:07] PROBLEM - Puppet failure on deployment-test is CRITICAL 22.22% of data above the critical threshold [0.0]
[08:07:26] PROBLEM - Puppet failure on deployment-stream is CRITICAL 30.00% of data above the critical threshold [0.0]
[08:08:32] PROBLEM - Puppet failure on deployment-mediawiki02 is CRITICAL 50.00% of data above the critical threshold [0.0]
[08:08:56] RECOVERY - Puppet failure on deployment-pdf02 is OK Less than 1.00% above the threshold [0.0]
[08:09:13] PROBLEM - Puppet failure on deployment-db2 is CRITICAL 33.33% of data above the critical threshold [0.0]
[08:09:45] RECOVERY - Puppet failure on deployment-mediawiki03 is OK Less than 1.00% above the threshold [0.0]
[08:10:27] PROBLEM - Puppet failure on deployment-urldownloader is CRITICAL 40.00% of data above the critical threshold [0.0]
[08:11:19] PROBLEM - Puppet failure on deployment-sca01 is CRITICAL 44.44% of data above the critical threshold [0.0]
[08:11:47] PROBLEM - Puppet staleness on deployment-restbase01 is CRITICAL 10.00% of data above the critical threshold [43200.0]
[08:11:55] RECOVERY - Puppet failure on deployment-videoscaler01 is OK Less than 1.00% above the threshold [0.0]
[08:12:00] !log shutdowning Jenkins for upgrade.
[08:12:12] PROBLEM - Puppet failure on deployment-sca02 is CRITICAL 55.56% of data above the critical threshold [0.0]
[08:12:35] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL 60.00% of data above the critical threshold [0.0]
[08:14:24] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #654: ABORTED in 4 min 24 sec: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/654/
[08:20:33] RECOVERY - Puppet failure on deployment-zotero01 is OK Less than 1.00% above the threshold [0.0]
[08:27:07] RECOVERY - Puppet failure on deployment-logstash2 is OK Less than 1.00% above the threshold [0.0]
[08:30:04] RECOVERY - Puppet failure on deployment-mediawiki01 is OK Less than 1.00% above the threshold [0.0]
[08:32:08] RECOVERY - Puppet failure on deployment-test is OK Less than 1.00% above the threshold [0.0]
[08:32:40] RECOVERY - Puppet failure on deployment-upload is OK Less than 1.00% above the threshold [0.0]
[08:33:28] RECOVERY - Puppet failure on deployment-elastic08 is OK Less than 1.00% above the threshold [0.0]
[08:33:32] RECOVERY - Puppet failure on deployment-mediawiki02 is OK Less than 1.00% above the threshold [0.0]
[08:34:12] RECOVERY - Puppet failure on deployment-db2 is OK Less than 1.00% above the threshold [0.0]
[08:35:04] RECOVERY - Puppet failure on deployment-redis02 is OK Less than 1.00% above the threshold [0.0]
[08:35:34] RECOVERY - Puppet failure on deployment-urldownloader is OK Less than 1.00% above the threshold [0.0]
[08:36:19] RECOVERY - Puppet failure on deployment-sca01 is OK Less than 1.00% above the threshold [0.0]
[08:37:11] RECOVERY - Puppet failure on deployment-sca02 is OK Less than 1.00% above the threshold [0.0]
[08:37:13] RECOVERY - Puppet failure on deployment-db1 is OK Less than 1.00% above the threshold [0.0]
[08:37:27] RECOVERY - Puppet failure on deployment-stream is OK Less than 1.00% above the threshold [0.0]
[08:37:34] RECOVERY - Puppet failure on deployment-elastic05 is OK Less than 1.00% above the threshold [0.0]
[08:39:54] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL 60.00% of data above the critical threshold [0.0]
[08:40:04] Continuous-Integration-Isolation: Figure out how Jenkins conf is maintained by OpenStack - https://phabricator.wikimedia.org/T95049#1436969 (hashar) Hey @zeljkofilipin that task has some informations related to OpenStack managing their Jenkins installation via puppet. The puppet module at http://git.opensta...
[08:44:26] PROBLEM - Puppet failure on deployment-elastic08 is CRITICAL 20.00% of data above the critical threshold [0.0]
[08:44:38] PROBLEM - Puppet failure on deployment-mediawiki02 is CRITICAL 20.00% of data above the critical threshold [0.0]
[08:48:15] PROBLEM - Puppet failure on deployment-db1 is CRITICAL 50.00% of data above the critical threshold [0.0]
[08:51:31] PROBLEM - Puppet failure on deployment-zotero01 is CRITICAL 30.00% of data above the critical threshold [0.0]
[08:51:32] PROBLEM - Puppet failure on deployment-urldownloader is CRITICAL 50.00% of data above the critical threshold [0.0]
[08:52:17] PROBLEM - Puppet failure on deployment-restbase02 is CRITICAL 44.44% of data above the critical threshold [0.0]
[08:53:11] PROBLEM - Puppet failure on deployment-sca02 is CRITICAL 55.56% of data above the critical threshold [0.0]
[08:53:35] PROBLEM - Puppet failure on deployment-cxserver03 is CRITICAL 20.00% of data above the critical threshold [0.0]
[08:57:36] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL 60.00% of data above the critical threshold [0.0]
[08:57:36] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL 60.00% of data above the critical threshold [0.0]
[09:00:34]
10Continuous-Integration-Infrastructure, 6Multimedia, 6operations, 5Patch-For-Review: Investigate impact of switching from ffmpeg to libav (ffmpeg is not in Jessie) - https://phabricator.wikimedia.org/T103335#1436987 (10MoritzMuehlenhoff) >>! In T103335#1429015, @MoritzMuehlenhoff wrote: > In Debian the De... [09:04:40] PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL 60.00% of data above the critical threshold [0.0] [09:04:53] RECOVERY - Puppet failure on deployment-redis01 is OK Less than 1.00% above the threshold [0.0] [09:05:31] $ /bin/ls -d /var/lib/jenkins/jobs/*/builds/2015-*|wc -l [09:05:31] 954 [09:05:35] zeljkof: it is progressing :-} [09:06:10] DONE [09:06:21] !log Jenkins registering jobs with Zuul [09:07:13] zeljkof: wanna pair again ? [09:07:16] migration is completed [09:08:07] PROBLEM - Puppet failure on deployment-logstash2 is CRITICAL 66.67% of data above the critical threshold [0.0] [09:08:31] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL 20.00% of data above the critical threshold [0.0] [09:10:13] PROBLEM - Puppet failure on deployment-db2 is CRITICAL 44.44% of data above the critical threshold [0.0] [09:12:40] hashar: sure [09:12:44] joining [09:13:12] RECOVERY - Puppet failure on deployment-db1 is OK Less than 1.00% above the threshold [0.0] [09:13:54] hashar: in the hangout [09:14:28] RECOVERY - Puppet failure on deployment-elastic08 is OK Less than 1.00% above the threshold [0.0] [09:14:34] RECOVERY - Puppet failure on deployment-mediawiki02 is OK Less than 1.00% above the threshold [0.0] [09:16:30] RECOVERY - Puppet failure on deployment-urldownloader is OK Less than 1.00% above the threshold [0.0] [09:17:15] RECOVERY - Puppet failure on deployment-restbase02 is OK Less than 1.00% above the threshold [0.0] [09:18:17] RECOVERY - Puppet failure on deployment-sca02 is OK Less than 1.00% above the threshold [0.0] [09:21:29] RECOVERY - Puppet failure on deployment-zotero01 is OK Less than 1.00% above the threshold 
[0.0] [09:22:33] RECOVERY - Puppet failure on deployment-memc04 is OK Less than 1.00% above the threshold [0.0] [09:22:35] RECOVERY - Puppet failure on deployment-memc03 is OK Less than 1.00% above the threshold [0.0] [09:23:09] RECOVERY - Puppet failure on deployment-logstash2 is OK Less than 1.00% above the threshold [0.0] [09:23:37] RECOVERY - Puppet failure on deployment-cxserver03 is OK Less than 1.00% above the threshold [0.0] [09:25:22] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce build #490: ABORTED in 4 min 20 sec: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce/490/ [09:26:07] !log upgraded plugins on jenkins and restarting it [09:26:15] !log upgraded plugins on jenkins and restarting it [09:26:18] bah [09:26:19] Logged the message, Master [09:28:50] 10Continuous-Integration-Infrastructure, 7Jenkins: Upgrade Jenkins to 1.609.1 - https://phabricator.wikimedia.org/T101884#1437038 (10hashar) 5Open>3Resolved a:3hashar Done with Zeljko. We also upgraded the plugins we know do not cause any issues. [09:29:40] RECOVERY - Puppet failure on deployment-parsoid05 is OK Less than 1.00% above the threshold [0.0] [09:31:21] !log Jenkins is fully back and operational. Or so should be. 
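hashar's `/bin/ls -d /var/lib/jenkins/jobs/*/builds/2015-*|wc -l` counts build directories that still carry a timestamp name, which works as a rough progress gauge while Jenkins rewrites its build records after the upgrade. That this is the build-directory renaming newer Jenkins releases perform (timestamp names replaced by build numbers) is an assumption, not something stated in the log. A minimal Python rendering of the same count; `count_timestamp_build_dirs` is a hypothetical helper and, unlike `ls -d`, it counts only directories:

```python
import glob
import os


def count_timestamp_build_dirs(jenkins_home, prefix="2015-"):
    """Count build directories whose names start with a timestamp prefix.

    Python rendering of
        ls -d $JENKINS_HOME/jobs/*/builds/2015-* | wc -l
    except that plain files are skipped (ls -d would list them too).
    """
    pattern = os.path.join(
        glob.escape(jenkins_home), "jobs", "*", "builds", glob.escape(prefix) + "*"
    )
    return sum(1 for path in glob.glob(pattern) if os.path.isdir(path))


if __name__ == "__main__":
    import tempfile

    # Throwaway tree: two timestamp-named builds and one number-named build.
    with tempfile.TemporaryDirectory() as home:
        for job, build in [
            ("job-a", "2015-07-01_10-00-00"),
            ("job-a", "42"),
            ("job-b", "2015-07-02_11-30-00"),
        ]:
            os.makedirs(os.path.join(home, "jobs", job, "builds", build))
        print(count_timestamp_build_dirs(home))  # 2
```

Run repeatedly against the real `JENKINS_HOME`, a falling count would mirror the "it is progressing" check above.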
[09:35:12] RECOVERY - Puppet failure on deployment-db2 is OK Less than 1.00% above the threshold [0.0]
[09:38:31] RECOVERY - Puppet failure on deployment-elastic05 is OK Less than 1.00% above the threshold [0.0]
[09:39:41] PROBLEM - Puppet failure on deployment-zookeeper01 is CRITICAL 30.00% of data above the critical threshold [0.0]
[09:39:55] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL 30.00% of data above the critical threshold [0.0]
[09:40:19] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL 22.22% of data above the critical threshold [0.0]
[09:40:43] PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL 20.00% of data above the critical threshold [0.0]
[09:41:21] PROBLEM - Puppet failure on deployment-pdf01 is CRITICAL 50.00% of data above the critical threshold [0.0]
[09:42:05] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL 66.67% of data above the critical threshold [0.0]
[09:42:55] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL 50.00% of data above the critical threshold [0.0]
[09:43:51] PROBLEM - Puppet failure on deployment-apertium01 is CRITICAL 50.00% of data above the critical threshold [0.0]
[09:44:55] PROBLEM - Puppet failure on deployment-fluorine is CRITICAL 50.00% of data above the critical threshold [0.0]
[10:06:26] RECOVERY - Puppet failure on deployment-pdf01 is OK Less than 1.00% above the threshold [0.0]
[10:07:06] RECOVERY - Puppet failure on deployment-sentry2 is OK Less than 1.00% above the threshold [0.0]
[10:07:57] RECOVERY - Puppet failure on deployment-videoscaler01 is OK Less than 1.00% above the threshold [0.0]
[10:08:51] RECOVERY - Puppet failure on deployment-apertium01 is OK Less than 1.00% above the threshold [0.0]
[10:09:43] RECOVERY - Puppet failure on deployment-zookeeper01 is OK Less than 1.00% above the threshold [0.0]
[10:09:55] RECOVERY - Puppet failure on deployment-pdf02 is OK Less than 1.00% above the threshold [0.0]
[10:09:57] RECOVERY - Puppet failure on deployment-fluorine is OK Less than 1.00% above the threshold [0.0]
[10:10:21] RECOVERY - Puppet failure on deployment-jobrunner01 is OK Less than 1.00% above the threshold [0.0]
[10:10:40] RECOVERY - Puppet failure on deployment-parsoid05 is OK Less than 1.00% above the threshold [0.0]
[10:38:14] (PS1) Florianschmidtwelzow: Publish php docs for MobileFrontend [integration/config] - https://gerrit.wikimedia.org/r/223527 (https://phabricator.wikimedia.org/T105134)
[10:39:46] (PS2) Florianschmidtwelzow: Publish php docs for MobileFrontend [integration/config] - https://gerrit.wikimedia.org/r/223527 (https://phabricator.wikimedia.org/T105134)
[10:47:58] addshore: got a minute?
[10:48:44] Sure :p
[10:48:59] (CR) Krinkle: [C: -1] Publish php docs for MobileFrontend (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/223527 (https://phabricator.wikimedia.org/T105134) (owner: Florianschmidtwelzow)
[10:50:27] addshore: https://www.mediawiki.org/wiki/Manual:Coding_conventions/PHP#C_borrowings
[10:50:32] talks about else if and elseif
[10:50:37] a long followed convention
[10:50:43] however, I followed the link to the actual PHP docs
[10:51:12] "In PHP, you can also write 'else if' (in two words) and the behavior would be identical to the one of 'elseif' (in a single word). The syntactic meaning is slightly different but the bottom line is that both would result in exactly the same behavior."
[10:51:17] (PS3) Florianschmidtwelzow: Publish php docs for MobileFrontend [integration/config] - https://gerrit.wikimedia.org/r/223527 (https://phabricator.wikimedia.org/T105134)
[10:51:19] (CR) Florianschmidtwelzow: Publish php docs for MobileFrontend (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/223527 (https://phabricator.wikimedia.org/T105134) (owner: Florianschmidtwelzow)
[10:51:30] "Same behaviour" may only mean same control flow
[10:51:44] but are we sure that "else if" is less efficient?
[10:53:09] I don't actually know off the top of my head :)
[10:53:24] I would presume elseif is more efficient than else if
[10:54:45] a good interpreter would optimise the difference away
[10:55:11] but they do remain separate tokens and either way consistency is more important. And we settled on elseif it seems.
[10:55:27] Makes sense (:
[11:05:00] Krinkle: addshore: alright!
[11:08:57] (PS1) Florianschmidtwelzow: Change trusted e-mail address to cover Gerrit E-Mail change [integration/config] - https://gerrit.wikimedia.org/r/223530
[11:38:08] FlorianSW|away: Can you update your local gitconfig to match that address?
[11:38:21] The commits you made (including https://gerrit.wikimedia.org/r/223530) still use the old address
[11:41:34] Krinkle: oops, i only looked at the gerrit one, sure, i change it :)
[11:41:54] FlorianSW|away: Your gerrit one is private, I can't see that to verify
[11:42:13] Krinkle: ah, ok!
[11:42:21] And also, that way it'll actually work –Jenkins checks the commit, not gerrit
[11:42:46] Gerrit makes sure you only use your own address (for security) - and then Jenkins gets the commit
[11:44:14] (PS2) Florianschmidtwelzow: Change trusted e-mail address to cover Gerrit E-Mail change [integration/config] - https://gerrit.wikimedia.org/r/223530
[11:44:21] Krinkle: ^
[11:44:47] (CR) Krinkle: [C: 2] Change trusted e-mail address to cover Gerrit E-Mail change [integration/config] - https://gerrit.wikimedia.org/r/223530 (owner: Florianschmidtwelzow)
[11:45:52] great, thanks :)
[11:46:33] (Merged) jenkins-bot: Change trusted e-mail address to cover Gerrit E-Mail change [integration/config] - https://gerrit.wikimedia.org/r/223530 (owner: Florianschmidtwelzow)
[11:47:23] Krinkle: does zuul reload the configuration automatically?
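Krinkle's point above is that the trusted-author check runs against the e-mail address recorded in the commit, not the one on the Gerrit account, and hashar notes shortly afterwards that the address has to be changed in both the check and test pipelines. A minimal sketch of that per-pipeline matching, assuming each pipeline keeps its own list of anchored e-mail regexes; the addresses, `make_whitelist`, `is_trusted` and the `pipelines` dict are hypothetical, and the sketch does not model whether a given pipeline uses its list to include or to exclude authors:

```python
import re


def make_whitelist(addresses):
    """Compile anchored regexes that match each address exactly."""
    return [re.compile("^" + re.escape(addr) + "$") for addr in addresses]


def is_trusted(author_email, whitelist):
    """True if the commit author's address matches any regex in the list."""
    return any(regex.match(author_email) for regex in whitelist)


# Hypothetical state after updating only one of the two pipelines.
pipelines = {
    "check": make_whitelist(["dev@new.example.org"]),
    "test": make_whitelist(["dev@old.example.org"]),
}

commit_email = "dev@new.example.org"  # the address stored in the commit itself
for name, whitelist in sorted(pipelines.items()):
    print(name, is_trusted(commit_email, whitelist))
# check True
# test False
```

Because each pipeline holds its own copy, a commit that still carries the old address matches only the stale list, and editing one list without the other leaves the pipelines disagreeing about the same author.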
[11:47:45] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/223530
[11:47:46] Nope
[11:47:48] Logged the message, Master
[11:47:53] Need to access server, git pull, and reload
[11:48:04] ah, nice to know :)
[11:48:22] https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Deploy_configuration
[11:50:10] (PS1) Hashar: UpdateFlorian Schmidt email in test pipeline [integration/config] - https://gerrit.wikimedia.org/r/223533
[11:50:21] (CR) Hashar: [C: 2] UpdateFlorian Schmidt email in test pipeline [integration/config] - https://gerrit.wikimedia.org/r/223533 (owner: Hashar)
[11:50:43] (CR) Hashar: "email gotta be changed in both check and test pipelines. Adjusted with https://gerrit.wikimedia.org/r/223533" [integration/config] - https://gerrit.wikimedia.org/r/223530 (owner: Florianschmidtwelzow)
[11:52:16] (Merged) jenkins-bot: UpdateFlorian Schmidt email in test pipeline [integration/config] - https://gerrit.wikimedia.org/r/223533 (owner: Hashar)
[11:52:21] Krinkle: you see ^? :]
[11:53:05] (CR) Florianschmidtwelzow: "Thanks @hashar! :/" [integration/config] - https://gerrit.wikimedia.org/r/223530 (owner: Florianschmidtwelzow)
[12:07:16] PROBLEM - Puppet failure on integration-slave-jessie-1001 is CRITICAL 100.00% of data above the critical threshold [0.0]
[12:48:13] (PS4) Florianschmidtwelzow: Publish php docs for MobileFrontend [integration/config] - https://gerrit.wikimedia.org/r/223527 (https://phabricator.wikimedia.org/T105134)
[12:50:15] (CR) jenkins-bot: [V: -1] Publish php docs for MobileFrontend [integration/config] - https://gerrit.wikimedia.org/r/223527 (https://phabricator.wikimedia.org/T105134) (owner: Florianschmidtwelzow)
[13:00:54] Continuous-Integration-Isolation: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1437480 (hashar) NEW
[13:02:02] Continuous-Integration-Isolation: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1437487 (hashar)
[13:05:00] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #710: FAILURE in 32 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/710/
[13:12:33] (CR) Florianschmidtwelzow: Publish php docs for MobileFrontend (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/223527 (https://phabricator.wikimedia.org/T105134) (owner: Florianschmidtwelzow)
[13:14:34] PROBLEM - Free space - all mounts on deployment-videoscaler01 is CRITICAL deployment-prep.deployment-videoscaler01.diskspace._var.byte_percentfree (<30.00%)
[13:17:16] (PS5) Florianschmidtwelzow: Publish php docs for MobileFrontend [integration/config] - https://gerrit.wikimedia.org/r/223527 (https://phabricator.wikimedia.org/T105134)
[13:34:16] Continuous-Integration-Isolation: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1437537 (hashar) I installed `libguestfs-tools` package on labnodepool1001.eqiad.wmnet to inspect the disk image (puppet is https://gerrit.wikimedia.org/r/223543 , doc added at https://...
[13:43:49] Continuous-Integration-Isolation, operations, Nodepool: flapping "permission denied" disk space alarm for temporary image on labnodepool1001 - https://phabricator.wikimedia.org/T104975#1437558 (hashar)
[13:45:12] Continuous-Integration-Isolation, operations, Nodepool: flapping "permission denied" disk space alarm for temporary image on labnodepool1001 - https://phabricator.wikimedia.org/T104975#1433508 (hashar) Added another use case triggered by `guestmount -a image.qcow2 /home/hashar/mount` which produces a /d...
[13:51:34] Browser-Tests, Continuous-Integration-Infrastructure, Patch-For-Review: It takes about 20 seconds just to start a Sauce Labs browser - https://phabricator.wikimedia.org/T92613#1437575 (hashar) >>! In T92613#1434543, @zeljkofilipin wrote: > Make sure to check the status of related patch https://gerrit....
[14:00:39] Browser-Tests, Continuous-Integration-Infrastructure, Patch-For-Review: It takes about 20 seconds just to start a Sauce Labs browser - https://phabricator.wikimedia.org/T92613#1437582 (zeljkofilipin) To test the related patches I have made a copy of [[ https://integration.wikimedia.org/ci/job/browsert...
[14:17:25] Browser-Tests, Continuous-Integration-Infrastructure, Patch-For-Review: It takes about 20 seconds just to start a Sauce Labs browser - https://phabricator.wikimedia.org/T92613#1437634 (zeljkofilipin) Running only one scenario: bundle exec cucumber features/preferences.feature:4 * master branch (...
[14:18:25] Continuous-Integration-Isolation: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1437635 (hashar) On the same project, a Jessie image provided by labs boots just fine. A difference is that it outputs: ``` Starting LSB: Cloud init modules --mode config... Started LSB...
[14:25:31] Browser-Tests, Continuous-Integration-Infrastructure, Patch-For-Review: It takes about 20 seconds just to start a Sauce Labs browser - https://phabricator.wikimedia.org/T92613#1437651 (zeljkofilipin) - Antoine's patch ([[ https://gerrit.wikimedia.org/r/#/c/200767/ | gerrit ]], [[ https://integration.w...
[14:26:22] Hello zeljkof !
[14:26:29] hi vikasyaligar
[14:26:38] VisualEditor is becoming more and more beautiful :)
[14:27:47] vikasyaligar: really? not using it a lot :)
[14:28:17] Continuous-Integration-Isolation: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1437680 (hashar) Then looking at the bootstrapvz files/labs-jessie.manifest.yaml there is a few settings related to cloud init: ``` lang=yaml plugins: cloud_init: username: admin...
[14:29:05] zeljkof: Yup ! I am trying it out after lot of time :) ; Trying to run the browser tests now.
[14:29:34] vikasyaligar: do you have all the visas now? are you coming to mexico for sure? or is there something left to be done?
[14:30:44] zeljkof: My transit VISA is pending :( ; I will apply tomorrow and I should get it back on Monday hopefully.
[14:30:57] what is transit visa?
[14:31:04] you have mexican visa, right?
[14:31:27] Browser-Tests, Continuous-Integration-Infrastructure, Patch-For-Review: It takes about 20 seconds just to start a Sauce Labs browser - https://phabricator.wikimedia.org/T92613#1437692 (hashar) Yup that saves a bunch of roundtrips. Once rebased, there is probably more `href: /regex/` that needs to be...
[14:32:04] zeljkof: Yup; But there is no direct flight from India to Mexico. It goes through Germany. Hence I need to take schengen visa to transit from Frankfurt to Mexico city
[14:32:35] vikasyaligar: I see
[14:32:47] but that should be easier, right?
[14:32:51] anyway, good luck with that
[14:32:59] zeljkof: Yup ! It should be :)
[14:33:06] thank you
[14:33:18] zeljkof: What about you ?
[14:33:41] vikasyaligar: I am all ready to go, I do not need any visas, I have the ticket and the hotel
[14:34:04] did wmf pay for your travel/hotel?
[14:34:04] zeljkof: Awesome
[14:34:10] zeljkof: yes :)
[14:34:17] do you already have tickets/hotel?
[14:34:45] zeljkof: Yes
[14:34:51] great
[14:34:58] well, just one more visa to go then :)
[14:36:46] zeljkof: hehe ! I hope it is not as painful as Mexican VISA :)
[14:41:43] hashar: could you please remove your -2 from this? https://gerrit.wikimedia.org/r/#/c/200767/
[14:41:53] I would like to merge it after testing it
[14:42:00] is there a reason for -2
[14:42:01] ?
[14:42:10] yeah
[14:42:16] so I can review it ? :-D
[14:42:24] you can't just blindly rebase that patch
[14:42:28] need to look at all the feature files
[14:42:45] and replaces all / most href: /regex/ with an id: selector :}
[14:42:50] I only changed a few ones iirc
[14:47:54] Continuous-Integration-Isolation: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1437744 (hashar) Maybe the dib image needs to drop the default user being set. I noticed the boot log line: ci-info: no authorized ssh keys fingerprints found for user debian. May...
[14:54:20] (PS1) Dzahn: enable voting for pep8/pyflakes for adminbot repo [integration/config] - https://gerrit.wikimedia.org/r/223559
[15:00:44] Browser-Tests, Continuous-Integration-Infrastructure, Patch-For-Review: It takes about 20 seconds just to start a Sauce Labs browser - https://phabricator.wikimedia.org/T92613#1437808 (zeljkofilipin) - My patch that upgrades the repo to the latest version of watir-webdriver ([[ https://gerrit.wikimedi...
[15:03:29] Browser-Tests, Continuous-Integration-Infrastructure, Patch-For-Review: It takes about 20 seconds just to start a Sauce Labs browser - https://phabricator.wikimedia.org/T92613#1437809 (zeljkofilipin) Well, I do not know what the conclusion is. Both patches do speed up the scenario, but it still takes...
[15:10:12] (CR) Elee: [C: 1] "This would be my first code review, but per Jan's comment on the other change this looks sane." [integration/config] - https://gerrit.wikimedia.org/r/223559 (owner: Dzahn)
[15:13:14] hashar: sorry, just saw your replies
[15:14:11] ok, we can create a subtask (probably low priority) to inspect all repos and remove regular expressions where possible
[15:14:12] PROBLEM - Puppet failure on deployment-restbase02 is CRITICAL 66.67% of data above the critical threshold [0.0]
[15:14:14] sounds good?
[15:14:31] I do not see the reason for your patch not to be merged now, since it does speed up things
[15:14:55] and who knows when we will have the time to inspect core, and especially all repos...
[15:25:43] zeljkof: gotta inspect the rest of core first though
[15:26:00] it is not very long. We can handle it together tomorrow if you want :}
[15:26:07] just grep for href :}
[15:34:00] hashar: sure
[15:34:04] sounds good
[15:37:20] hashar: random question
[15:37:30] I have upgraded rubocop, but it did not run here https://gerrit.wikimedia.org/r/#/c/223556/
[15:42:41] zeljkof: cause the bundle-rubocop jobs has a file: filter in zuul
[15:42:47] so it is only run when a .rb file is modified iirc
[15:43:11] - name: '^(.*-)?bundle-rubocop$'
[15:43:15] files:
[15:43:19] - '^(\.rubocop.*|\.gemspec|.*\.rb|Gemfile$)'
[15:43:28] line 552 of zuul/layout.yaml
[15:43:44] hashar: yeah, we need to update it to run when Gemfile.lock is changed too
[15:43:44] guess it is time to update the regex and the comment
[15:43:48] thanks
[15:43:49] and have it match Gemfile.lock :}
[15:43:55] then once merged and deployed
[15:43:57] use recheck
[15:44:00] and that should work ™
[15:44:19] Gemfile$) --> Gemfile$|Gemfile.lock$)
[15:44:20] hashar: thanks, will try
[15:44:21] should do
[15:45:15] argh, when ever I touch something, 10 new task fall from the sky :)
[15:47:27] where is marxarelli when you need him? ;)
[15:48:13] trying to run browser tests for core, getting this
[15:48:19] $ bundle exec cucumber features/view_history.feature:4
[15:48:27] unknown environment `default` (MediawikiSelenium::ConfigurationError)
[15:50:47] ok, looks like it is just misconfigured env file
[15:50:48] https://github.com/wikimedia/mediawiki-selenium#running-tests
[15:54:14] Browser-Tests: unknown environment `default` (MediawikiSelenium::ConfigurationError) - https://phabricator.wikimedia.org/T105174#1437967 (zeljkofilipin) NEW a:zeljkofilipin
[16:00:16] hashar: no, recheck did not run rubocop https://gerrit.wikimedia.org/r/#/c/223556/
[16:00:26] will update layout file
[16:03:28] Continuous-Integration-Infrastructure: Updating repository to a new version of RuboCop does not run RuboCop for the commit - https://phabricator.wikimedia.org/T105178#1438018 (zeljkofilipin) NEW a:zeljkofilipin
[16:04:33] zeljkof: you can just do the config change . Not sure you need a Task for it :}
[16:05:13] hashar: I am doing several things at once, if I do not have the tasks for each, there is no way I will remember everything that I need to do :)
[16:05:33] and creating the task is just a minute anyway
[16:05:53] Continuous-Integration-Infrastructure: Updating repository to a new version of RuboCop does not run RuboCop for the commit - https://phabricator.wikimedia.org/T105178#1438032 (hashar) From IRC: update the zuul/layout.yaml file around line 552: ``` - name: '^(.*-)?bundle-rubocop$' files: - '^(\.rubocop...
[16:06:05] zeljkof: doing the conf change is a few seconds anyway :D
[16:06:34] hashar: at 6 pm on a hot summer day, my brain is melting :) I need all the help from the tools that I can get
[16:07:11] I do not update zuul config a lot, I would like to make some notes
[16:11:36] Continuous-Integration-Isolation: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1438050 (hashar) The dib image has a `/etc/cloud/cloud.cfg.d/90_dpkg.cfg` with: datasource_list: [ NoCloud, AltCloud, CloudStack, ConfigDrive, Ec2, MAAS, OVF, GCE, None ] http://clo...
[16:16:57] (PS1) Zfilipin: Run RuboCop when Gemfile.lock changes [integration/config] - https://gerrit.wikimedia.org/r/223574 (https://phabricator.wikimedia.org/T105178)
[16:17:10] Continuous-Integration-Isolation: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1438078 (hashar) Hi labs! For some reason when booting the disk image I created, cloud-init fails to configure the network for some reason. The last image `ci-dib-jessie-wikimedia-143...
[16:17:13] (PS2) Zfilipin: Run RuboCop when Gemfile.lock changes [integration/config] - https://gerrit.wikimedia.org/r/223574 (https://phabricator.wikimedia.org/T105178)
[16:17:26] (CR) Zfilipin: "Patch set 2 is just a rebase." [integration/config] - https://gerrit.wikimedia.org/r/223574 (https://phabricator.wikimedia.org/T105178) (owner: Zfilipin)
[16:18:05] hashar: thanks for the tips, done: https://gerrit.wikimedia.org/r/#/c/223574/
[16:24:29] Deployment-Systems: [scap] Add support for syncing /srv/mediawiki-staging including fully working git data to warm spare deploy server - https://phabricator.wikimedia.org/T104826#1438122 (mmodell) I can imagine a few scenarios where lack of locking/coordination among deployers could cause this to go horribly...
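The `files:` filter hashar quoted for the bundle-rubocop job explains why `recheck` did nothing: a change touching only `Gemfile.lock` matches none of the alternatives. A small check of the old pattern against the proposed one, assuming Zuul applies each pattern with Python's `re.match` to every changed path (so patterns are anchored at the start only). Here the dot in `Gemfile.lock` is escaped, which the one-liner in the log does not do; an unescaped `.` would also accept any other character in that position. `triggers` is a hypothetical helper:

```python
import re

# Pattern quoted from zuul/layout.yaml in the log (bundle-rubocop files filter).
OLD = r'^(\.rubocop.*|\.gemspec|.*\.rb|Gemfile$)'
# Proposed change with the dot escaped so it only matches a literal '.'.
NEW = r'^(\.rubocop.*|\.gemspec|.*\.rb|Gemfile$|Gemfile\.lock$)'


def triggers(pattern, changed_files):
    """True if any changed path matches, assuming re.match semantics."""
    regex = re.compile(pattern)
    return any(regex.match(path) for path in changed_files)


lockfile_only = ["Gemfile.lock"]
print(triggers(OLD, lockfile_only))  # False: no alternative accepts Gemfile.lock
print(triggers(NEW, lockfile_only))  # True
```

The trailing `$` anchors matter as well: under `re.match`, a bare `Gemfile` alternative would prefix-match `Gemfile.lock` and any other path starting with `Gemfile`.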
[16:26:17] zeljkof: that mw-selenium error is due to a missing `default:` entry in environments.yaml
[16:26:30] zeljkof: you just need to add one and alias it something (beta, usually)
[16:26:31] marxarelli: yeah, thanks, figured it out
[16:26:37] cool
[16:26:47] going off for dinner etc
[16:26:53] marxarelli: https://gerrit.wikimedia.org/r/#/c/223571/
[16:27:04] my diskimage can't acquire network :-((( https://phabricator.wikimedia.org/T105152
[16:27:05] hashar: dinner is for the weak! ,9
[16:27:06] ;)
[16:27:14] hashar: bon appetit
[16:27:16] tell that to the kids hhee
[16:27:17] merci
[16:40:00] * zeljkof-away sneaks out to get some food too ;)
[16:49:58] Deployment-Systems: [scap] Add support for syncing /srv/mediawiki-staging including fully working git data to warm spare deploy server - https://phabricator.wikimedia.org/T104826#1438168 (bd808) >>! In T104826#1438122, @mmodell wrote: > Shouldn't we create some sort of flag / mutex that must be obtained on b...
[17:05:17] Deployment-Systems: [scap] Add support for syncing /srv/mediawiki-staging including fully working git data to warm spare deploy server - https://phabricator.wikimedia.org/T104826#1438202 (mmodell) Yes, I agree that it's not a blocker but just something we should consider. Even a mutex with some manual proces...
[17:22:05] Deployment-Systems: [scap] Add support for a global mutex to keep multiple masters from clobbering each other - https://phabricator.wikimedia.org/T105195#1438312 (bd808) NEW
[17:24:13] Deployment-Systems: [scap] Add support for a global mutex to keep multiple masters from clobbering each other - https://phabricator.wikimedia.org/T105195#1438337 (bd808) We might be able to use the new etcd infrastructure for this. Ideally we would setup something that not only keeps multiple masters from ru...
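T105195 asks for a global mutex so that multiple deploy masters do not clobber each other, and bd808 suggests the new etcd infrastructure for it. As a much smaller illustration of the same idea, here is a single-host sketch: an exclusive lock file created with `O_CREAT | O_EXCL`, which fails atomically on a local filesystem if the file already exists. It does not coordinate across masters and leaves a stale lock behind if the process is killed; `deploy_lock`, `DeployLocked` and the lock path are hypothetical names, not part of scap:

```python
import os
from contextlib import contextmanager


class DeployLocked(Exception):
    """Raised when another deployer already holds the lock."""


@contextmanager
def deploy_lock(path, owner):
    """Hold an exclusive lock file for the duration of a sync.

    O_CREAT | O_EXCL makes creation atomic on a local filesystem: only one
    process can create the file, the others get FileExistsError.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        raise DeployLocked("another sync holds %s" % path) from None
    try:
        # Record who holds the lock, to help whoever finds it stuck.
        with os.fdopen(fd, "w") as f:
            f.write(owner + "\n")
        yield
    finally:
        os.unlink(path)  # release even if the sync raised
```

Usage would wrap the whole sync, e.g. `with deploy_lock("/tmp/scap-sync.lock", "deployer-a"): run_sync()`, where `run_sync` stands in for the actual deployment steps.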
[17:25:22] twentyafterfour: the "little" scap changes are quickly moving towards being a real project (rolling restarts, multi-master, global mutex)
[17:26:35] bd808: I think the real answer is to integrate scap into our new deployment framework
[17:26:51] since we are using python, that should not be difficult to integrate
[17:26:54] yes. is that done yet? ;)
[17:27:05] bd808: no it's just getting started
[17:27:15] *nod* hence the wink
[17:27:48] scap itself is pretty good code, so we may not change it much. But I do want to make a more interactive interface for it
[17:29:04] just don't make it so interactive that you can't script it (/me glares at trebuchet)
[17:32:08] bd808: interactive mode just gives you more control over the scripted process that will be happening out-of-process
[17:32:21] so it'll be more scriptable not less
[17:32:53] perfect. It would be nice to be able to do things like kill a hung sync without needing an out-of-band connection
[17:33:02] bd808: exactly
[17:33:40] https://github.com/jonathanslenders/python-prompt-toolkit
[17:33:44] I'll work on the logo update to strap rockets to Scappy's sides
[17:45:22] bd808: nice!
[17:45:55] or how about "scappy on skates" ?
[17:46:13] oh.. that could be the ticket.
[17:46:27] lollerskates
[17:46:45] flying a roflcopter
[18:00:19] :D
[18:13:34] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL 30.00% of data above the critical threshold [0.0]
[18:43:33] RECOVERY - Puppet failure on deployment-memc04 is OK Less than 1.00% above the threshold [0.0]
[18:50:43] Browser-Tests, Continuous-Integration-Infrastructure, Patch-For-Review: It takes about 20 seconds just to start a Sauce Labs browser - https://phabricator.wikimedia.org/T92613#1438762 (dduvall) >>! In T92613#1437809, @zeljkofilipin wrote: > Well, I do not know what the conclusion is. Both patches do s...
[18:59:22] I missed the SWAT window for https://gerrit.wikimedia.org/r/#/c/223562/ so Special:Translate is broken on mediawiki&meta. Can I get it deployed soon?
[19:02:23] twentyafterfour: ^ ?
[19:14:30] Nikerabbit: sure
[19:16:20] twentyafterfour: ooh thank you very much
[19:18:20] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<33.33%)
[19:52:46] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure, operations: dnsmasq returns SERVFAIL for (some?) names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1438931 (coren) AFAICT, this problem solved itself (as expected) since we switched to a properly...
[19:55:49] Continuous-Integration-Infrastructure: Re-create ci slaves (March 2015) - https://phabricator.wikimedia.org/T91524#1438943 (coren)
[19:55:52] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure, operations: dnsmasq returns SERVFAIL for (some?) names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1438941 (coren) declined>Resolved Indeed it has: ```marc@tools-bastion-01:~$ host notexist...
[20:01:27] Continuous-Integration-Isolation, operations, Nodepool: flapping "permission denied" disk space alarm for temporary image on labnodepool1001 - https://phabricator.wikimedia.org/T104975#1433508 (hashar)
[20:15:14] Continuous-Integration-Isolation, operations, Nodepool: flapping "permission denied" disk space alarm for temporary image on labnodepool1001 - https://phabricator.wikimedia.org/T104975#1438997 (hashar) The check_disk parameters for production are defined in `modules/base/manifests/monitoring/host.pp`:...
[20:15:35] Continuous-Integration-Isolation, operations, Icinga, Monitoring, Nodepool: flapping "permission denied" disk space alarm for temporary image on labnodepool1001 - https://phabricator.wikimedia.org/T104975#1439010 (hashar)
[20:23:52] (CR) Hashar: [C: -1] "A $ got dropped :-(" (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/223574 (https://phabricator.wikimedia.org/T105178) (owner: Zfilipin)
[20:25:40] (CR) Hashar: [C: 2] "Awesome! Well done everyone :-}" [integration/config] - https://gerrit.wikimedia.org/r/223559 (owner: Dzahn)
[20:27:32] (Merged) jenkins-bot: enable voting for pep8/pyflakes for adminbot repo [integration/config] - https://gerrit.wikimedia.org/r/223559 (owner: Dzahn)
[20:28:17] (CR) Hashar: "It is voting now https://gerrit.wikimedia.org/r/#/c/180890/ !" [integration/config] - https://gerrit.wikimedia.org/r/223559 (owner: Dzahn)
[20:58:58] Continuous-Integration-Isolation, Labs, Labs-Infrastructure, Labs-Sprint-103, and 3 others: Instances without a shared NFS storage suffers from a 3 minutes boot delay - https://phabricator.wikimedia.org/T102544#1439110 (Andrew) I've built new images for Trusty and Jessie. Not bothering with Precise...
[21:00:59] Continuous-Integration-Isolation, Labs, Labs-Infrastructure, Labs-Sprint-103, and 3 others: Instances without a shared NFS storage suffers from a 3 minutes boot delay - https://phabricator.wikimedia.org/T102544#1439141 (Andrew) Open>Resolved
[21:01:33] (PS1) Legoktm: Whitelist Niharika [integration/config] - https://gerrit.wikimedia.org/r/223669
[21:02:23] (CR) Legoktm: [C: 2] Whitelist Niharika [integration/config] - https://gerrit.wikimedia.org/r/223669 (owner: Legoktm)
[21:04:12] (Merged) jenkins-bot: Whitelist Niharika [integration/config] - https://gerrit.wikimedia.org/r/223669 (owner: Legoktm)
[21:05:44] !log deployed https://gerrit.wikimedia.org/r/223669
[21:05:49] Logged the message, Master
[21:15:41] (PS1) Hoo man: Update Wikidata to the wmf/1.26wmf13 branch [tools/release] - https://gerrit.wikimedia.org/r/223674
[21:20:02] (CR) BryanDavis: [C: 2] Update Wikidata to the wmf/1.26wmf13 branch [tools/release] - https://gerrit.wikimedia.org/r/223674 (owner: Hoo man)
[21:27:34] (Merged) jenkins-bot: Update Wikidata to the wmf/1.26wmf13 branch [tools/release] - https://gerrit.wikimedia.org/r/223674 (owner: Hoo man)
[21:31:39] Continuous-Integration-Isolation: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1439272 (Andrew) Labs images mostly use firstboot.sh and don't rely on cloud-init very much. That means I don't have much to offer about cloud-init failures or debugging. We don't sup...
[21:34:07] Continuous-Integration-Isolation: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1439275 (Andrew) oh, I'd expect the network to come up before cloud-init does anything, though -- that should be dhcp's job.
[22:00:01] !log Kibana messed up. Half of the logstash elasticsearch indices are gone from deployment-logstash2
[22:00:06] Logged the message, Master
[22:00:29] Deployment-Systems, Release-Engineering, RESTBase, Services, operations: Get ops feedback regarding the use of SSH for deployment system control channel. - https://phabricator.wikimedia.org/T102687#1439388 (thcipriani) I poked at Fabric a bit this morning. Fabric uses paramiko which doesn't app...
[22:32:03] !log Upgraded elasticsearch on logstash2 to 1.6.0
[22:32:08] Logged the message, Master
[22:33:06] !log about half of the indices on deployment-logstash2 lost. I assume it was caused by shard rebalancing to logstash1 that I didn't notice before I shut it down and deleted it :(
[22:33:10] Logged the message, Master
[22:36:04] Continuous-Integration-Isolation: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1439469 (hashar) My expectation as well. Maybe we can snapshot the instance and download the image to inspect the cloud debug log somewhere under /var/log ? Will give it a try. There...
[22:37:07] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure, operations: dnsmasq returns SERVFAIL for (some?) names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1439472 (scfc) Too bad for the readers Google will bring here in the future: Nearly four months o...
[22:48:31] !log cherry-picked https://gerrit.wikimedia.org/r/#/c/223691/ on integration-puppetmaster
[22:48:36] Logged the message, Master
[23:09:03] Release-Engineering, Commons, MediaWiki-File-management, MediaWiki-Tarball-Backports, and 7 others: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566#1439571 (Tgr) >>! In T102566#1434294, @Tau wrote: > Still nothing ... Any further recommendations?? Make sure [[...
[23:17:52] !log Kibana functional again. Imported some dashboards from prod instance.
[23:17:56] Logged the message, Master
[23:31:43] PROBLEM - Puppet failure on deployment-cache-text02 is CRITICAL 100.00% of data above the critical threshold [0.0]