[00:14:47] Project beta-scap-eqiad build #35353: FAILURE in 51 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/35353/ [00:25:04] Project beta-scap-eqiad build #35354: STILL FAILING in 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/35354/ [00:32:35] Yippee, build fixed! [00:32:36] Project beta-scap-eqiad build #35355: FIXED in 1 min 13 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/35355/ [02:00:32] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50350 bytes in 8.748 second response time [02:04:12] !log restarted hhvm on mediawiki03 [02:06:10] PROBLEM - Free space - all mounts on deployment-mediawiki03 is CRITICAL: CRITICAL: deployment-prep.deployment-mediawiki03.diskspace._var.byte_percentfree.value (<62.50%) [02:06:20] Beta-Cluster: monitor that application servers are responding - https://phabricator.wikimedia.org/T54867#943532 (yuvipanda) http://shinken.wmflabs.org/host/deployment-mediawiki03 :D So this adds monitoring for bits (requests a static image), and the enwiki main page (checks if the string 'Wikipedia' exists).... [02:27:46] Beta-Cluster: Setup monitoring for Beta cluster - https://phabricator.wikimedia.org/T53497#943538 (yuvipanda) [02:27:47] Beta-Cluster: monitor that application servers are responding - https://phabricator.wikimedia.org/T54867#943536 (yuvipanda) Open>Resolved Resolving since the app servers have monitoring now. [02:28:19] Beta-Cluster: Setup monitoring for Beta cluster - https://phabricator.wikimedia.org/T53497#581090 (yuvipanda) These are in shinken.wmflabs.org now. Further things to be monitored should be filed as subtasks of this. 
[03:39:36] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce build #211: FAILURE in 32 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce/211/ [04:03:02] Yippee, build fixed! [04:03:03] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #436: FIXED in 46 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/436/ [06:35:04] Project beta-scap-eqiad build #35392: FAILURE in 1 min 4 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/35392/ [06:37:12] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce build #210: FAILURE in 38 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce/210/ [06:42:32] PROBLEM - Puppet failure on deployment-cache-text02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [06:44:55] Project beta-scap-eqiad build #35393: STILL FAILING in 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/35393/ [06:52:13] Yippee, build fixed! [06:52:13] Project browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #359: FIXED in 15 min: https://integration.wikimedia.org/ci/job/browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/359/ [06:54:28] Yippee, build fixed! [06:54:29] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #365: FIXED in 24 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/365/ [06:55:25] Yippee, build fixed! 
[06:55:26] Project beta-scap-eqiad build #35394: FIXED in 1 min 23 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/35394/ [07:12:33] RECOVERY - Puppet failure on deployment-cache-text02 is OK: OK: Less than 1.00% above the threshold [0.0] [07:38:08] PROBLEM - Free space - all mounts on deployment-mediawiki02 is CRITICAL: CRITICAL: deployment-prep.deployment-mediawiki02.diskspace._var.byte_percentfree.value (<25.00%) [07:51:30] PROBLEM - Free space - all mounts on deployment-mediawiki01 is CRITICAL: CRITICAL: deployment-prep.deployment-mediawiki01.diskspace._var.byte_percentfree.value (<55.56%) [09:36:21] Project beta-scap-eqiad build #35410: FAILURE in 2 min 6 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/35410/ [09:37:22] PROBLEM - Puppet failure on deployment-stream is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [09:38:15] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [09:38:41] PROBLEM - Puppet failure on deployment-rsync01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [09:38:45] PROBLEM - Puppet failure on deployment-cache-upload02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [09:38:47] PROBLEM - Puppet failure on deployment-cache-bits01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [09:40:39] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [09:41:55] PROBLEM - Puppet failure on deployment-salt is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [09:43:43] PROBLEM - Puppet failure on deployment-logstash1 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [09:43:59] PROBLEM - Puppet failure on deployment-elastic06 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [09:44:41] PROBLEM - Puppet failure on 
deployment-mediawiki03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [09:47:13] Project beta-scap-eqiad build #35411: STILL FAILING in 3 min 6 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/35411/ [09:48:35] PROBLEM - Puppet failure on deployment-cache-text02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [09:49:58] PROBLEM - Puppet failure on deployment-upload is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [09:51:00] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [09:51:22] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [09:51:43] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [09:52:43] PROBLEM - Puppet failure on deployment-fluoride is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [09:52:45] PROBLEM - Puppet failure on deployment-eventlogging02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [09:55:19] PROBLEM - Puppet failure on deployment-apertium01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [09:55:33] Yippee, build fixed! 
[09:55:34] Project beta-scap-eqiad build #35412: FIXED in 1 min 25 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/35412/ [10:02:25] RECOVERY - Puppet failure on deployment-stream is OK: OK: Less than 1.00% above the threshold [0.0] [10:03:18] RECOVERY - Puppet failure on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0] [10:03:40] RECOVERY - Puppet failure on deployment-rsync01 is OK: OK: Less than 1.00% above the threshold [0.0] [10:03:42] RECOVERY - Puppet failure on deployment-cache-upload02 is OK: OK: Less than 1.00% above the threshold [0.0] [10:05:40] RECOVERY - Puppet failure on deployment-bastion is OK: OK: Less than 1.00% above the threshold [0.0] [10:06:56] RECOVERY - Puppet failure on deployment-salt is OK: OK: Less than 1.00% above the threshold [0.0] [10:09:44] RECOVERY - Puppet failure on deployment-mediawiki03 is OK: OK: Less than 1.00% above the threshold [0.0] [10:13:57] RECOVERY - Puppet failure on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0] [10:16:24] RECOVERY - Puppet failure on deployment-mediawiki01 is OK: OK: Less than 1.00% above the threshold [0.0] [10:16:44] RECOVERY - Puppet failure on deployment-jobrunner01 is OK: OK: Less than 1.00% above the threshold [0.0] [10:17:42] RECOVERY - Puppet failure on deployment-fluoride is OK: OK: Less than 1.00% above the threshold [0.0] [10:17:46] RECOVERY - Puppet failure on deployment-eventlogging02 is OK: OK: Less than 1.00% above the threshold [0.0] [10:19:59] RECOVERY - Puppet failure on deployment-upload is OK: OK: Less than 1.00% above the threshold [0.0] [10:20:19] RECOVERY - Puppet failure on deployment-apertium01 is OK: OK: Less than 1.00% above the threshold [0.0] [10:21:01] RECOVERY - Puppet failure on deployment-pdf02 is OK: OK: Less than 1.00% above the threshold [0.0] [10:23:49] RECOVERY - Puppet failure on deployment-cache-bits01 is OK: OK: Less than 1.00% above the threshold [0.0] [10:28:44] RECOVERY - Puppet failure on 
deployment-logstash1 is OK: OK: Less than 1.00% above the threshold [0.0] [10:33:32] RECOVERY - Puppet failure on deployment-cache-text02 is OK: OK: Less than 1.00% above the threshold [0.0] [12:44:46] !log cleaned out cores on deployment-mediawiki03 [12:50:24] Beta-Cluster, operations, Labs-Team: Core dumps fill up /var on labs instances - https://phabricator.wikimedia.org/T1259#943783 (yuvipanda) betalabs' mediawiki-* instances filled up again, and I've set them to be dumped on NFS https://wikitech.wikimedia.org/w/index.php?title=Hiera%3ADeployment-prep&diff=13927... [12:54:23] !log cleaned out cores on deployment-mediawiki01,02 [12:54:59] Project beta-scap-eqiad build #35430: FAILURE in 1 min 5 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/35430/ [12:56:12] RECOVERY - Free space - all mounts on deployment-mediawiki03 is OK: OK: All targets OK [13:03:10] RECOVERY - Free space - all mounts on deployment-mediawiki02 is OK: OK: All targets OK [13:05:02] Project beta-scap-eqiad build #35431: STILL FAILING in 1 min 2 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/35431/ [13:06:29] RECOVERY - Free space - all mounts on deployment-mediawiki01 is OK: OK: All targets OK [13:15:31] Yippee, build fixed! [13:15:31] Project beta-scap-eqiad build #35432: FIXED in 1 min 32 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/35432/ [14:53:34] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [18:40:07] Yippee, build fixed! 
[18:40:07] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce build #212: FIXED in 33 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce/212/ [18:51:56] Project browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-chrome-sauce build #377: FAILURE in 6 min 59 sec: https://integration.wikimedia.org/ci/job/browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-chrome-sauce/377/ [18:59:45] PROBLEM - Puppet failure on deployment-rsync01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [19:14:49] Project beta-scap-eqiad build #35467: FAILURE in 50 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/35467/ [19:24:41] RECOVERY - Puppet failure on deployment-rsync01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:26:22] Project beta-scap-eqiad build #35468: STILL FAILING in 2 min 23 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/35468/ [19:30:41] PROBLEM - Puppet failure on deployment-sca01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [19:31:27] PROBLEM - Puppet failure on deployment-memc02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [19:33:39] PROBLEM - Puppet failure on deployment-fluoride is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [19:33:47] PROBLEM - Puppet failure on deployment-eventlogging02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [19:34:49] PROBLEM - Puppet failure on deployment-cache-bits01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [19:35:44] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [19:36:40] Project beta-scap-eqiad build #35469: STILL FAILING in 2 min 45 sec: 
https://integration.wikimedia.org/ci/job/beta-scap-eqiad/35469/ [19:37:55] PROBLEM - Puppet failure on deployment-salt is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [19:38:04] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce build #196: FAILURE in 46 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce/196/ [19:38:24] PROBLEM - Puppet failure on deployment-stream is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [19:39:42] PROBLEM - Puppet failure on deployment-cache-upload02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [19:40:14] PROBLEM - Puppet failure on deployment-db1 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [19:41:38] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [19:43:40] PROBLEM - Puppet failure on deployment-db2 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [19:44:06] PROBLEM - Puppet failure on deployment-redis02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [19:46:19] Yippee, build fixed! 
[19:46:20] Project beta-scap-eqiad build #35470: FIXED in 2 min 13 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/35470/ [19:47:32] PROBLEM - Puppet failure on deployment-elastic07 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [19:49:32] PROBLEM - Puppet failure on deployment-cache-text02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [19:49:56] PROBLEM - Puppet failure on deployment-restbase02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [19:50:28] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [19:51:00] PROBLEM - Puppet failure on deployment-upload is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [19:51:58] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [19:52:20] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [19:52:42] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [19:54:31] PROBLEM - Puppet failure on deployment-restbase01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [19:56:08] Project beta-scap-eqiad build #35471: FAILURE in 2 min 8 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/35471/ [19:56:19] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #404: FAILURE in 51 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/404/ [20:00:40] PROBLEM - Puppet failure on deployment-rsync01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:00:42] RECOVERY - Puppet failure on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:00:46] PROBLEM - Puppet failure on 
deployment-mediawiki03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:01:26] RECOVERY - Puppet failure on deployment-memc02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:02:20] Project browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #490: FAILURE in 56 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/490/ [20:03:24] RECOVERY - Puppet failure on deployment-stream is OK: OK: Less than 1.00% above the threshold [0.0] [20:04:47] RECOVERY - Puppet failure on deployment-cache-bits01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:05:30] Yippee, build fixed! [20:05:31] Project beta-scap-eqiad build #35472: FIXED in 1 min 30 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/35472/ [20:06:39] RECOVERY - Puppet failure on deployment-bastion is OK: OK: Less than 1.00% above the threshold [0.0] [20:09:01] RECOVERY - Puppet failure on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:09:33] RECOVERY - Puppet failure on deployment-cache-text02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:10:13] RECOVERY - Puppet failure on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0] [20:13:43] RECOVERY - Puppet failure on deployment-fluoride is OK: OK: Less than 1.00% above the threshold [0.0] [20:14:58] RECOVERY - Puppet failure on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:15:29] RECOVERY - Puppet failure on deployment-memc03 is OK: OK: Less than 1.00% above the threshold [0.0] [20:16:01] RECOVERY - Puppet failure on deployment-upload is OK: OK: Less than 1.00% above the threshold [0.0] [20:17:22] RECOVERY - Puppet failure on deployment-mediawiki01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:17:32] RECOVERY - Puppet failure on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0] [20:17:42] RECOVERY 
- Puppet failure on deployment-jobrunner01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:18:44] RECOVERY - Puppet failure on deployment-eventlogging02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:19:32] RECOVERY - Puppet failure on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:20:44] RECOVERY - Puppet failure on deployment-videoscaler01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:21:58] RECOVERY - Puppet failure on deployment-pdf02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:22:57] RECOVERY - Puppet failure on deployment-salt is OK: OK: Less than 1.00% above the threshold [0.0] [20:24:43] RECOVERY - Puppet failure on deployment-cache-upload02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:25:47] RECOVERY - Puppet failure on deployment-rsync01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:28:37] RECOVERY - Puppet failure on deployment-db2 is OK: OK: Less than 1.00% above the threshold [0.0] [20:30:45] RECOVERY - Puppet failure on deployment-mediawiki03 is OK: OK: Less than 1.00% above the threshold [0.0] [22:05:29] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #366: FAILURE in 23 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/366/ [23:46:30] PROBLEM - Free space - all mounts on deployment-cache-upload02 is CRITICAL: CRITICAL: deployment-prep.deployment-cache-upload02.diskspace._srv_vdb.byte_percentfree.value (<100.00%) [23:49:04] <100.00%? Am I missing something? [23:54:48] Krenair: no, the messages are just… terribly formatted. [23:54:56] and that sign is just badly placed... [23:54:58] ignore that. [23:55:56] If the percent free of disk space was >= 100% something would probably be wrong :)