[00:11:21] Beta-Cluster: Puppet failures on deployment-bastion - https://phabricator.wikimedia.org/T75520#845262 (Andrew) The deployment-bastion node directly includes the 'keyholder' class via wikitech. That class now takes an argument ('trusted_group') but no arg is specified, hence the error. You'll need to either...
[00:12:23] PROBLEM - Puppet staleness on deployment-apertium01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [43200.0]
[00:20:23] PROBLEM - Puppet staleness on deployment-rsync01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [43200.0]
[00:26:16] (CR) Krinkle: [C: -1] "If they're for "dev scripts" only (whatever that is), then the problem is in MobileFrontend. By default npm-test does nothing. Whatever it" [integration/config] - https://gerrit.wikimedia.org/r/179345 (owner: MaxSem)
[00:32:34] PROBLEM - Puppet staleness on deployment-memc04 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [43200.0]
[00:34:18] (CR) Dduvall: "recheck" [selenium] (env-abstraction-layer) - https://gerrit.wikimedia.org/r/179375 (owner: Dduvall)
[00:35:36] Project beta-scap-eqiad build #33649: FAILURE in 1 min 24 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33649/
[00:37:04] (CR) Dduvall: "Looks like the yard job might have failed intermittently. How do I tell Jenkins to try again?" [selenium] (env-abstraction-layer) - https://gerrit.wikimedia.org/r/179375 (owner: Dduvall)
[00:50:12] (CR) Dduvall: "Spoke too soon. :)" [selenium] (env-abstraction-layer) - https://gerrit.wikimedia.org/r/179375 (owner: Dduvall)
[00:57:32] Yippee, build fixed!
[00:57:32] Project beta-scap-eqiad build #33651: FIXED in 3 min 18 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33651/
[01:07:25] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.byte_percentfree.value (<44.44%)
[01:30:15] Continuous-Integration, RESTBase, Services: Move testing to our own hardware - https://phabricator.wikimedia.org/T78410#845361 (GWicke)
[01:31:05] Continuous-Integration, Parsoid, RESTBase, Services: Move testing to our own hardware - https://phabricator.wikimedia.org/T78410#844577 (GWicke)
[01:31:24] PROBLEM - Puppet failure on deployment-mediawiki02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[01:37:18] PROBLEM - Puppet failure on deployment-cache-upload02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[01:42:39] PROBLEM - Puppet failure on deployment-parsoidcache02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[01:44:25] PROBLEM - Puppet failure on deployment-cache-text02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[01:47:05] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[01:50:03] Yippee, build fixed!
[01:50:04] Project browsertests-VisualEditor-test2.wikipedia.org-linux-firefox-sauce build #359: FIXED in 1 hr 31 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-test2.wikipedia.org-linux-firefox-sauce/359/
[01:50:04] PROBLEM - Puppet failure on deployment-memc02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[01:55:35] PROBLEM - Puppet failure on deployment-cache-bits01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[02:01:25] RECOVERY - Puppet failure on deployment-mediawiki02 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:05:23] Project browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #335: FAILURE in 9 min 8 sec: https://integration.wikimedia.org/ci/job/browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/335/
[02:07:15] Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #285: FAILURE in 1 min 51 sec: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/285/
[02:15:06] RECOVERY - Puppet failure on deployment-memc02 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:20:39] RECOVERY - Puppet failure on deployment-cache-bits01 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:22:13] RECOVERY - Puppet failure on deployment-cache-upload02 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:29:26] RECOVERY - Puppet failure on deployment-cache-text02 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:32:04] RECOVERY - Puppet failure on deployment-memc03 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:32:40] RECOVERY - Puppet failure on deployment-parsoidcache02 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:04:10] PROBLEM - Puppet failure on deployment-cache-mobile03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:11:19] PROBLEM - Puppet failure on deployment-apertium01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[04:24:56] Project beta-scap-eqiad build #33674: FAILURE in 59 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33674/
[04:33:20] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce build #186: FAILURE in 32 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce/186/
[05:26:05] Yippee, build fixed!
[05:26:06] Project browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #351: FIXED in 20 min: https://integration.wikimedia.org/ci/job/browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/351/
[05:36:33] Yippee, build fixed!
[05:36:34] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #412: FIXED in 1 hr 3 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/412/
[06:37:09] Yippee, build fixed!
[06:37:10] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce build #171: FIXED in 57 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce/171/
[06:37:27] RECOVERY - Free space - all mounts on deployment-bastion is OK: OK: All targets OK
[06:59:47] (CR) 20after4: [C: 1] "this looks good to me but I am not that familiar with the testing infrastructure so I'm not the best one to review." [integration/config] - https://gerrit.wikimedia.org/r/178862 (owner: Hashar)
[07:48:23] Yippee, build fixed!
[07:48:24] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #380: FIXED in 56 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/380/
[07:49:51] Yippee, build fixed!
[07:49:52] Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #279: FIXED in 1 min 27 sec: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/279/
[07:52:03] Yippee, build fixed!
[07:52:04] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #61: FIXED in 2 min 11 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/61/
[08:02:49] Yippee, build fixed!
[08:02:49] Project browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #358: FIXED in 10 min: https://integration.wikimedia.org/ci/job/browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/358/
[08:21:24] !log forcing puppet run on all deployment-prep hosts
[08:21:29] Logged the message, Master
[08:35:26] Yippee, build fixed!
[08:35:27] Project beta-scap-eqiad build #33699: FIXED in 1 min 17 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33699/
[08:36:10] Yippee, build fixed!
[08:36:10] Project browsertests-WikiLove-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #315: FIXED in 4 min 42 sec: https://integration.wikimedia.org/ci/job/browsertests-WikiLove-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/315/
[08:36:41] RECOVERY - Puppet staleness on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [3600.0]
[08:36:55] RECOVERY - Puppet staleness on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [3600.0]
[08:37:57] RECOVERY - Puppet staleness on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [3600.0]
[08:38:19] RECOVERY - Puppet staleness on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [3600.0]
[08:39:12] RECOVERY - Puppet staleness on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [3600.0]
[08:40:32] (PS2) Zfilipin: Add browsers for CentralNotice [integration/config] - https://gerrit.wikimedia.org/r/178714 (owner: AndyRussG)
[08:42:35] RECOVERY - Puppet staleness on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [3600.0]
[08:42:57] RECOVERY - Puppet staleness on deployment-db2 is OK: OK: Less than 1.00% above the threshold [3600.0]
[08:44:28] Beta-Cluster: Beta servers can be badly misconfigured if mwyaml hiera backend fails - https://phabricator.wikimedia.org/T78408#845594 (yuvipanda) I guess puppet should fail if it can't hit wikitech.
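The T78408 comment above argues that a Puppet run should fail outright when wikitech, the source of the mwyaml Hiera data, cannot be reached, instead of carrying on as if the keys were simply missing. A minimal Python sketch of that fail-closed idea follows, assuming the fetch step raises `OSError` (as `urllib` errors and `ConnectionError` do) on failure. It is not the real mwyaml backend, which lives in the Puppet/Hiera Ruby code; `BackendUnavailable`, `lookup`, and the `fetch` callable are hypothetical names.

```python
from typing import Any, Callable, Mapping


class BackendUnavailable(RuntimeError):
    """Hypothetical error: the page holding the Hiera data could not be fetched."""


def lookup(
    key: str,
    fetch: Callable[[], Mapping[str, Any]],
    default: Any = None,
    fail_closed: bool = True,
) -> Any:
    """Look up `key` in the mapping returned by `fetch` (a stand-in for reading wikitech).

    fail_closed=True: a fetch error aborts the lookup, so the caller fails loudly.
    fail_closed=False: a fetch error looks exactly like a missing key and the
    default is used, which is the silent misconfiguration risk in T78408.
    """
    try:
        data = fetch()
    except OSError as exc:  # assumption: network/HTTP failures surface as OSError
        if fail_closed:
            raise BackendUnavailable(f"cannot fetch hiera data for {key!r}") from exc
        return default
    return data.get(key, default)
```

With `fail_closed=False` an unreachable wiki and a genuinely unset key produce the same result, which is why failing the run is the safer choice.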
[08:45:19] RECOVERY - Puppet staleness on deployment-stream is OK: OK: Less than 1.00% above the threshold [3600.0]
[08:45:25] RECOVERY - Puppet staleness on deployment-rsync01 is OK: OK: Less than 1.00% above the threshold [3600.0]
[08:47:43] RECOVERY - Puppet staleness on deployment-salt is OK: OK: Less than 1.00% above the threshold [3600.0]
[08:47:57] RECOVERY - Puppet staleness on deployment-upload is OK: OK: Less than 1.00% above the threshold [3600.0]
[08:48:50] RECOVERY - Puppet staleness on deployment-eventlogging02 is OK: OK: Less than 1.00% above the threshold [3600.0]
[08:49:14] Yippee, build fixed!
[08:49:15] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #223: FIXED in 13 min: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/223/
[08:55:06] RECOVERY - Puppet staleness on deployment-cache-mobile03 is OK: OK: Less than 1.00% above the threshold [3600.0]
[09:02:31] Release-Engineering, Phabricator: Answer questions about ongoing maintenance of phabricator customizations/extensions - https://phabricator.wikimedia.org/T78464#845596 (mmodell) NEW a:mmodell
[09:29:27] Beta-Cluster: Upgrade varnish automatically via puppet in Beta Cluster - https://phabricator.wikimedia.org/T75564#845615 (mmodell) Ok I've added version: latest to [[ https://wikitech.wikimedia.org/wiki/Heira:deployment-prep | hiera ]] on wikitech. I'll look into what it will take to deal with restarting after...
[09:35:04] twentyafterfour: ^ typo, it’s hire, your page is at heira :)
[09:35:07] err
[09:35:08] hiera
[09:35:53] Beta-Cluster: Upgrade varnish automatically via puppet in Beta Cluster - https://phabricator.wikimedia.org/T75564#845616 (yuvipanda) ^ typo, should be Hiera: not Heira: :)
[09:38:31] YuviPanda: thanks, good catch
[09:38:37] fixed
[09:40:06] Beta-Cluster: Upgrade varnish automatically via puppet in Beta Cluster - https://phabricator.wikimedia.org/T75564#845618 (mmodell) >>! In T75564#845616, @yuvipanda wrote: > ^ typo, should be Hiera: not Heira: :) Thanks! typo is now fixed, good catch!
[09:50:30] Project browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce build #182: ABORTED in 42 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce/182/
[09:50:32] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #421: ABORTED in 53 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/421/
[09:58:41] Yippee, build fixed!
[09:58:42] Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #286: FIXED in 2 min 8 sec: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/286/
[10:19:26] Yippee, build fixed!
[10:19:26] Project browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #336: FIXED in 22 min: https://integration.wikimedia.org/ci/job/browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/336/
[10:44:06] Phabricator, Release-Engineering: Answer questions about ongoing maintenance of phabricator customizations/extensions - https://phabricator.wikimedia.org/T78464#845649 (Qgil) Thank you! All this is food for https://www.mediawiki.org/wiki/Phabricator/Code. Let's concentrate all the documentation there. One bi...
[10:48:16] Beta-Cluster, Labs-Team: Setup multimaster salt for large projects using salt-syndic - https://phabricator.wikimedia.org/T78466#845659 (yuvipanda) NEW
[11:00:28] Yippee, build fixed!
[11:00:29] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce build #187: FIXED in 40 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce/187/
[11:58:33] Project browsertests-Echo-test2.wikipedia.org-linux-firefox-sauce build #221: FAILURE in 21 min: https://integration.wikimedia.org/ci/job/browsertests-Echo-test2.wikipedia.org-linux-firefox-sauce/221/
[12:25:49] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [0.0]
[12:47:10] Services, Continuous-Integration, Parsoid, RESTBase: Move testing to our own hardware - https://phabricator.wikimedia.org/T78410#845955 (hashar)
[12:50:46] RECOVERY - Puppet failure on deployment-mediawiki03 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:53:49] Services, Continuous-Integration, Parsoid, RESTBase: Move testing to our own hardware - https://phabricator.wikimedia.org/T78410#845956 (hashar) I have removed the Jenkins tag since it is meant for Tasks that affect Jenkins itself. Continuous-Integration would be enough. We have Trusty instances in labs tha...
[13:15:11] Project beta-scap-eqiad build #33726: FAILURE in 1 min 13 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33726/
[13:30:09] PROBLEM - Puppet failure on deployment-fluoride is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[13:32:50] (CR) Hashar: "> this looks good to me but I am not that familiar with the testing infrastructure so I'm not the best one to review." (2 comments) [integration/config] - https://gerrit.wikimedia.org/r/178862 (owner: Hashar)
[13:35:14] Yippee, build fixed!
[13:35:15] Project beta-scap-eqiad build #33728: FIXED in 1 min 16 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33728/
[13:55:09] RECOVERY - Puppet failure on deployment-fluoride is OK: OK: Less than 1.00% above the threshold [0.0]
[14:35:16] Project beta-scap-eqiad build #33734: FAILURE in 1 min 13 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33734/
[14:43:09] Yippee, build fixed!
[14:43:10] Project browsertests-Wikidata-SmokeTests-linux-firefox-sauce build #85: FIXED in 26 min: https://integration.wikimedia.org/ci/job/browsertests-Wikidata-SmokeTests-linux-firefox-sauce/85/
[14:55:23] Yippee, build fixed!
[14:55:23] Project beta-scap-eqiad build #33736: FIXED in 1 min 16 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33736/
[15:46:53] Project beta-scap-eqiad build #33741: FAILURE in 2 min 57 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33741/
[15:50:14] PROBLEM - Puppet failure on deployment-elastic07 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[15:50:16] PROBLEM - Puppet failure on deployment-upload is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[15:52:24] PROBLEM - Puppet failure on deployment-mediawiki02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[15:53:52] PROBLEM - Puppet failure on deployment-eventlogging02 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [0.0]
[15:55:02] PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[15:57:09] Yippee, build fixed!
[15:57:09] Project beta-scap-eqiad build #33742: FIXED in 3 min 8 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33742/
[15:57:33] PROBLEM - Puppet failure on deployment-salt is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[15:58:13] PROBLEM - Puppet failure on deployment-cache-upload02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[15:58:19] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[16:00:09] PROBLEM - Puppet failure on deployment-rsync01 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[16:01:38] PROBLEM - Puppet failure on deployment-cache-bits01 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [0.0]
[16:02:06] PROBLEM - Puppet failure on deployment-db1 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[16:02:59] !log Starting work on [[phab:T78076]] to renumber apache users in beta
[16:03:01] Logged the message, Master
[16:04:06] PROBLEM - Puppet failure on deployment-logstash1 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]
[16:04:22] PROBLEM - Puppet failure on deployment-restbase03 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:06:34] Project beta-scap-eqiad build #33743: FAILURE in 2 min 35 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33743/
[16:08:04] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:09:15] !log apache and hhvm stopped on beta app server tier. All requests expected to return 503 from varnish
[16:09:18] Logged the message, Master
[16:09:59] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 3024 bytes in 0.007 second response time
[16:11:42] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:13:30] PROBLEM - Puppet failure on deployment-restbase01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:13:54] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[16:15:04] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 49065 bytes in 7.880 second response time
[16:16:27] Phabricator, Release-Engineering: Answer questions about ongoing maintenance of phabricator customizations/extensions - https://phabricator.wikimedia.org/T78464#846063 (MZMcBride) Thank you for writing this. I find the view that we're outright rejecting customizations to Phabricator to be simply unacceptable...
[16:16:53] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[16:16:59] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[16:18:29] Services, Continuous-Integration, Parsoid, RESTBase: Move testing to our own hardware - https://phabricator.wikimedia.org/T78410#846077 (GWicke) @hashar, we talked about the issues around race conditions, port conflicts etc before. Right now it might be simpler for us to spin up an LXC container and run npm t...
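The Puppet staleness and Puppet failure alerts in this log report the share of recent datapoints above a threshold: CRITICAL against the critical value (43200.0 for staleness, 0.0 for failures), and OK once less than 1.00% of the data exceeds the threshold shown in the RECOVERY line (3600.0 for staleness). The reported figures are consistent with windows of 8 to 10 datapoints (33.33% = 3/9, 62.50% = 5/8, 70.00% = 7/10). A minimal Python sketch of that evaluation follows, assuming the staleness metric is seconds since the last Puppet run (so 43200 s = 12 h and 3600 s = 1 h) and assuming a 1% trigger level for both states. The real check's window length, trigger percentages, and handling of missing data are not shown in this log; `percent_above` and `evaluate` are hypothetical names.

```python
from typing import Optional, Sequence


def percent_above(datapoints: Sequence[Optional[float]], threshold: float) -> Optional[float]:
    """Percentage of non-null datapoints strictly above `threshold`, or None if none exist."""
    values = [v for v in datapoints if v is not None]
    if not values:
        return None
    above = sum(1 for v in values if v > threshold)
    return 100.0 * above / len(values)


def evaluate(
    datapoints: Sequence[Optional[float]],
    warning: float,
    critical: float,
    trigger_percent: float = 1.0,  # assumption: not confirmed by the log
) -> str:
    """Return a status line shaped like the PROBLEM/RECOVERY messages above."""
    crit_pct = percent_above(datapoints, critical)
    if crit_pct is None:
        return "UNKNOWN: no valid datapoints"
    if crit_pct >= trigger_percent:
        return f"CRITICAL: {crit_pct:.2f}% of data above the critical threshold [{critical}]"
    # Same non-null values as above, so this is never None here.
    warn_pct = percent_above(datapoints, warning)
    if warn_pct >= trigger_percent:
        return f"WARNING: {warn_pct:.2f}% of data above the warning threshold [{warning}]"
    return f"OK: Less than {trigger_percent:.2f}% above the threshold [{warning}]"
```

For example, nine staleness samples of which three exceed 43200 seconds yield the same "33.33% of data above the critical threshold [43200.0]" wording seen for deployment-apertium01.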
[16:18:33] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:18:49] RECOVERY - Puppet failure on deployment-eventlogging02 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:21:47] PROBLEM - Puppet failure on deployment-db2 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:23:00] PROBLEM - Puppet failure on deployment-elastic06 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[16:23:21] RECOVERY - Puppet failure on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:23:22] !log apache and hhvm restarted on beta app servers following apache user renumber
[16:23:24] Logged the message, Master
[16:23:33] PROBLEM - Puppet failure on deployment-cxserver03 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[16:25:11] RECOVERY - Puppet failure on deployment-rsync01 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:25:27] PROBLEM - Puppet failure on deployment-cache-text02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[16:26:37] RECOVERY - Puppet failure on deployment-cache-bits01 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:27:19] PROBLEM - Puppet failure on deployment-parsoid04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:27:41] PROBLEM - Puppet failure on deployment-elastic08 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:27:47] Yippee, build fixed!
[16:27:47] Project beta-scap-eqiad build #33745: FIXED in 3 min 44 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33745/
[16:31:02] !log apache user renumbered on deployment-mediawiki03
[16:31:04] PROBLEM - Puppet failure on deployment-memc02 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]
[16:31:04] Logged the message, Master
[16:32:08] RECOVERY - Puppet failure on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:34:24] RECOVERY - Puppet failure on deployment-restbase03 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:35:16] RECOVERY - Puppet failure on deployment-upload is OK: OK: Less than 1.00% above the threshold [0.0]
[16:35:16] RECOVERY - Puppet failure on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:36:24] Project beta-scap-eqiad build #33746: FAILURE in 2 min 27 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33746/
[16:37:36] PROBLEM - Puppet failure on deployment-cache-bits01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:41:47] RECOVERY - Puppet failure on deployment-pdf02 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:42:29] RECOVERY - Puppet failure on deployment-mediawiki02 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:43:29] RECOVERY - Puppet failure on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:43:35] RECOVERY - Puppet failure on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:43:41] PROBLEM - Puppet failure on deployment-parsoidcache02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[16:46:34] Yippee, build fixed!
[16:46:34] Project beta-scap-eqiad build #33747: FIXED in 2 min 31 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33747/
[16:47:36] RECOVERY - Puppet failure on deployment-salt is OK: OK: Less than 1.00% above the threshold [0.0]
[16:48:14] RECOVERY - Puppet failure on deployment-cache-upload02 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:48:34] RECOVERY - Puppet failure on deployment-cxserver03 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:49:12] PROBLEM - Puppet failure on deployment-restbase02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[16:51:16] PROBLEM - Puppet failure on deployment-elastic07 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[16:53:04] RECOVERY - Puppet failure on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:53:26] PROBLEM - Puppet failure on deployment-mediawiki02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:54:20] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:57:18] RECOVERY - Puppet failure on deployment-parsoid04 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:57:23] PROBLEM - Puppet failure on deployment-stream is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:58:20] !log restarted puppetmaster on deployment-salt
[16:58:22] Logged the message, Master
[16:58:32] PROBLEM - Puppet failure on deployment-salt is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:58:54] RECOVERY - Puppet failure on deployment-mediawiki01 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:59:34] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[17:01:10] PROBLEM - Puppet failure on deployment-redis02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[17:01:52] RECOVERY - Puppet failure on deployment-jobrunner01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:01:56] RECOVERY - Puppet failure on deployment-videoscaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:05:01] RECOVERY - Puppet failure on deployment-parsoid05 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:06:17] Project beta-scap-eqiad build #33749: FAILURE in 2 min 18 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33749/
[17:06:51] RECOVERY - Puppet failure on deployment-db2 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:08:19] PROBLEM - Puppet failure on deployment-parsoid04 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[17:10:35] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[17:11:06] PROBLEM - Puppet failure on deployment-fluoride is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[17:11:12] !log Labs DNS seems to be flaking out badly and causing random scap and puppet failures
[17:11:14] Logged the message, Master
[17:11:18] PROBLEM - Puppet failure on deployment-upload is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[17:12:41] RECOVERY - Puppet failure on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:12:55] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[17:12:55] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[17:13:05] RECOVERY - Puppet failure on deployment-memc03 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:13:31] RECOVERY - Puppet failure on deployment-salt is OK: OK: Less than 1.00% above the threshold [0.0]
[17:13:39] RECOVERY - Puppet failure on deployment-parsoidcache02 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:14:13] RECOVERY - Puppet failure on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:14:27] PROBLEM - Puppet failure on deployment-restbase01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[17:15:21] Yippee, build fixed!
[17:15:22] Project beta-scap-eqiad build #33750: FIXED in 1 min 17 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33750/
[17:16:12] RECOVERY - Puppet failure on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:18:28] RECOVERY - Puppet failure on deployment-mediawiki02 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:22:23] RECOVERY - Puppet failure on deployment-stream is OK: OK: Less than 1.00% above the threshold [0.0]
[17:22:37] RECOVERY - Puppet failure on deployment-cache-bits01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:24:19] RECOVERY - Puppet failure on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:24:33] RECOVERY - Puppet failure on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:26:11] RECOVERY - Puppet failure on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:29:05] RECOVERY - Puppet failure on deployment-logstash1 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:30:27] RECOVERY - Puppet failure on deployment-cache-text02 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:33:21] RECOVERY - Puppet failure on deployment-parsoid04 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:34:54] Beta-Cluster: Renumber apache user/group to uid=48 on Trusty beta hosts - https://phabricator.wikimedia.org/T78076#846219 (bd808) Renumbering is done and the puppet patch is applied on deployment-salt via cherry-pick.
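Renumbering the apache user and group to uid 48 (T78076) changes the number recorded for the account, but files store numeric owners, so anything still carrying the old number has to be re-owned (compare the later `chown -R apache:apache` on /data/project/upload7). A minimal Python sketch of finding such files, roughly what `find ROOT -uid OLD` would list, follows. The old uid is not given in this log, so it is a parameter; `paths_owned_by` and `renumber` are hypothetical helpers, and the default is a dry run because `os.lchown` normally requires root.

```python
import os
from typing import Iterator, List


def paths_owned_by(root: str, uid: int) -> Iterator[str]:
    """Yield root and every path below it whose numeric owner is `uid`.

    lstat inspects symlinks themselves, and os.walk does not descend into
    symlinked directories (followlinks defaults to False).
    """
    if os.lstat(root).st_uid == uid:
        yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid == uid:
                yield path


def renumber(root: str, old_uid: int, new_uid: int, dry_run: bool = True) -> List[str]:
    """Re-own paths from old_uid to new_uid; with dry_run=True only report them."""
    matched = list(paths_owned_by(root, old_uid))  # collect first, then change
    if not dry_run:
        for path in matched:
            os.lchown(path, new_uid, -1)  # -1 leaves the group unchanged
    return matched
```

Group ownership can be handled the same way by comparing `st_gid` and passing the new gid as the third argument of `os.lchown`.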
[17:36:06] RECOVERY - Puppet failure on deployment-fluoride is OK: OK: Less than 1.00% above the threshold [0.0]
[17:36:06] RECOVERY - Puppet failure on deployment-memc02 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:36:16] RECOVERY - Puppet failure on deployment-upload is OK: OK: Less than 1.00% above the threshold [0.0]
[17:37:29] Beta-Cluster: File upload area resorts to 0777 permissions to for uploaded content - https://phabricator.wikimedia.org/T75206#846224 (bd808)
[17:37:31] Beta-Cluster: Renumber apache user/group to uid=48 on Trusty beta hosts - https://phabricator.wikimedia.org/T78076#846220 (bd808) Open>stalled Help from #operations is needed to merge the patch into {rOPUP} before this can be closed. @yuvipanda and @andrew have been added as reviewers.
[17:37:54] RECOVERY - Puppet failure on deployment-jobrunner01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:37:56] RECOVERY - Puppet failure on deployment-videoscaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:39:26] RECOVERY - Puppet failure on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:39:53] Beta-Cluster: Renumber apache user/group to uid=48 on Trusty beta hosts - https://phabricator.wikimedia.org/T78076#835083 (bd808)
[17:40:36] RECOVERY - Puppet failure on deployment-sentry2 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:47:03] Beta-Cluster: Change ownership and permissions of /data/project/upload7 - https://phabricator.wikimedia.org/T78473#846234 (bd808) NEW a:bd808
[17:47:59] Beta-Cluster: Change ownership and permissions of /data/project/upload7 - https://phabricator.wikimedia.org/T78473#846234 (bd808)
[17:51:22] !log Running chown -R apache:apache on /data/project/upload7 from deployment-mediawiki02
[17:51:23] Logged the message, Master
[18:14:09] Continuous-Integration: Broken mediawiki-phpunit job prevents all merges to mediawiki/core - https://phabricator.wikimedia.org/T78474#846248 (matmarex) NEW
[18:14:18] who broke it :(
[18:16:42] !log chown done for /data/project/upload7
[18:16:44] Logged the message, Master
[18:18:07] MatmaRex: Are all the failures running on gallium?
[18:18:46] bd808: no idea, i just noticed that i tried to merge two changes and they failed
[18:20:25] My guess is that the git clones on gallium are broken somehow. Similar things have happened before.
[18:25:02] !log Running chmod -R u=rwX,g=rwX,o=rX /data/project/upload7 from deployment-mediawiki02
[18:25:04] Logged the message, Master
[18:26:18] Yippee, build fixed!
[18:26:18] Project browsertests-Echo-test2.wikipedia.org-linux-chrome-sauce build #223: FIXED in 20 min: https://integration.wikimedia.org/ci/job/browsertests-Echo-test2.wikipedia.org-linux-chrome-sauce/223/
[18:31:47] Continuous-Integration: Broken mediawiki-phpunit job prevents all merges to mediawiki/core - https://phabricator.wikimedia.org/T78474#846256 (bd808) On gallium, /srv/ssd/jenkins-slave/workspace/mediawiki-phpunit/src/.git/config is an empty file with a timestamp of 2014-12-13T18:08.
[18:38:15] Continuous-Integration: Broken mediawiki-phpunit job prevents all merges to mediawiki/core - https://phabricator.wikimedia.org/T78474#846258 (bd808) I populated the .git/config from the matching file on lanthanum: ``` [core] repositoryformatversion = 0 filemode = true bare = false...
[18:41:36] Yippee, build fixed!
[18:41:36] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce build #187: FIXED in 33 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce/187/
[18:47:22] Continuous-Integration: Broken mediawiki-phpunit job prevents all merges to mediawiki/core - https://phabricator.wikimedia.org/T78474#846267 (bd808) Seems fixed by my manual .git/config change (). It might be worth while for @hashar to t...
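The `chmod -R u=rwX,g=rwX,o=rX` run on /data/project/upload7 relies on the capital `X` permission, which grants execute (search) only to directories and to files that already have at least one execute bit set. On the nine rwx bits, the net effect is that directories and already-executable files end up 0775 and every other file ends up 0664, so uploaded media do not become executable. A Python sketch of that rule follows; it only models the rwx bits, leaves setuid, setgid and sticky bits untouched (GNU chmod's `=` may treat them differently, and the separate `chmod -R g+s` step is not modelled), and `rwX_mode` and `apply_rwX` are hypothetical names.

```python
import os
import stat

ANY_EXEC = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def rwX_mode(old_mode: int, is_dir: bool) -> int:
    """rwx bits that `chmod u=rwX,g=rwX,o=rX` produces.

    Capital X adds execute/search only for directories or for files that
    already had some execute bit set.
    """
    exec_ok = is_dir or bool(old_mode & ANY_EXEC)
    mode = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP | stat.S_IROTH
    if exec_ok:
        mode |= ANY_EXEC
    return mode


def apply_rwX(root: str) -> None:
    """Apply rwX_mode to root and everything below it, skipping symlinks."""

    def fix(path: str) -> None:
        st = os.lstat(path)
        if stat.S_ISLNK(st.st_mode):
            return  # chmod -R does not change symlinks found during traversal
        new_bits = rwX_mode(st.st_mode, stat.S_ISDIR(st.st_mode))
        # Keep any special bits as they were; replace only the nine rwx bits.
        os.chmod(path, (stat.S_IMODE(st.st_mode) & ~0o777) | new_bits)

    fix(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            fix(os.path.join(dirpath, name))
```

For example, `rwX_mode(0o600, False)` gives 0o664 for an ordinary image file, while a directory created as 0o700 becomes 0o775.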
[18:47:50] MatmaRex: ^ I think I fixed it, but have no idea what would have messed the git clone up in the first place.
[18:50:06] bd808: <3
[18:51:22] !log Running chmod -R g+s /data/project/upload7 on deployment-mediawiki02
[18:51:24] Logged the message, Master
[19:35:16] Project beta-scap-eqiad build #33764: FAILURE in 1 min 13 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33764/
[19:37:42] PROBLEM - Puppet failure on deployment-pdf01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:38:42] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:55:05] Yippee, build fixed!
[19:55:05] Project beta-scap-eqiad build #33766: FIXED in 1 min 5 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33766/
[20:08:24] Continuous-Integration: Broken mediawiki-phpunit job prevents all merges to mediawiki/core - https://phabricator.wikimedia.org/T78474#846305 (bd808) p:Unbreak!>Normal Seems to be fixed for now, so lowering status from UBN! to normal.
[20:09:40] Continuous-Integration: Empty .git/config for mediawiki/core.git clone in mediawiki-phpunit workspace on gallium - https://phabricator.wikimedia.org/T78474#846307 (bd808)
[20:12:48] Beta-Cluster: File upload area resorts to 0777 permissions to for uploaded content - https://phabricator.wikimedia.org/T75206#846311 (bd808)
[20:12:49] Beta-Cluster: Change ownership and permissions of /data/project/upload7 - https://phabricator.wikimedia.org/T78473#846309 (bd808) Open>Resolved Ran these commands from deployment-mediawiki02 to change shared image directory file permissions: ``` sudo chown -R apache:apache /data/project/upload7 sudo ch...
[20:55:11] Project beta-scap-eqiad build #33772: FAILURE in 1 min 9 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33772/
[21:15:47] Yippee, build fixed!
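The !log entries above change /data/project/upload7 in three steps: `chown -R apache:apache`, `chmod -R u=rwX,g=rwX,o=rX`, then `chmod -R g+s`. A minimal Python sketch of the mode bits the two chmod steps should leave on each entry, assuming GNU chmod semantics, running as root (the commands were run with sudo), and starting modes with no setuid, setgid or sticky bits already set. `upload_mode` is a hypothetical helper that only computes numbers; it does not touch the filesystem.

```python
import stat


def upload_mode(mode: int, is_dir: bool) -> int:
    """Hypothetical helper: permission bits an entry should end up with after
    `chmod u=rwX,g=rwX,o=rX` followed by `chmod g+s` (sketch only)."""
    # Capital X grants execute/search only to directories, or to files that
    # already have an execute bit set for user, group or other.
    has_exec = mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    if is_dir or has_exec:
        perms = 0o775  # u=rwx, g=rwx, o=rx
    else:
        perms = 0o664  # u=rw, g=rw, o=r
    # `chmod -R g+s` sets setgid on every entry; on directories it makes
    # newly created entries inherit the directory's group (apache here).
    return perms | stat.S_ISGID
```

For example, an ordinary 0644 image file ends up as 02664 and a directory as 02775, so the apache group can write everywhere, and files created later inherit the apache group through the setgid bit on their parent directory.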
[21:15:48] Project beta-scap-eqiad build #33774: FIXED in 1 min 35 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33774/
[21:39:34] PROBLEM - Puppet failure on deployment-salt is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[21:40:36] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[21:41:12] Project browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce build #183: STILL FAILING in 53 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce/183/
[21:45:08] PROBLEM - Puppet failure on deployment-logstash1 is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [0.0]
[21:46:28] PROBLEM - Puppet failure on deployment-cache-text02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[21:47:35] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[21:48:42] PROBLEM - Puppet failure on deployment-elastic08 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[21:49:06] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[21:49:52] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[21:52:07] PROBLEM - Puppet failure on deployment-memc02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[21:55:24] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[21:55:30] PROBLEM - Puppet failure on deployment-restbase01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[21:56:21] Beta-Cluster: File upload area resorts to 0777 permissions to for uploaded content - https://phabricator.wikimedia.org/T75206#846406 (bd808) I think the only remaining work for this will be merging into {rOPUP} to ensure that future beta hosts create the apache user wit...
[21:58:00] Phabricator, Release-Engineering: Answer questions about ongoing maintenance of phabricator customizations/extensions - https://phabricator.wikimedia.org/T78464#846408 (scfc) If we discard the weekly updates and adopt a schedule similar to our Bugzilla instance, the maintenance cost will be much lower.
[21:58:04] PROBLEM - Puppet failure on deployment-db1 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[21:58:36] PROBLEM - Puppet failure on deployment-cache-bits01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[22:01:46] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[22:04:35] RECOVERY - Puppet failure on deployment-salt is OK: OK: Less than 1.00% above the threshold [0.0]
[22:05:26] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #422: STILL FAILING in 1 hr 18 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/422/
[22:05:35] RECOVERY - Puppet failure on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:09:22] PROBLEM - Puppet failure on deployment-parsoid04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[22:09:50] PROBLEM - Puppet failure on deployment-eventlogging02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[22:11:28] RECOVERY - Puppet failure on deployment-cache-text02 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:12:09] PROBLEM - Puppet failure on deployment-fluoride is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[22:12:18] PROBLEM - Puppet failure on deployment-upload is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]
[22:12:27] Project beta-scap-eqiad build #33777: FAILURE in 27 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33777/
[22:13:41] RECOVERY - Puppet failure on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:14:09] RECOVERY - Puppet failure on deployment-memc03 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:17:06] RECOVERY - Puppet failure on deployment-memc02 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:18:24] PROBLEM - Puppet failure on deployment-stream is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[22:18:35] Beta-Cluster: deployment-videoscaler01.eqiad.wmflabs returned [255]: Warning: Permanently added 'deployment-videoscaler01.eqiad.wmflabs,10.68.16.211' (RSA) to the list of known hosts. - https://phabricator.wikimedia.org/T76907#846412 (bd808)
[22:20:27] RECOVERY - Puppet failure on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:20:33] PROBLEM - Puppet failure on deployment-salt is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[22:25:23] PROBLEM - Puppet failure on deployment-restbase03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[22:28:38] RECOVERY - Puppet failure on deployment-cache-bits01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:30:04] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[22:31:13] PROBLEM - Puppet failure on deployment-sca01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[22:31:35] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[22:32:36] RECOVERY - Puppet failure on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:33:04] PROBLEM - Puppet failure on deployment-memc02 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[22:33:54] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[22:34:20] RECOVERY - Puppet failure on deployment-parsoid04 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:37:07] RECOVERY - Puppet failure on deployment-fluoride is OK: OK: Less than 1.00% above the threshold [0.0]
[22:39:16] PROBLEM - Puppet failure on deployment-cache-upload02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[22:39:56] RECOVERY - Puppet failure on deployment-mediawiki01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:41:09] PROBLEM - Puppet failure on deployment-rsync01 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0]
[22:43:26] RECOVERY - Puppet failure on deployment-stream is OK: OK: Less than 1.00% above the threshold [0.0]
[22:44:40] PROBLEM - Puppet failure on deployment-parsoidcache02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[22:44:59] Yippee, build fixed!
[22:44:59] Project beta-scap-eqiad build #33778: FIXED in 28 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33778/
[22:45:18] RECOVERY - Puppet failure on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:45:32] RECOVERY - Puppet failure on deployment-salt is OK: OK: Less than 1.00% above the threshold [0.0]
[22:48:36] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[22:49:20] Project beta-scap-eqiad build #33779: FAILURE in 2 min 54 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33779/
[22:49:40] PROBLEM - Puppet failure on deployment-mathoid is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[22:50:06] RECOVERY - Puppet failure on deployment-logstash1 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:50:14] PROBLEM - Puppet failure on deployment-restbase02 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[22:50:52] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[22:55:04] RECOVERY - Puppet failure on deployment-memc03 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:55:12] Yippee, build fixed!
[22:55:12] Project beta-scap-eqiad build #33780: FIXED in 1 min 14 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/33780/
[22:56:11] RECOVERY - Puppet failure on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:57:18] RECOVERY - Puppet failure on deployment-upload is OK: OK: Less than 1.00% above the threshold [0.0]
[22:59:50] RECOVERY - Puppet failure on deployment-eventlogging02 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:00:24] RECOVERY - Puppet failure on deployment-restbase03 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:01:28] PROBLEM - Puppet failure on deployment-restbase01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[23:01:32] PROBLEM - Puppet failure on deployment-salt is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [0.0]
[23:01:37] RECOVERY - Puppet failure on deployment-sentry2 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:03:04] RECOVERY - Puppet failure on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:04:14] RECOVERY - Puppet failure on deployment-cache-upload02 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:06:44] RECOVERY - Puppet failure on deployment-mediawiki03 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:13:35] RECOVERY - Puppet failure on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:14:41] RECOVERY - Puppet failure on deployment-parsoidcache02 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:14:41] RECOVERY - Puppet failure on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[23:15:52] RECOVERY - Puppet failure on deployment-mediawiki01 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:18:05] RECOVERY - Puppet failure on deployment-memc02 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:18:57] RECOVERY - Puppet failure on deployment-jobrunner01 is OK: OK: Less than 1.00% above the threshold [0.0]
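Each Puppet failure alert above reports a share of datapoints over a threshold of 0.0, and the shares are always multiples of 1/9 (22.22%, 55.56%, 77.78%) or 1/8 (37.50%, 62.50%, 75.00%). That pattern is consistent with, though not proof of, a check that looks at a window of about nine datapoints and skips missing ones. A minimal Python sketch of that calculation under this assumption; `percent_above` and `check_status` are hypothetical names, not the monitoring tool's API, and the 1% cut-off is only inferred from the "Less than 1.00%" recovery text.

```python
from typing import Optional, Sequence, Tuple


def percent_above(points: Sequence[Optional[float]], threshold: float) -> float:
    """Percentage of non-missing datapoints strictly above `threshold`."""
    valid = [p for p in points if p is not None]  # drop gaps in the series
    if not valid:
        return 0.0
    above = sum(1 for p in valid if p > threshold)
    return 100.0 * above / len(valid)


def check_status(points: Sequence[Optional[float]],
                 threshold: float = 0.0,
                 critical_percent: float = 1.0) -> Tuple[str, float]:
    """Hypothetical rule: CRITICAL once at least `critical_percent` of the
    window is above `threshold`, otherwise OK."""
    pct = percent_above(points, threshold)
    state = "CRITICAL" if pct >= critical_percent else "OK"
    return state, round(pct, 2)


if __name__ == "__main__":
    # Three failed Puppet runs in a nine-point window, then the same
    # three failures with one missing datapoint.
    print(check_status([0, 0, 0, 0, 0, 0, 1, 1, 1]))
    print(check_status([0, 0, 0, 0, 0, 1, 1, 1, None]))
    print(check_status([0] * 9))
```

Three failing points out of nine give 33.33%, while the same three out of eight non-missing points give 37.5%, which is how a single gap in the series can turn the usual ninths into eighths.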
[23:20:13] RECOVERY - Puppet failure on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:21:09] RECOVERY - Puppet failure on deployment-rsync01 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:21:28] RECOVERY - Puppet failure on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:21:34] RECOVERY - Puppet failure on deployment-salt is OK: OK: Less than 1.00% above the threshold [0.0]
[23:42:25] PROBLEM - Puppet failure on deployment-cache-text02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[23:46:29] PROBLEM - Free space - all mounts on deployment-cache-upload02 is CRITICAL: CRITICAL: deployment-prep.deployment-cache-upload02.diskspace._srv_vdb.byte_percentfree.value (<100.00%)
[23:47:24] YuviPanda: ^ that doesn't sound good
[23:47:52] shinken-wm: the spamming?
[23:47:54] what fills up the varnish server?
[23:48:01] bd808: aaah, that.
[23:48:19] bd808: soo… varnish sets its vdb files to take up 90% of the volume
[23:48:24] and we have critical set to 85%
[23:48:29] ha
[23:48:43] so I think the proper solution here is to allow custom overrides
[23:48:48] which I haven’t managed to build yet
[23:48:56] too many things
[23:48:57] distracted this week by labsdb auditing, mostly
[23:48:59] yeah
[23:49:06] * bd808 understands
[23:49:23] and today I’ve been seized by a sudden desire to learn a new language / ‘thing'
[23:49:51] which is probably going to last till I fall asleep, at best
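YuviPanda's explanation is that varnish sizes its storage files at about 90% of the volume while the disk check goes critical at 85%, so deployment-cache-upload02 alerts permanently; the proposed fix is per-host overrides, which did not exist yet. A minimal Python sketch of what such an override lookup could look like, assuming the 85% figure means "critical when more than 85% of the mount is used". The `OVERRIDES` table, the 95% value for the `_srv_vdb` mount and both function names are hypothetical.

```python
from typing import Dict

# Assumed global default from the discussion: critical above 85% used.
DEFAULT_CRITICAL_USED_PERCENT = 85.0

# Hypothetical per-host overrides, keyed by the mount component of the
# metric name in the alert (e.g. "_srv_vdb" for the varnish storage volume).
# Varnish fills ~90% of that volume by design, so allow up to 95% there.
OVERRIDES: Dict[str, Dict[str, float]] = {
    "deployment-cache-upload02": {"_srv_vdb": 95.0},
}


def critical_used_percent(host: str, mount: str) -> float:
    """Return the override for host/mount, or the global default."""
    return OVERRIDES.get(host, {}).get(mount, DEFAULT_CRITICAL_USED_PERCENT)


def disk_status(host: str, mount: str, percent_free: float) -> str:
    """Classify one byte_percentfree sample as OK or CRITICAL."""
    used = 100.0 - percent_free
    return "CRITICAL" if used > critical_used_percent(host, mount) else "OK"
```

With these values the varnish volume at 10% free reports OK, while any other mount at 10% free still reports CRITICAL; the override relaxes only the one mount whose usage is high by design.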