[00:20:07] PROBLEM - Puppet run on deployment-phab02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[00:22:55] PROBLEM - Puppet run on deployment-phab01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[01:11:34] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slow-downs - https://phabricator.wikimedia.org/T148478#3148691 (Paladox) Happened again on the 02/04/17 bst time. PROBLEM - configured eth on cobalt is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:05...
[01:16:05] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slow-downs - https://phabricator.wikimedia.org/T148478#3148692 (Paladox) a minute later after reporting the recovery errors, the problem thing came again PROBLEM - Check size of conntrack table on cobalt is CRITI...
[01:22:18] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slow-downs - https://phabricator.wikimedia.org/T148478#3148694 (demon) None of the last two comments are related to the issue here. That sounds like icinga flapping, not the machine or service itself.
[01:22:50] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slow-downs - https://phabricator.wikimedia.org/T148478#3148695 (Paladox) >>! In T148478#3148694, @demon wrote: > None of the last two comments are related to the issue here. That sounds like icinga flapping, not the machine or...
[06:18:56] Yippee, build fixed!
[06:18:57] Project selenium-Wikibase » chrome,test,Linux,BrowserTests build #318: FIXED in 1 hr 38 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=test,PLATFORM=Linux,label=BrowserTests/318/
[08:16:13] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slow-downs - https://phabricator.wikimedia.org/T148478#3148796 (hashar) Machine had a load spike at 1:00am.
It shows high disk IOPS since 1:00 and the disk utilisation largely exploded. There is 35-45% CPU usage for `md1_raid...
[10:20:16] Release-Engineering-Team, MediaWiki-Authentication-and-authorization, MW-1.27-release-notes, MW-1.28-release-notes, and 3 others: Jenkins Browser tests for Wikibase/Popups etc are failing: Invalid CSRF token in Selenium browser - https://phabricator.wikimedia.org/T160519#3148876 (Ciencia_Al_Poder...
[13:03:45] PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:19:45] RECOVERY - puppet last run on contint1001 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[13:46:31] Yippee, build fixed!
[13:46:32] Project selenium-VisualEditor » firefox,beta,Linux,BrowserTests build #355: FIXED in 2 min 31 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/355/
[15:44:49] Yippee, build fixed!
[15:44:49] Project selenium-MobileFrontend » chrome,beta,Linux,BrowserTests build #378: FIXED in 22 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/378/
[15:52:57] Yippee, build fixed!
[15:52:57] Project selenium-MobileFrontend » firefox,beta,Linux,BrowserTests build #378: FIXED in 30 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/378/
[20:31:06] bd808: do you still need that I check it?
[20:43:53] PROBLEM - Puppet run on saucelabs-03 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[21:23:52] RECOVERY - Puppet run on saucelabs-03 is OK: OK: Less than 1.00% above the threshold [0.0]