[02:22:51] PROBLEM - YARN NodeManager Node-State on analytics1039 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:24:51] RECOVERY - YARN NodeManager Node-State on analytics1039 is OK: OK: YARN NodeManager analytics1039.eqiad.wmnet:8041 Node-State: RUNNING
[02:59:51] PROBLEM - YARN NodeManager Node-State on analytics1039 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:00:51] RECOVERY - YARN NodeManager Node-State on analytics1039 is OK: OK: YARN NodeManager analytics1039.eqiad.wmnet:8041 Node-State: RUNNING
[03:03:54] Quarry, Community-Wishlist-Survey-2016: "Running" query being displayed as unsubmitted - https://phabricator.wikimedia.org/T71176#2909387 (Liuxinyu970226)
[03:03:58] Quarry, Community-Wishlist-Survey-2016: Time limit on quarry queries - https://phabricator.wikimedia.org/T111779#2909388 (Liuxinyu970226)
[03:04:02] Quarry, Community-Wishlist-Survey-2016: Quarry task running for a while - https://phabricator.wikimedia.org/T133738#2909389 (Liuxinyu970226)
[03:04:06] Quarry, Community-Wishlist-Survey-2016: Wrong status of queries in Recent Queries list - https://phabricator.wikimedia.org/T137517#2909390 (Liuxinyu970226)
[03:04:10] Quarry, Community-Wishlist-Survey-2016: Query runs over 5 hours without being killed - https://phabricator.wikimedia.org/T139162#2909391 (Liuxinyu970226)
[04:33:51] PROBLEM - YARN NodeManager Node-State on analytics1039 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:34:51] RECOVERY - YARN NodeManager Node-State on analytics1039 is OK: OK: YARN NodeManager analytics1039.eqiad.wmnet:8041 Node-State: RUNNING
[06:46:51] PROBLEM - YARN NodeManager Node-State on analytics1039 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:47:51] RECOVERY - YARN NodeManager Node-State on analytics1039 is OK: OK: YARN NodeManager analytics1039.eqiad.wmnet:8041 Node-State: RUNNING
[10:23:21] PROBLEM - Throughput of EventLogging EventError events on graphite1001 is CRITICAL: CRITICAL: 35.71% of data above the critical threshold [30.0]
[10:26:21] RECOVERY - Throughput of EventLogging EventError events on graphite1001 is OK: OK: Less than 15.00% above the threshold [20.0]
[10:29:21] PROBLEM - Throughput of EventLogging EventError events on graphite1001 is CRITICAL: CRITICAL: 71.43% of data above the critical threshold [30.0]
[10:50:58] hello Event Logging, happy new year to you too!
[10:52:16] so it seems an error with RecentChanges - (u'0' is not of type 'integer')
[10:53:21] seems for he.wikipedia.org
[11:04:32] I have no idea how to fix it :D
[11:09:46] ok sent an email
[11:20:06] Analytics, Analytics-EventLogging: EventLogging fails to validate a Recentchanges event for he.wikipedia.org - https://phabricator.wikimedia.org/T154395#2909660 (elukey)
[11:20:16] Analytics, Analytics-EventLogging: EventLogging fails to validate a Recentchanges event for he.wikipedia.org - https://phabricator.wikimedia.org/T154395#2909672 (elukey) p:Triage>Normal
[11:21:50] ACKNOWLEDGEMENT - Throughput of EventLogging EventError events on graphite1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [30.0] Elukey https://phabricator.wikimedia.org/T154395
[11:35:42] going afk, will check later!
[11:44:21] RECOVERY - Throughput of EventLogging EventError events on graphite1001 is OK: OK: Less than 15.00% above the threshold [20.0]
[12:52:49] Analytics, Analytics-EventLogging: EventLogging fails to validate a Recentchanges event for he.wikipedia.org - https://phabricator.wikimedia.org/T154395#2909692 (elukey) {F5212453}
[13:51:37] Hi a-team, happy new year from France :)
[13:51:59] Will spend some time trying to understand what's going on with EL
[14:02:48] Ok, looks like I have nothing to do, errors have happened during
[14:03:06] about 1h today, and then stopped
[14:03:51] Also A-Team, I'm going to launch full sqoop import in 5/10 minutes
[15:17:28] joal: o/
[22:26:37] Analytics-Tech-community-metrics, Developer-Relations (Jan-Mar-2017): Go through default Kibana widgets; decide which ones are not relevant for us and remove them - https://phabricator.wikimedia.org/T147001#2910126 (Aklapper) Didn't get to this in 2016 as GCI kept me too busy. Also, this requires more k...