[00:30:13] jdlrobson: not necessarily, although it certainly would be useful [00:30:22] see what I wrote in T203814#4604373 [00:30:24] T203814: Turn on MinervaErrorLogSamplingRate (Schema:WebClientError) - https://phabricator.wikimedia.org/T203814 [02:03:09] tgr|away: +1 to deploying raven cc jdlrobson [02:20:32] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Time dimension carried on url for top metrics - https://phabricator.wikimedia.org/T206479 (10Nuria) 05Open>03Resolved [02:20:47] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats: Beta Release: Wikistats: support annotations in graphs - https://phabricator.wikimedia.org/T178015 (10Nuria) 05Open>03Resolved [02:21:02] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Routing code allows invalid routes - https://phabricator.wikimedia.org/T188792 (10Nuria) 05Open>03Resolved [02:21:26] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Cleanup refinery artifact folder from old jars - https://phabricator.wikimedia.org/T206687 (10Nuria) 05Open>03Resolved [02:21:50] 10Analytics, 10Analytics-Kanban, 10DBA, 10Growth-Team, and 2 others: Purge all Schema:Echo data after 90 days - https://phabricator.wikimedia.org/T128623 (10Nuria) 05Open>03Resolved [02:22:10] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Correct data-removal jobs for mediawiki tables (public and private) - https://phabricator.wikimedia.org/T198600 (10Nuria) 05Open>03Resolved [02:23:21] 10Analytics, 10Analytics-Kanban: eventlogging_db_sanitization script failed - https://phabricator.wikimedia.org/T207165 (10Nuria) 05Open>03Resolved [02:23:33] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: EventLogging sanitization - https://phabricator.wikimedia.org/T199898 (10Nuria) [02:23:37] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review: [EL sanitization] Store the old salt for 2 extra weeks - https://phabricator.wikimedia.org/T199900 (10Nuria) 05Open>03Resolved [02:23:57] 
10Analytics, 10New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (10Prtksxna) [02:24:07] PROBLEM - YARN active ResourceManager JVM Heap usage on an-master1001 is CRITICAL: 1.946e+09 ge 1.946e+09 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=12&fullscreen&orgId=1 [02:29:54] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Wikistats 2.0: "aa.wikipedia.org" exists and has data available, but marked "Invalid" - https://phabricator.wikimedia.org/T187414 (10Nuria) za.wiktionary.org , usability.wikimedia.org and aa.wikipedia.org are now selectable options... [02:31:13] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Wikistats 2.0: "aa.wikipedia.org" exists and has data available, but marked "Invalid" - https://phabricator.wikimedia.org/T187414 (10Nuria) 05Open>03Resolved [02:31:57] PROBLEM - YARN active ResourceManager JVM Heap usage on an-master1001 is CRITICAL: 1.946e+09 ge 1.946e+09 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=12&fullscreen&orgId=1 [02:39:47] PROBLEM - YARN active ResourceManager JVM Heap usage on an-master1001 is CRITICAL: 1.946e+09 ge 1.946e+09 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=12&fullscreen&orgId=1 [02:41:26] (03CR) 10Nuria: [C: 031] "If tests pass and we have tried to build aqs with success with these changes let's merge and deploy." 
[analytics/aqs] - 10https://gerrit.wikimedia.org/r/467733 (https://phabricator.wikimedia.org/T206474) (owner: 10Fdans) [02:48:24] 10Analytics: events_sanitized could drop columns like recvfrom and sequenceId - https://phabricator.wikimedia.org/T207431 (10Nuria) [02:48:48] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: EventLogging sanitization - https://phabricator.wikimedia.org/T199898 (10Nuria) [02:48:52] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review: [EL sanitization] Retroactively sanitize (including hash and salt appInstallId fields) data in the events database - https://phabricator.wikimedia.org/T199902 (10Nuria) 05Open>03Resolved [02:49:11] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Table view of timely results in wikistats 2 should be ordered in time descending - https://phabricator.wikimedia.org/T199693 (10Nuria) [02:49:34] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Performance tweaks for state management in wikistats - https://phabricator.wikimedia.org/T207352 (10Nuria) [02:52:08] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats: Wikistats Bug: "Anonymous Editor" is a broken link - https://phabricator.wikimedia.org/T206968 (10Nuria) To fix bug: - need to change and deploy aqs glue code that is substituting IPs by anonymous editors - need to update wikistats UI to just print a st... 
[02:52:44] 10Analytics, 10Analytics-Wikistats, 10Patch-For-Review: Improvements to Wikistats2 chart popups - https://phabricator.wikimedia.org/T192416 (10Nuria) [02:53:07] PROBLEM - YARN active ResourceManager JVM Heap usage on an-master1001 is CRITICAL: 1.946e+09 ge 1.946e+09 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=12&fullscreen&orgId=1 [02:54:21] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10Contributors-Analysis, 10Product-Analytics: Attempting to select all columns of mediawiki_history sometimes fails with a cryptic error message - https://phabricator.wikimedia.org/T205367 (10Nuria) 05Open>03Resolved [03:05:08] 10Analytics, 10Analytics-Wikistats: Active Editors metric per project family - https://phabricator.wikimedia.org/T188265 (10Nuria) a:03JAllemandou [03:05:27] PROBLEM - YARN active ResourceManager JVM Heap usage on an-master1001 is CRITICAL: 1.946e+09 ge 1.946e+09 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=12&fullscreen&orgId=1 [03:09:57] PROBLEM - YARN active ResourceManager JVM Heap usage on an-master1001 is CRITICAL: 1.946e+09 ge 1.946e+09 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=12&fullscreen&orgId=1 [03:11:23] 10Analytics, 10Analytics-Wikistats: Active Editors metric per project family - https://phabricator.wikimedia.org/T188265 (10Nuria) We aim to have this metric deployed on the API by the end of this quarter (December 2018) [03:12:41] 10Analytics, 10Analytics-Wikistats: Active Editors metric per project family - https://phabricator.wikimedia.org/T188265 (10Nuria) [03:12:48] 10Analytics, 10Analytics-Wikistats, 10Patch-For-Review: Wikistats 2.0: allow to view stats for all language versions (a.k.a. 
Project families) - https://phabricator.wikimedia.org/T188550 (10Nuria) [03:17:48] 10Analytics, 10Analytics-Kanban, 10Contributors-Analysis, 10Product-Analytics, 10Patch-For-Review: Decommission edit analysis dashboard - https://phabricator.wikimedia.org/T199340 (10Nuria) The placeholders were a good idea, closing ticket. [03:17:58] 10Analytics, 10Analytics-Kanban, 10Contributors-Analysis, 10Product-Analytics, 10Patch-For-Review: Decommission edit analysis dashboard - https://phabricator.wikimedia.org/T199340 (10Nuria) 05Open>03Resolved [03:21:54] 10Analytics, 10Community-Tech, 10Grant-Metrics: Review category queries - https://phabricator.wikimedia.org/T206783 (10Nuria) [03:23:26] PROBLEM - YARN active ResourceManager JVM Heap usage on an-master1001 is CRITICAL: 1.98e+09 ge 1.946e+09 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=12&fullscreen&orgId=1 [03:42:17] PROBLEM - YARN active ResourceManager JVM Heap usage on an-master1001 is CRITICAL: 1.979e+09 ge 1.946e+09 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=12&fullscreen&orgId=1 [05:33:51] good morning yarn [05:34:00] woah a lot of alarms :( [05:37:15] ahhh snap the threshold for the alarm is the old one [05:37:16] sigh [05:43:35] fixing it [05:44:31] 10Analytics, 10Community-Tech, 10Grant-Metrics: Review category queries - https://phabricator.wikimedia.org/T206783 (10Marostegui) If it is not strictly necessary I would rather not create a new index on labs to avoid it drifting too much from production. So if it is possible to split the query into smaller... 
[05:47:10] but I have an idea about those alarms [05:47:17] for example, if I simply say [05:47:27] PROBLEM - YARN active ResourceManager JVM Heap usage on an-master1001 is CRITICAL: 1.946e+09 ge 1.946e+09 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=12&fullscreen&orgId=1 [05:47:33] heap usage / heap max > 0.90 -> critical [05:47:35] otherwise no [05:47:53] I get an alarm that takes two metrics and doesn't need thresholds [05:47:59] like in this case (2G -> 4G) [05:50:04] 10Analytics, 10Wikipedia-iOS-App-Backlog, 10iOS-app-Bugs, 10iOS-app-feature-Analytics, 10iOS-app-v6.1-Narwhal-On-A-Bumper-Car: Many errors on "MobileWikiAppiOSSearch" and "MobileWikiAppiOSUserHistory" - https://phabricator.wikimedia.org/T207424 (10elukey) p:05Triage>03High [05:50:43] lovely, from 22 UTC there are 120 errors/s for --^ [05:53:13] can we apply manually what Chelsy is suggesting in https://phabricator.wikimedia.org/T207424#4679134 ? [05:59:06] chelsyx: --^ (99% you are not online but worth a try :) [06:00:53] RECOVERY - YARN active ResourceManager JVM Heap usage on an-master1001 is OK: (C)3.891e+09 ge (W)3.686e+09 ge 1.928e+09 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=12&fullscreen&orgId=1 [06:51:35] fixed https://grafana.wikimedia.org/dashboard/db/hadoop to host all the new metrics with "hadoop_cluster" labels [06:51:53] of course we don't have history in this graph [06:54:32] I added a banner to https://grafana.wikimedia.org/dashboard/db/analytics-hadoop [06:57:29] I took a look at the HDFS Namenode's GC metrics, they are not really super good [06:57:38] Old gen collections are very slow [06:57:39] mmmm [07:03:18] 10Analytics, 10Analytics-Kanban, 10Operations, 10netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (10ayounsi) a:03ayounsi [07:04:58] Hi elukey [07:05:09] good alarm morning [07:06:40] Bonjour :) [07:06:46] I am changing the jvm alarms 
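[editor's note] elukey's threshold-free alarm idea above (alert on the used/max heap ratio instead of an absolute byte threshold) can be sketched as a plain ratio check. The numbers below are illustrative, echoing the 2G to 4G heap bump discussed in the log; the real alert would be a Prometheus query over two metrics, not a shell script.

```shell
# Sketch of the ratio-based check: CRITICAL when used/max > 0.90,
# regardless of how big the configured heap is.
check_heap() {
  awk -v u="$1" -v m="$2" \
    'BEGIN { r = u / m; printf "ratio=%.2f %s\n", r, ((r > 0.90) ? "CRITICAL" : "OK") }'
}

check_heap 1946000000 2147483648   # ~1.946e+09 used of a 2G heap -> ratio=0.91 CRITICAL
check_heap 1946000000 4294967296   # same usage after bumping the heap to 4G -> ratio=0.45 OK
```

With a fixed byte threshold, the second case would keep alarming after the heap was raised; the ratio form needs no retuning when heap sizes change.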
[07:06:58] found a way to calculate the avg of the ratio between used/max [07:07:31] the current way (my bad) is too error prone [07:07:38] (fixed thresholds) [07:09:58] ack elukey [07:10:29] elukey: EL errors are still coming from ReadingDepthSchema.enable [07:10:50] I'm going to add that little command I devised yesterday to the on-call doc [07:12:23] joal: there are also errors for MobileWikiAppiOSUserHistory [07:12:38] I didn't check what is the biggest one since I fixed the Yarn alarms [07:12:39] elukey (from any stat machine): kafkacat -b kafka-jumbo1001 -t eventlogging_EventError -o -10000 -e | sed -n 's/^.*"schema": "\([^"]*\)"}.*$/\1/p' | sort | uniq -c [07:12:50] adding that to on-call :) [07:12:50] ah yes very nice [07:16:02] ah snap I am reading the task now, the fix needs a deployment... [07:16:19] elukey: for bash masters that command is not even needed to be written anywhere, but I'm always fighting whenever I need to awk/sed [07:16:44] Docs updated [07:16:52] thanks! [07:21:52] I'd have used | egrep -o "\"schema\": \"(\w*)\"" but it of course prints "schema: etc.." [07:21:56] interesting [07:22:09] I always forget the tricks about these tools :( [07:22:58] ah no even this one is not enough, there are multiple schema: [07:22:59] lol [07:23:08] :) [07:23:30] elukey: the sed one works because of event-fields order being consistent [07:23:58] elukey: without field order consistency, we could match the core schema (EventError) in which we are not interested [07:24:39] yeah I know but my command doesn't always print both, now I am curious [07:39:16] really weird, with grep -o I can't find a way to say "print only the first match" [07:41:40] /dev/mapper/eventlog1002--vg-data 870G 780G 46G 95% /srv [07:41:44] sigh [07:41:52] :/ [07:42:23] elukey: should we copy files on HDFS to temporarilly free some space? 
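[editor's note] Returning to joal's schema-counting one-liner above: a quick way to see why its sed expression works is to run the sed/sort/uniq tail of the pipeline on fabricated input. The sample events below are made up and much shorter than real EventError payloads; only the field-ordering property joal points out is preserved.

```shell
# Three fake EventError events. The nested event "schema" is the one
# immediately followed by "}", which the greedy sed pattern anchors on,
# so the outer EventError wrapper schema is skipped.
counts=$(printf '%s\n' \
  '{"schema": "EventError", "event": {"schema": "ReadingDepth"}}' \
  '{"schema": "EventError", "event": {"schema": "MobileWikiAppiOSUserHistory"}}' \
  '{"schema": "EventError", "event": {"schema": "ReadingDepth"}}' \
  | sed -n 's/^.*"schema": "\([^"]*\)"}.*$/\1/p' \
  | sort | uniq -c | awk '{print $1, $2}')   # awk squeezes uniq's count padding
echo "$counts"
# 1 MobileWikiAppiOSUserHistory
# 2 ReadingDepth
```

This also explains the grep -o dead end in the log: grep -o prints every match on a line, including the outer "schema": "EventError" one, and grep has no per-line first-match-only switch, so the greedy sed pattern with its closing-brace anchor is the simpler disambiguator here.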
[07:44:00] joal: in my mind there is no point in keeping more than say 7 days on eventlog1002 [07:44:07] since on stat1005 we rsync for 90 days [07:44:14] ah right [07:44:16] hm [07:44:45] we currently keep 20d [07:47:07] for example, I just manually removed 3d [07:47:08] /dev/mapper/eventlog1002--vg-data 870G 555G 272G 68% /srv [07:47:22] pff [07:47:35] ok so lemme lower down the retention to 15d now [07:47:43] just to be sure for the weekend [07:47:44] +1 elukey [07:47:54] need to run errand for ~1h30 - will be back [07:48:50] ack! [08:34:37] PROBLEM - YARN NodeManager JVM Heap usage on analytics1060 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [08:35:08] PROBLEM - YARN NodeManager JVM Heap usage on analytics1049 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [08:35:08] PROBLEM - YARN NodeManager JVM Heap usage on analytics1051 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [08:35:08] PROBLEM - YARN NodeManager JVM Heap usage on analytics1044 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [08:35:08] PROBLEM - HDFS DataNode JVM Heap usage on analytics1071 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:35:09] PROBLEM - YARN NodeManager JVM Heap usage on analytics1056 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [08:35:12] 
this is me [08:35:18] PROBLEM - HDFS DataNode JVM Heap usage on analytics1029 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:35:18] PROBLEM - HDFS DataNode JVM Heap usage on analytics1044 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:35:18] PROBLEM - HDFS DataNode JVM Heap usage on analytics1032 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:35:18] PROBLEM - HDFS DataNode JVM Heap usage on analytics1048 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:35:27] PROBLEM - HDFS DataNode JVM Heap usage on analytics1065 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:35:37] PROBLEM - HDFS DataNode JVM Heap usage on analytics1073 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:35:38] PROBLEM - HDFS DataNode JVM Heap usage on analytics1035 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:35:38] PROBLEM - HDFS DataNode JVM Heap usage on analytics1063 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:35:38] PROBLEM - YARN NodeManager JVM Heap usage on analytics1046 is CRITICAL: bad_data: parse error at char 121: missing unit character in 
duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [08:35:39] PROBLEM - YARN NodeManager JVM Heap usage on analytics1057 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [08:35:47] fixing uff... [08:35:47] PROBLEM - YARN NodeManager JVM Heap usage on analytics1062 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [08:35:48] PROBLEM - Zookeeper node JVM Heap usage on conf2003 is CRITICAL: bad_data: parse error at char 168: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=40&fullscreen [08:35:48] PROBLEM - YARN NodeManager JVM Heap usage on analytics1032 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [08:35:48] PROBLEM - HDFS DataNode JVM Heap usage on analytics1046 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:35:48] PROBLEM - HDFS DataNode JVM Heap usage on analytics1070 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:35:49] PROBLEM - HDFS DataNode JVM Heap usage on analytics1028 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:35:49] PROBLEM - YARN NodeManager JVM Heap usage on analytics1061 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration 
https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [08:35:49] PROBLEM - YARN NodeManager JVM Heap usage on analytics1035 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [08:35:49] PROBLEM - YARN NodeManager JVM Heap usage on analytics1073 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [08:35:57] PROBLEM - Zookeeper node JVM Heap usage on conf1006 is CRITICAL: bad_data: parse error at char 168: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=40&fullscreen [08:35:58] PROBLEM - HDFS DataNode JVM Heap usage on analytics1049 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:35:58] PROBLEM - YARN NodeManager JVM Heap usage on analytics1065 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [08:35:58] PROBLEM - HDFS DataNode JVM Heap usage on analytics1060 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:35:58] PROBLEM - HDFS DataNode JVM Heap usage on analytics1056 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:35:59] PROBLEM - YARN NodeManager JVM Heap usage on analytics1048 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [08:35:59] 
PROBLEM - YARN NodeManager JVM Heap usage on analytics1063 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [08:35:59] PROBLEM - YARN NodeManager JVM Heap usage on analytics1070 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [08:36:07] PROBLEM - Zookeeper node JVM Heap usage on druid1002 is CRITICAL: bad_data: parse error at char 170: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=40&fullscreen [08:36:08] PROBLEM - YARN NodeManager JVM Heap usage on analytics1071 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [08:36:08] PROBLEM - HDFS DataNode JVM Heap usage on analytics1051 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:36:08] PROBLEM - HDFS DataNode JVM Heap usage on analytics1061 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:36:46] nothing is happening, only a bad prometheus query [08:37:22] unclosed left parenthesis -> /me cries [08:40:38] PROBLEM - HDFS active Namenode JVM Heap usage on an-master1001 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=4&fullscreen&orgId=1 [08:44:17] PROBLEM - YARN active ResourceManager JVM Heap usage on an-master1001 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis 
https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=12&fullscreen&orgId=1 [08:48:42] RECOVERY - HDFS DataNode JVM Heap usage on analytics1028 is OK: (C)0.95 ge (W)0.9 ge 0.3086 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:48:43] RECOVERY - HDFS DataNode JVM Heap usage on analytics1049 is OK: (C)0.95 ge (W)0.9 ge 0.3427 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:48:44] RECOVERY - HDFS DataNode JVM Heap usage on analytics1056 is OK: (C)0.95 ge (W)0.9 ge 0.4827 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:48:44] RECOVERY - HDFS DataNode JVM Heap usage on analytics1060 is OK: (C)0.95 ge (W)0.9 ge 0.3248 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:49:02] RECOVERY - HDFS DataNode JVM Heap usage on analytics1051 is OK: (C)0.95 ge (W)0.9 ge 0.3205 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:49:03] RECOVERY - HDFS DataNode JVM Heap usage on analytics1061 is OK: (C)0.95 ge (W)0.9 ge 0.4937 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:49:04] RECOVERY - HDFS DataNode JVM Heap usage on analytics1071 is OK: (C)0.95 ge (W)0.9 ge 0.3528 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:49:13] RECOVERY - HDFS DataNode JVM Heap usage on analytics1029 is OK: (C)0.95 ge (W)0.9 ge 0.4981 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:49:13] RECOVERY - HDFS DataNode JVM Heap usage on analytics1032 is OK: (C)0.95 ge (W)0.9 ge 0.6365 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:49:13] RECOVERY - HDFS DataNode JVM Heap usage on analytics1044 is OK: (C)0.95 ge (W)0.9 ge 0.3213 
https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:49:14] RECOVERY - HDFS DataNode JVM Heap usage on analytics1048 is OK: (C)0.95 ge (W)0.9 ge 0.7637 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:49:22] RECOVERY - HDFS DataNode JVM Heap usage on analytics1065 is OK: (C)0.95 ge (W)0.9 ge 0.6302 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:49:33] RECOVERY - HDFS DataNode JVM Heap usage on analytics1073 is OK: (C)0.95 ge (W)0.9 ge 0.5265 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:52:13] a-team: all these alarms were my fault, bad prometheus query (missing parenthesis.. sigh) [08:52:16] nothing bad happened [08:52:43] PROBLEM - Zookeeper node JVM Heap usage on druid1001 is CRITICAL: bad_data: parse error at char 170: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=40&fullscreen [08:52:52] PROBLEM - Zookeeper node JVM Heap usage on conf2002 is CRITICAL: bad_data: parse error at char 168: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=40&fullscreen [08:53:02] PROBLEM - Zookeeper node JVM Heap usage on druid1005 is CRITICAL: bad_data: parse error at char 170: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=40&fullscreen [08:53:03] PROBLEM - Zookeeper node JVM Heap usage on conf1005 is CRITICAL: bad_data: parse error at char 168: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=40&fullscreen [08:53:03] PROBLEM - Zookeeper node JVM Heap usage on druid1004 is CRITICAL: bad_data: parse error at char 170: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=40&fullscreen [08:53:12] PROBLEM - YARN active 
ResourceManager JVM Heap usage on an-master1002 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=12&fullscreen&orgId=1 [08:53:17] sigh [08:53:32] PROBLEM - Zookeeper node JVM Heap usage on druid1003 is CRITICAL: bad_data: parse error at char 170: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=40&fullscreen [08:53:53] RECOVERY - HDFS DataNode JVM Heap usage on analytics1070 is OK: (C)0.95 ge (W)0.9 ge 0.7405 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:56:14] PROBLEM - Zookeeper node JVM Heap usage on druid1006 is CRITICAL: bad_data: parse error at char 170: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=40&fullscreen [08:58:03] PROBLEM - HDFS active Namenode JVM Heap usage on an-master1002 is CRITICAL: bad_data: parse error at char 246: unclosed left parenthesis https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=4&fullscreen&orgId=1 [08:58:37] RECOVERY - HDFS active Namenode JVM Heap usage on an-master1001 is OK: (C)0.95 ge (W)0.9 ge 0.8098 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=4&fullscreen&orgId=1 [08:58:47] RECOVERY - Zookeeper node JVM Heap usage on druid1003 is OK: (C)0.95 ge (W)0.9 ge 0.3874 https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=40&fullscreen [08:58:48] RECOVERY - Zookeeper node JVM Heap usage on conf2003 is OK: (C)0.95 ge (W)0.9 ge 0.06884 https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=40&fullscreen [08:58:58] RECOVERY - Zookeeper node JVM Heap usage on conf1006 is OK: (C)0.95 ge (W)0.9 ge 0.7651 https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=40&fullscreen [08:58:58] RECOVERY - Zookeeper node JVM Heap usage on druid1001 is OK: (C)0.95 ge (W)0.9 ge 
0.3156 https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=40&fullscreen [08:59:07] RECOVERY - Zookeeper node JVM Heap usage on conf2002 is OK: (C)0.95 ge (W)0.9 ge 0.1782 https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=40&fullscreen [08:59:08] RECOVERY - HDFS active Namenode JVM Heap usage on an-master1002 is OK: (C)0.95 ge (W)0.9 ge 0.8026 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=4&fullscreen&orgId=1 [08:59:17] RECOVERY - Zookeeper node JVM Heap usage on druid1005 is OK: (C)0.95 ge (W)0.9 ge 0.2603 https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=40&fullscreen [08:59:17] RECOVERY - Zookeeper node JVM Heap usage on druid1002 is OK: (C)0.95 ge (W)0.9 ge 0.4235 https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=40&fullscreen [08:59:18] RECOVERY - Zookeeper node JVM Heap usage on conf1005 is OK: (C)0.95 ge (W)0.9 ge 0.6223 https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=40&fullscreen [08:59:19] RECOVERY - Zookeeper node JVM Heap usage on druid1004 is OK: (C)0.95 ge (W)0.9 ge 0.3248 https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=40&fullscreen [08:59:28] RECOVERY - HDFS DataNode JVM Heap usage on analytics1035 is OK: (C)0.95 ge (W)0.9 ge 0.3369 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [08:59:35] alleluia [08:59:41] ok we should be good now :D [08:59:44] sorry for the noise [09:01:17] RECOVERY - HDFS DataNode JVM Heap usage on analytics1046 is OK: (C)0.95 ge (W)0.9 ge 0.3134 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [09:01:58] RECOVERY - YARN active ResourceManager JVM Heap usage on an-master1002 is OK: (C)0.9 ge (W)0.7 ge 0.2532 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=12&fullscreen&orgId=1 [09:06:27] PROBLEM - YARN NodeManager JVM Heap 
usage on analytics1029 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:06:27] PROBLEM - YARN NodeManager JVM Heap usage on analytics1053 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:06:27] PROBLEM - YARN NodeManager JVM Heap usage on analytics1045 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:06:28] PROBLEM - YARN NodeManager JVM Heap usage on analytics1058 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:06:28] PROBLEM - YARN NodeManager JVM Heap usage on analytics1042 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:06:29] PROBLEM - YARN NodeManager JVM Heap usage on analytics1052 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:06:38] PROBLEM - YARN NodeManager JVM Heap usage on analytics1030 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:06:38] PROBLEM - YARN NodeManager JVM Heap usage on analytics1054 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:06:38] PROBLEM - YARN NodeManager JVM Heap usage on analytics1031 is CRITICAL: bad_data: parse 
error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:06:39] PROBLEM - YARN NodeManager JVM Heap usage on analytics1043 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:06:39] PROBLEM - YARN NodeManager JVM Heap usage on analytics1067 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:06:39] PROBLEM - YARN NodeManager JVM Heap usage on analytics1055 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:06:47] RECOVERY - YARN active ResourceManager JVM Heap usage on an-master1001 is OK: (C)0.9 ge (W)0.7 ge 0.4605 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=12&fullscreen&orgId=1 [09:06:47] RECOVERY - HDFS DataNode JVM Heap usage on analytics1063 is OK: (C)0.95 ge (W)0.9 ge 0.5751 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=1&fullscreen&orgId=1 [09:06:48] PROBLEM - YARN NodeManager JVM Heap usage on analytics1077 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:06:57] PROBLEM - YARN NodeManager JVM Heap usage on analytics1036 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:06:57] PROBLEM - YARN NodeManager JVM Heap usage on analytics1050 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen 
[09:06:58] PROBLEM - YARN NodeManager JVM Heap usage on analytics1047 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:07:07] PROBLEM - YARN NodeManager JVM Heap usage on analytics1040 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:07:07] PROBLEM - YARN NodeManager JVM Heap usage on analytics1039 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:07:09] PROBLEM - YARN NodeManager JVM Heap usage on analytics1064 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:07:17] PROBLEM - YARN NodeManager JVM Heap usage on analytics1033 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:07:17] PROBLEM - YARN NodeManager JVM Heap usage on analytics1066 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:07:17] PROBLEM - YARN NodeManager JVM Heap usage on analytics1076 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:07:19] PROBLEM - YARN NodeManager JVM Heap usage on analytics1034 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:07:19] PROBLEM - YARN NodeManager JVM Heap usage on 
analytics1075 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:07:27] PROBLEM - YARN NodeManager JVM Heap usage on analytics1059 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:07:28] PROBLEM - YARN NodeManager JVM Heap usage on analytics1072 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:07:28] PROBLEM - YARN NodeManager JVM Heap usage on analytics1074 is CRITICAL: bad_data: parse error at char 121: missing unit character in duration https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:09:10] this is a nightmare [09:13:08] RECOVERY - Zookeeper node JVM Heap usage on druid1006 is OK: (C)0.95 ge (W)0.9 ge 0.1606 https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=40&fullscreen [09:22:52] RECOVERY - YARN NodeManager JVM Heap usage on analytics1029 is OK: (C)0.95 ge (W)0.9 ge 0.5684 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:22:52] RECOVERY - YARN NodeManager JVM Heap usage on analytics1053 is OK: (C)0.95 ge (W)0.9 ge 0.7334 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:22:52] RECOVERY - YARN NodeManager JVM Heap usage on analytics1045 is OK: (C)0.95 ge (W)0.9 ge 0.5852 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:22:53] RECOVERY - YARN NodeManager JVM Heap usage on analytics1058 is OK: (C)0.95 ge (W)0.9 ge 0.7026 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:22:53] RECOVERY - YARN NodeManager JVM Heap usage on analytics1071 is 
OK: (C)0.95 ge (W)0.9 ge 0.627 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:22:54] RECOVERY - YARN NodeManager JVM Heap usage on analytics1042 is OK: (C)0.95 ge (W)0.9 ge 0.4416 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:22:54] RECOVERY - YARN NodeManager JVM Heap usage on analytics1052 is OK: (C)0.95 ge (W)0.9 ge 0.4639 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:22:55] RECOVERY - YARN NodeManager JVM Heap usage on analytics1051 is OK: (C)0.95 ge (W)0.9 ge 0.7832 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:22:55] RECOVERY - YARN NodeManager JVM Heap usage on analytics1049 is OK: (C)0.95 ge (W)0.9 ge 0.5344 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:02] RECOVERY - YARN NodeManager JVM Heap usage on analytics1032 is OK: (C)0.95 ge (W)0.9 ge 0.7666 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:02] RECOVERY - YARN NodeManager JVM Heap usage on analytics1030 is OK: (C)0.95 ge (W)0.9 ge 0.7621 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:03] RECOVERY - YARN NodeManager JVM Heap usage on analytics1054 is OK: (C)0.95 ge (W)0.9 ge 0.4623 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:03] RECOVERY - YARN NodeManager JVM Heap usage on analytics1031 is OK: (C)0.95 ge (W)0.9 ge 0.4111 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:12] RECOVERY - YARN NodeManager JVM Heap usage on analytics1067 is OK: (C)0.95 ge (W)0.9 ge 0.6829 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:12] RECOVERY - YARN NodeManager JVM Heap usage on analytics1043 is OK: (C)0.95 ge (W)0.9 ge 0.7148 
https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:12] RECOVERY - YARN NodeManager JVM Heap usage on analytics1055 is OK: (C)0.95 ge (W)0.9 ge 0.5133 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:12] RECOVERY - YARN NodeManager JVM Heap usage on analytics1044 is OK: (C)0.95 ge (W)0.9 ge 0.6735 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:14] RECOVERY - YARN NodeManager JVM Heap usage on analytics1077 is OK: (C)0.95 ge (W)0.9 ge 0.4783 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:22] RECOVERY - YARN NodeManager JVM Heap usage on analytics1036 is OK: (C)0.95 ge (W)0.9 ge 0.4453 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:22] RECOVERY - YARN NodeManager JVM Heap usage on analytics1050 is OK: (C)0.95 ge (W)0.9 ge 0.694 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:23] RECOVERY - YARN NodeManager JVM Heap usage on analytics1047 is OK: (C)0.95 ge (W)0.9 ge 0.6143 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:24] RECOVERY - YARN NodeManager JVM Heap usage on analytics1046 is OK: (C)0.95 ge (W)0.9 ge 0.5052 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:24] RECOVERY - YARN NodeManager JVM Heap usage on analytics1057 is OK: (C)0.95 ge (W)0.9 ge 0.538 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:32] RECOVERY - YARN NodeManager JVM Heap usage on analytics1040 is OK: (C)0.95 ge (W)0.9 ge 0.5339 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:33] RECOVERY - YARN NodeManager JVM Heap usage on analytics1035 is OK: (C)0.95 ge (W)0.9 ge 0.5824 
https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:33] RECOVERY - YARN NodeManager JVM Heap usage on analytics1039 is OK: (C)0.95 ge (W)0.9 ge 0.4456 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:33] RECOVERY - YARN NodeManager JVM Heap usage on analytics1060 is OK: (C)0.95 ge (W)0.9 ge 0.7672 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:33] RECOVERY - YARN NodeManager JVM Heap usage on analytics1073 is OK: (C)0.95 ge (W)0.9 ge 0.4606 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:34] RECOVERY - YARN NodeManager JVM Heap usage on analytics1061 is OK: (C)0.95 ge (W)0.9 ge 0.436 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:43] RECOVERY - YARN NodeManager JVM Heap usage on analytics1064 is OK: (C)0.95 ge (W)0.9 ge 0.4487 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:43] RECOVERY - YARN NodeManager JVM Heap usage on analytics1033 is OK: (C)0.95 ge (W)0.9 ge 0.7849 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:43] RECOVERY - YARN NodeManager JVM Heap usage on analytics1065 is OK: (C)0.95 ge (W)0.9 ge 0.5688 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:44] RECOVERY - YARN NodeManager JVM Heap usage on analytics1076 is OK: (C)0.95 ge (W)0.9 ge 0.6862 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:23:44] RECOVERY - YARN NodeManager JVM Heap usage on analytics1066 is OK: (C)0.95 ge (W)0.9 ge 0.7965 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:24:32] RECOVERY - YARN NodeManager JVM Heap usage on analytics1056 is OK: (C)0.95 ge (W)0.9 ge 0.6464 
https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:25:32] \o/! seems we are ok :) [09:25:50] sorry for the noise, not having a linter for these changes causes this damages :( [09:26:13] RECOVERY - YARN NodeManager JVM Heap usage on analytics1034 is OK: (C)0.95 ge (W)0.9 ge 0.6666 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:26:14] RECOVERY - YARN NodeManager JVM Heap usage on analytics1059 is OK: (C)0.95 ge (W)0.9 ge 0.4821 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:26:21] the idea is to have an average over the past hour of the heap usage / heap max [09:26:22] RECOVERY - YARN NodeManager JVM Heap usage on analytics1075 is OK: (C)0.95 ge (W)0.9 ge 0.5402 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:26:32] since it is very bumpy [09:26:46] the first version was leading to false positives [09:26:52] so I added a new improved version [09:26:56] without a ) and a m [09:27:00] sigh [09:27:31] this is the problem with bots, they don't know how to do stuff in their head [09:29:53] RECOVERY - YARN NodeManager JVM Heap usage on analytics1070 is OK: (C)0.95 ge (W)0.9 ge 0.6099 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:33:33] RECOVERY - YARN NodeManager JVM Heap usage on analytics1062 is OK: (C)0.95 ge (W)0.9 ge 0.5244 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:35:42] 10Analytics, 10User-Banyek: dbstore1002 /srv filling up - https://phabricator.wikimedia.org/T205544 (10Banyek) Actually I was thinking on closing the task as we have 1,4 T free space now. Maybe before that just dropping the commonswiki_test_T177772 database with the recentchanges table which would give us a... 
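The `bad_data: parse error at char 121: missing unit character in duration` storm above comes from a Prometheus range selector written without a time unit: elukey notes the new averaged-over-an-hour expression was missing a `)` and an `m`. The rule can be sketched as follows (an assumed regex illustration, not Prometheus's actual parser, which also accepts compound durations in later versions):

```python
import re

# A PromQL range-selector duration must be a number followed by a unit
# (ms, s, m, h, d, w, y). A bare number like [60] produces
# "missing unit character in duration".
DURATION_RE = re.compile(r'^\d+(ms|[smhdwy])$')

def invalid_durations(query):
    """Return the durations inside [...] selectors that lack a unit."""
    durations = re.findall(r'\[([^\]]+)\]', query)
    return [d for d in durations if not DURATION_RE.match(d)]

# Metric names here are illustrative; the intent (per the chat) is an
# average over the past hour of heap used / heap max:
good = 'avg_over_time(jvm_heap_used[1h]) / avg_over_time(jvm_heap_max[1h])'
bad = 'avg_over_time(jvm_heap_used[1])'  # unit character missing

assert invalid_durations(good) == []
assert invalid_durations(bad) == ['1']
```

A linter doing no more than this check would have caught the bad expression before it paged on every NodeManager, which is exactly the "not having a linter" complaint above.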
[09:36:49] 10Analytics, 10User-Banyek: dbstore1002 /srv filling up - https://phabricator.wikimedia.org/T205544 (10Marostegui) +1 to close. It is actually not a bad idea to leave that big DB as a safety net, so we have stuff to drop if this host complains again about disk space :-) [09:38:22] 10Analytics, 10User-Banyek: dbstore1002 /srv filling up - https://phabricator.wikimedia.org/T205544 (10Banyek) 05Open>03Resolved Yes, that makes sense. I close the task now, we can reopen it when needed. [09:39:12] RECOVERY - YARN NodeManager JVM Heap usage on analytics1074 is OK: (C)0.95 ge (W)0.9 ge 0.6247 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:42:43] RECOVERY - YARN NodeManager JVM Heap usage on analytics1048 is OK: (C)0.95 ge (W)0.9 ge 0.6103 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:42:43] RECOVERY - YARN NodeManager JVM Heap usage on analytics1063 is OK: (C)0.95 ge (W)0.9 ge 0.5742 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [09:44:33] RECOVERY - YARN NodeManager JVM Heap usage on analytics1072 is OK: (C)0.95 ge (W)0.9 ge 0.5455 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=17&fullscreen [10:14:29] TIL: du seems not showing . files by default [10:16:13] ok notebook1003 back in shape [10:16:31] (/srv dir not filled up anymore thanks to Diego!) [10:20:17] joal: is https://gerrit.wikimedia.org/r/#/c/analytics/refinery/source/+/468381/ enough for refinery source? 
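On the `du` surprise above: `du` itself does count dotfiles when pointed at a directory; it is the unquoted shell glob in `du -sh *` that skips hidden entries. A quick Python illustration of summing sizes with dotfiles included (a sketch, not a `du` replacement):

```python
import os
import tempfile

def tree_size(path):
    """Total size in bytes of regular files under path, dotfiles included.
    os.walk lists hidden entries, unlike a shell glob such as `du *`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total

# Dotfiles contribute to the total:
d = tempfile.mkdtemp()
with open(os.path.join(d, '.hidden'), 'w') as f:
    f.write('x' * 100)
assert tree_size(d) == 100
```

The shell-side equivalent of "see everything" is pointing `du` at the directory (`du -sh /srv`) rather than at a glob of its contents.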
[10:20:27] (not planning any deployment, just wondering) [10:21:19] elukey: double checking [10:23:03] * elukey waits for the -1 [10:24:55] actually, nope - everything fine elukey :) [10:25:04] \o/ [10:25:16] I wondered if the global property was used correctly, and it seems so :) [10:29:52] (03CR) 10Joal: [C: 031] "LGTM :)" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/468381 (owner: 10Elukey) [10:29:57] * elukey dances [10:29:59] heya teaammm [10:30:02] o/ [10:30:18] why you dancing elukey? :] [10:31:24] mforns: I got a +1 from Joseph at first try [10:31:28] achievement unlocked [10:31:34] (for refinery source) [10:32:29] yea [10:35:07] * joal looks for some music for elukey - https://www.youtube.com/watch?v=H_CenvaDGm0 [10:37:37] joal: do you have an example of webrequest indexation for druid that I can use on the fly to test how the druid nodes are doing? [10:37:47] (labs I mean) [10:38:23] elukey: batch indexation right? [10:38:57] yep yep [10:39:02] (nice music btw :) [10:40:04] elukey: I can find one yes :) [10:40:33] super thanks :) [10:42:26] elukey: I assume data size will be minimal? 
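For the batch indexation being discussed here, a Druid Hadoop batch task (type `index_hadoop`) is submitted to the overlord. The skeleton below shows the general shape of such a spec; the datasource name, paths, and intervals are illustrative (joal's actual gist contents are not reproduced), though the HDFS path matches the webrequest partition mentioned later in the chat:

```python
# Skeleton of a Druid "index_hadoop" batch ingestion task. Field names
# follow Druid's Hadoop indexing spec; concrete values are illustrative.
task = {
    "type": "index_hadoop",
    "spec": {
        "dataSchema": {
            "dataSource": "test_webrequest",
            "granularitySpec": {
                "type": "uniform",
                "segmentGranularity": "DAY",
                "queryGranularity": "HOUR",
                "intervals": ["2018-05-18/2018-05-19"],
            },
        },
        "ioConfig": {
            "type": "hadoop",
            "inputSpec": {
                "type": "static",
                "paths": ("/wmf/data/wmf/webrequest/webrequest_source=text"
                          "/year=2018/month=5/day=18"),
            },
        },
    },
}

assert task["type"] == "index_hadoop"
```

Note how the chat later confirms a pitfall with this kind of spec: the `dataSource` that actually got indexed was `webrequest`, not `test_webrequest`, because the value in the task file wins.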
[10:43:23] I think so yes [10:43:36] camus works, just checked, but not sure about how much data it gathers [10:43:47] k [10:44:02] elukey: I think it depends how many fake web-calls are made [10:44:05] Let's check [10:45:37] the brokers (k4-1.analytics.eqiad.wmflabs and k4-2.analytics.eqiad.wmflabs) are up [10:46:13] ahahah it still contains all my fake topics [10:46:13] snap [10:46:17] I need to clean them up [10:48:57] I am deleting those in the meantime [10:49:11] sure [10:49:18] checking data presence as well [10:51:25] elukey: data exists for past: /wmf/data/wmf/webrequest/webrequest_source=text/year=2018/month=5/day=18/ [10:51:28] for instance [10:52:53] so it is just a matter of sending events to kafka [10:53:09] or just index those [10:53:16] (the ones already there) [10:53:56] For batch we can index the ones already there [10:56:25] elukey: https://gist.github.com/jobar/4851717ba74b1540bce217c3505a1f9c [10:56:33] elukey: not tested, but shouldn't be far from ok [10:57:08] <3 [11:09:53] 10Analytics, 10EventBus, 10Growth-Team, 10MediaWiki-Watchlist, and 5 others: Clear watchlist on enwiki only removes 50 items at a time - https://phabricator.wikimedia.org/T207329 (10mobrovac) >>! In T207329#4679089, @Pchelolo wrote: > The above patch should mitigate the problem, however, we need to also ac... [11:22:25] (03PS1) 10Mforns: Fix bug in EventLoggingToDruid, add time measures as dimensions [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/468550 (https://phabricator.wikimedia.org/T206342) [11:22:47] (03CR) 10Fdans: [V: 031] "Nuria: all the dependencies that were updated affect only the testing part of the project, and tests run fine. Also this fixes the build o" [analytics/aqs] - 10https://gerrit.wikimedia.org/r/467733 (https://phabricator.wikimedia.org/T206474) (owner: 10Fdans) [11:25:37] (03CR) 10Mforns: "I tested this with real data (navigationtiming) and it works (adds new time measure dimensions)." 
(032 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/468550 (https://phabricator.wikimedia.org/T206342) (owner: 10Mforns) [11:49:36] (03PS2) 10Joal: Add WebrequestSubsetPartitioner spark job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/468322 (https://phabricator.wikimedia.org/T164020) [11:49:57] (03CR) 10Joal: [V: 031] "Tested on cluster" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/468322 (https://phabricator.wikimedia.org/T164020) (owner: 10Joal) [11:50:46] looking at EL error throughput makes me so sad :( [11:51:30] (03CR) 10Elukey: [C: 032] Upgrade camus-wmf dependency to camus-wmf9 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/468381 (owner: 10Elukey) [11:52:12] I am trying to deploy turnilo in labs, and the labs deployment server is broken.. [12:24:02] PROBLEM - Throughput of EventLogging EventError events on einsteinium is CRITICAL: 410.7 ge 30 https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=13&fullscreen&orgId=1 [12:27:25] :( [12:29:26] 407??? [12:29:41] the main issue is that we need to wait for a deployment.. [12:29:42] sigh [12:29:46] elukey: it keeps rising [12:30:25] elukey: Im assuming there have been a deploy yesterday, right? [12:31:06] joal: I am not sure if the thing is part of the mediawiki train or not [12:31:34] elukey: it has started yesterday night - Must have through either a deploy or a config change that can be reverted, no? 
[12:32:38] joal: I was under the impression that https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/WikimediaEvents/+/468486/ was the fix [12:33:56] seems so elukey - Must have been a deploy yesterday [12:35:19] elukey: we could devise a patch for EL to actually not send that error to error schema for the time being, but it's a hack :( [12:35:49] there was https://tools.wmflabs.org/sal/production?d=2018-10-18 (last deploy at ~21:49 UTC) [12:37:08] It correspond to the alert-time elukey - Thanks for the triple check [12:47:09] ok so I may have a fix for the deployment-server in labs but I need to wait for reviews [12:47:13] will try to index now [12:57:11] (03CR) 10Matthias Mullie: [C: 032] Removing error messages from whitelist for schema UploadWizardExceptionFlowEvent [analytics/refinery] - 10https://gerrit.wikimedia.org/r/467440 (https://phabricator.wikimedia.org/T136851) (owner: 10Nuria) [12:57:15] elukey: let me know if I can help [13:01:14] it seems succeeding (the datasource was 'webrequest' not 'test_webrequest') [13:01:26] but I get this from the middle manager [13:01:27] 2018-10-19T12:59:42,026 INFO org.apache.hadoop.mapreduce.Job: Running job: job_1536235072238_10961 [13:01:31] 2018-10-19T12:59:51,244 INFO org.apache.hadoop.mapreduce.Job: Job job_1536235072238_10961 running in uber mode : false [13:01:34] 2018-10-19T12:59:51,246 INFO org.apache.hadoop.mapreduce.Job: map 0% reduce 0% [13:01:37] 2018-10-19T13:00:00,703 INFO org.apache.hadoop.mapreduce.Job: Task Id : attempt_1536235072238_10961_m_000000_0, Status : FAILED [13:01:40] 2018-10-19T13:00:13,825 INFO org.apache.hadoop.mapreduce.Job: map 100% reduce 0% [13:01:43] 2018-10-19T13:00:22,890 INFO org.apache.hadoop.mapreduce.Job: map 100% reduce 100% [13:01:46] 2018-10-19T13:00:22,901 INFO org.apache.hadoop.mapreduce.Job: Job job_1536235072238_10961 completed successfully [13:01:49] 2018-10-19T13:00:23,035 INFO org.apache.hadoop.mapreduce.Job: Counters: 54 [13:04:22] so one mapper failed with Error: 
NULL_VALUE [13:04:58] 2018-10-19T13:00:00,395 ERROR [main] org.apache.hadoop.mapred.YarnChild - Error running child : java.lang.NoSuchFieldError: NULL_VALUE at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:245) [13:12:01] elukey: the hadoop job failed but indexation succeeded? I'm surprised :) [13:21:33] 10Analytics, 10EventBus, 10Growth-Team, 10MediaWiki-Watchlist, and 5 others: Clear watchlist on enwiki only removes 50 items at a time - https://phabricator.wikimedia.org/T207329 (10Ottomata) Verified microseconds is fine with python jsonschema. I also checked Camus, which uses `[[ http://joda-time.source... [13:23:54] (03PS4) 10Joal: Add oozie job partitioning webrequest subset [analytics/refinery] - 10https://gerrit.wikimedia.org/r/357814 (https://phabricator.wikimedia.org/T164020) [13:26:21] (03CR) 10Ottomata: [C: 031] Fix bug in EventLoggingToDruid, add time measures as dimensions (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/468550 (https://phabricator.wikimedia.org/T206342) (owner: 10Mforns) [13:28:57] (03CR) 10Ottomata: Add oozie job partitioning webrequest subset (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/357814 (https://phabricator.wikimedia.org/T164020) (owner: 10Joal) [13:30:55] 10Analytics, 10Wikipedia-iOS-App-Backlog, 10iOS-app-Bugs, 10iOS-app-feature-Analytics, 10iOS-app-v6.1-Narwhal-On-A-Bumper-Car: Many errors on "MobileWikiAppiOSSearch" and "MobileWikiAppiOSUserHistory" - https://phabricator.wikimedia.org/T207424 (10NHarateh_WMF) @chelsyx this should be fixed when htt... 
[13:32:58] (03PS3) 10Joal: Add WebrequestSubsetPartitioner spark job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/468322 (https://phabricator.wikimedia.org/T164020) [13:42:08] 10Analytics, 10EventBus, 10Growth-Team, 10MediaWiki-Watchlist, and 5 others: Clear watchlist on enwiki only removes 50 items at a time - https://phabricator.wikimedia.org/T207329 (10kostajh) @Pchelolo @Ottomata and @mobrovac thank you for tracking this down and working on it. Should our team plan to verify... [13:47:46] 10Analytics, 10Operations, 10Traffic: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (10mforns) a:05elukey>03mforns [14:04:10] joal: here I am, yes it is kinda weird, trying to figure out why.. could it be related to weird data on hdfs? [14:43:44] (03CR) 10Mforns: Fix bug in EventLoggingToDruid, add time measures as dimensions (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/468550 (https://phabricator.wikimedia.org/T206342) (owner: 10Mforns) [14:46:21] 10Analytics, 10Fundraising-Backlog: Identify source of discrepancy between HUE query in Count of event.impression and druid queries via turnilo/superset - https://phabricator.wikimedia.org/T204396 (10AndyRussG) @Nuria, @JAllemandou thanks so much taking the time to check this out, much appreciated!!! We can di... [14:56:43] 10Analytics, 10Operations, 10ops-eqiad: analytics1068 doesn't boot - https://phabricator.wikimedia.org/T203244 (10Cmjohnson) @elukey Finally, they agreed to replace the mother board. This should happen Monday or Tues next week. [14:59:54] 10Analytics, 10Fundraising-Backlog: Identify source of discrepancy between HUE query in Count of event.impression and druid queries via turnilo/superset - https://phabricator.wikimedia.org/T204396 (10Nuria) @AndyRussG Can you answer these questions: * are all browsers receiving banners? * are only js-enabled... 
[15:03:56] I'm getting an error running "mvn test" that makes me feel I have some bad versions of something, but I tried mvn clean and it doesn't work [15:03:57] java.lang.NoClassDefFoundError: com/holdenkarau/spark/testing/SharedSparkContext [15:04:15] this is definitely not me then :) [15:04:18] there was some other more brutal way to clean that joal told me at some point... can't remember, but this time I'll put it in the readme [15:09:54] joal: found the problem, I had messed up in the deb package with parquet libs -.- [15:10:04] removed the rouge ones on the host, index fine :) [15:10:11] I am re-building the package no [15:10:14] *now [15:16:03] milimetric: you can rm -rf ~/.mvn local cache [15:17:28] doing that now, but needing to do it means some versioning is messed up in our poms [15:17:30] (03CR) 10Nuria: Memoizing results of state functions (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468205 (https://phabricator.wikimedia.org/T207352) (owner: 10Nuria) [15:19:11] 10Analytics, 10Analytics-Kanban, 10Operations, 10Traffic: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (10mforns) [15:22:24] (03CR) 10Fdans: Set the active filter correctly on breakdowns mount (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468027 (https://phabricator.wikimedia.org/T206822) (owner: 10Fdans) [15:23:05] (03PS4) 10Fdans: Set the active filter correctly on breakdowns mount [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468027 (https://phabricator.wikimedia.org/T206822) [15:23:17] (03CR) 10Milimetric: Memoizing results of state functions (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468205 (https://phabricator.wikimedia.org/T207352) (owner: 10Nuria) [15:24:05] (03CR) 10jerkins-bot: [V: 04-1] Set the active filter correctly on breakdowns mount [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468027 
(https://phabricator.wikimedia.org/T206822) (owner: 10Fdans) [15:24:55] what now [15:26:04] milimetric: also make sure you have 1.8 as java vs [15:26:26] milimetric: that is what sets your local cache versions [15:26:52] milimetric: issue could be with java vs , not poms per se [15:27:00] yeah, javac 1.8.0_181 but I still get the same error after rm -rf ~/.m2 [15:27:10] ottomata: re: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/467648/ - we have 90d on stat1005 of logs no? And now 15d on eventlog1002, plus the camus importing data.. I thought it was fine, am I missing something? [15:27:17] (plus the srv partition was filled up again :( [15:27:19] (03PS5) 10Fdans: Set the active filter correctly on breakdowns mount [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468027 (https://phabricator.wikimedia.org/T206822) [15:28:26] milimetric: trying to repro your issue [15:28:26] people hate on npm, but you pick up a node project 4 years later and it builds. Python's ok for like 1 year, Java's ok for like 1 month, and Ruby's ok for like 2 days. [15:35:33] weird, I ran mvn test, it downloaded a bunch of stuff, failed. Now I'm running mvn verify, it's downloading more stuff [15:36:10] got a better error at least: [15:36:11] Could not resolve dependencies for project org.wikimedia.analytics.refinery.spark:refinery-spark:jar:0.0.79-SNAPSHOT: Could not find artifact org.wikimedia.analytics.refinery.core:refinery-core:jar:0.0.79-SNAPSHOT in wmf-mirrored (https://archiva.wikimedia.org/repository/mirrored/) [15:37:52] mvn compile downloads even more... 
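The `java.lang.NoSuchFieldError: NULL_VALUE` in `AvroSchemaConverter` that elukey traced to stray parquet libs in the deb package is the classic symptom of two versions of the same artifact on the classpath: the class is loaded from an older jar that predates the field. A small sketch (hypothetical helper, not an actual tool from the chat) that flags duplicate jar versions in a lib directory:

```python
import os
import re
import tempfile
from collections import defaultdict

# Jars are conventionally named <artifact>-<version>.jar, with the
# version starting at the first digit after a hyphen.
JAR_RE = re.compile(r'^(?P<artifact>[A-Za-z][\w.-]*?)-(?P<version>\d[\w.-]*)\.jar$')

def jar_version_conflicts(libdir):
    """Map artifact -> sorted versions, for artifacts present in >1 version."""
    versions = defaultdict(set)
    for name in os.listdir(libdir):
        m = JAR_RE.match(name)
        if m:
            versions[m.group('artifact')].add(m.group('version'))
    return {a: sorted(v) for a, v in versions.items() if len(v) > 1}

# Demo: a duplicated parquet-avro jar is reported, a single version is not.
d = tempfile.mkdtemp()
for jar in ('parquet-avro-1.8.1.jar', 'parquet-avro-1.9.0.jar',
            'refinery-core-0.0.79.jar'):
    open(os.path.join(d, jar), 'w').close()
assert jar_version_conflicts(d) == {'parquet-avro': ['1.8.1', '1.9.0']}
```

Separately, milimetric's `Could not find artifact ...refinery-core:jar:0.0.79-SNAPSHOT` error is a different failure mode: SNAPSHOT versions are never published to the mirrored remote, so sibling modules resolve only after a build from the parent pom installs them locally.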
[15:38:52] (03CR) 10Mforns: "I lean towards Dan's idea," [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468205 (https://phabricator.wikimedia.org/T207352) (owner: 10Nuria) [15:39:33] 10Analytics, 10Analytics-Cluster, 10Contributors-Analysis, 10Product-Analytics: Hive join fails when using a HiveServer2 client - https://phabricator.wikimedia.org/T206279 (10fdans) Info added in wikitech for future reference! [15:39:43] 10Analytics, 10Analytics-Cluster, 10Contributors-Analysis, 10Product-Analytics: Hive join fails when using a HiveServer2 client - https://phabricator.wikimedia.org/T206279 (10fdans) a:05fdans>03None [15:52:28] (03CR) 10Mforns: "This table is only partitioned by snapshot." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/468311 (https://phabricator.wikimedia.org/T197888) (owner: 10Fdans) [15:53:21] 10Analytics, 10Analytics-Kanban: Deprecate Python 2 software from the Analytics infrastructure - https://phabricator.wikimedia.org/T204734 (10Milimetric) [15:53:22] (03CR) 10Fdans: [V: 031] "Mforns Nuria: yep already tested with dry run" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/468311 (https://phabricator.wikimedia.org/T197888) (owner: 10Fdans) [15:53:38] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats: Wikistats 2 Backend: Resiliency, Rollback and Deployment of Data - https://phabricator.wikimedia.org/T177965 (10Milimetric) [15:53:50] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats: Wikistats 2.0 Remaining reports. 
- https://phabricator.wikimedia.org/T186121 (10Milimetric) [15:54:00] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats: Wikistats Beta - https://phabricator.wikimedia.org/T186120 (10Milimetric) [15:54:12] 10Analytics, 10Analytics-Kanban: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (10Milimetric) [15:55:23] (03CR) 10Fdans: [C: 031] Add mediawiki-history-wikitext oozie job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/463548 (https://phabricator.wikimedia.org/T202490) (owner: 10Joal) [15:55:56] (03CR) 10Mforns: [C: 032] "LGTM!" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/468311 (https://phabricator.wikimedia.org/T197888) (owner: 10Fdans) [15:56:12] thank youuu mforns [15:56:23] npppp :] [15:56:42] 10Analytics, 10Analytics-Kanban, 10MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26)), 10Patch-For-Review: Improve Dashiki extension messaging - https://phabricator.wikimedia.org/T205644 (10Milimetric) The first task that I self-merged is now deployed: https://meta.wikimedia.org/wiki/Config:Dashiki:Anno... [16:00:08] milimetric, ottomata , fdans : standddupppp [16:00:27] nuria: someone's on a rush! [16:00:38] fdans: someone has no watch! [16:03:42] i sent an e scrum! 
[16:03:50] nuria: [16:03:52] ^^ [16:04:56] (03CR) 10Nuria: [V: 032 C: 032] Removing error messages from whitelist for schema UploadWizardExceptionFlowEvent [analytics/refinery] - 10https://gerrit.wikimedia.org/r/467440 (https://phabricator.wikimedia.org/T136851) (owner: 10Nuria) [16:05:07] 10Analytics, 10Analytics-Kanban: Update datasets to have explicit timestamp for druid indexation facilitation - https://phabricator.wikimedia.org/T205617 (10Milimetric) a:03fdans [16:05:32] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Update to cloudera 5.15 - https://phabricator.wikimedia.org/T204759 (10Milimetric) a:03elukey [16:06:39] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Presto cluster online and usable with test data pushed from analytics prod infrastructure accessible by Cloud (labs) users - https://phabricator.wikimedia.org/T204951 (10Milimetric) a:03Ottomata [16:08:14] ottomata: there's a special script thing that sets up beeline for users on the stat machines, right? a system user that can use hive wouldn't automatically be able to use beeline, correct? 
[16:11:28] (03CR) 10Fdans: [V: 032] Add mediawiki_history_reduced to list of tables to drop snapshots [analytics/refinery] - 10https://gerrit.wikimedia.org/r/468311 (https://phabricator.wikimedia.org/T197888) (owner: 10Fdans) [16:12:02] bearloga: hi - You can find the default options set on stat machine via 'DEFAULT_OPTIONS = {'-n': os.environ['USER'],' [16:12:05] '-u': 'jdbc:hive2://an-coord1001.eqiad.wmnet:' + [16:12:05] woo [16:12:08] '10000', [16:12:10] '--outputformat': 'tsv2', } [16:12:24] I meant 'cat /usr/local/bin/beeline' bearloga, on a stat machine [16:12:28] sorry for the spam :) [16:13:57] (03PS2) 10Fdans: Add mediawiki_history_reduced to list of tables to drop snapshots [analytics/refinery] - 10https://gerrit.wikimedia.org/r/468311 (https://phabricator.wikimedia.org/T197888) [16:14:04] (03CR) 10Fdans: [V: 032] Add mediawiki_history_reduced to list of tables to drop snapshots [analytics/refinery] - 10https://gerrit.wikimedia.org/r/468311 (https://phabricator.wikimedia.org/T197888) (owner: 10Fdans) [16:14:45] joal: I was wondering because we tried to switch some queries that are run via reportupdater (by analytics-search system user) from hive to beeline and chelsyx had issues, so my best guess is that the system user isn't by default setup to use beeline, just hive [16:15:06] 10Analytics-Kanban: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users - https://phabricator.wikimedia.org/T204950 (10Milimetric) [16:15:12] 10Analytics-Kanban: Deprecate Python 2 software from the Analytics infrastructure - https://phabricator.wikimedia.org/T204734 (10Milimetric) [16:15:16] 10Analytics-Kanban: reportupdater TLC - https://phabricator.wikimedia.org/T193167 (10Milimetric) [16:15:20] 10Analytics-Kanban: Enable automatic ingestion from eventlogging into druid for some schemas - https://phabricator.wikimedia.org/T190855 (10Milimetric) [16:15:21] correct- acutally bearloga, we suggest not to use beeline [16:15:42] 
10Analytics-Kanban: Raise Edit Data Quality to the point where we can offer snapshots on Cloud (labs) environment - https://phabricator.wikimedia.org/T204953 (10Milimetric) [16:15:44] 10Analytics-EventLogging, 10Analytics-Kanban: EventLogging sanitization - https://phabricator.wikimedia.org/T199898 (10Milimetric) [16:15:54] With the version of hive we have, it is still not as good as hive bare client [16:15:59] 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, 10Core Platform Team (Modern Event Platform (TEC2)), and 3 others: Modern Event Platform (TEC2) - https://phabricator.wikimedia.org/T185233 (10Milimetric) [16:16:08] 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, 10Services (watching): Modern Event Platform: Event Schema Registry - https://phabricator.wikimedia.org/T201063 (10Milimetric) [16:16:18] 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, 10Services (watching): Modern Event Platform: Stream Intake Service - https://phabricator.wikimedia.org/T201068 (10Milimetric) [16:16:27] 10Analytics-Kanban, 10User-Elukey: Q1 2018/19 Analytics procurement - https://phabricator.wikimedia.org/T198694 (10Milimetric) [16:16:29] 10Analytics-Kanban, 10Analytics-Wikistats: Wikistats 2 Backend: Resiliency, Rollback and Deployment of Data - https://phabricator.wikimedia.org/T177965 (10Milimetric) [16:16:34] 10Analytics-Kanban, 10Analytics-Wikistats: Wikistats 2.0 Remaining reports. 
- https://phabricator.wikimedia.org/T186121 (10Milimetric) [16:16:38] 10Analytics-Kanban, 10Analytics-Wikistats: Wikistats Beta - https://phabricator.wikimedia.org/T186120 (10Milimetric) [16:16:42] 10Analytics-Kanban: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (10Milimetric) [16:17:02] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Eventlogging's processors stopped working - https://phabricator.wikimedia.org/T200630 (10Milimetric) [16:17:20] 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10Reading-analysis: Final Vetting of Family Wide unique devices data - https://phabricator.wikimedia.org/T169550 (10Milimetric) [16:17:24] joal: what if we have to use beeline for certain queries because otherwise there are a bunch of messages that hive outputs that aren't caught by -S or current grep filters [16:17:27] 10Analytics, 10Analytics-Kanban: Quantify volume of traffic on piwik with DNT header set - https://phabricator.wikimedia.org/T199928 (10Milimetric) [16:17:39] 10Analytics, 10Analytics-Kanban: [EL sanitization] Write and productionize script to drop partitions older than 90 days in events database - https://phabricator.wikimedia.org/T199836 (10Milimetric) [16:17:55] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Drop old mediawiki_history_reduced snapshots - https://phabricator.wikimedia.org/T197888 (10Milimetric) [16:17:58] bearloga: I have heard of that yes, but you might run into other issues with beeline :( [16:18:02] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Use spark to split webrequest on tags - https://phabricator.wikimedia.org/T164020 (10Milimetric) [16:19:14] joal: we'd use it on a per-query basis after checking that beeline runs it fine. I mean, if you're suggesting not to use beeline ever at all why even have it around? 
[16:19:58] bearloga: we never removed it - We actually configured it to work and wanted to follow the advice of moving out of it [16:20:37] bearloga: however we ran into more and more errors as people started using it - particularly due to memory-errors on local-join tasks (small memory for hive-server) [16:21:06] bearloga: and no real decision has been made on removing beeline [16:21:14] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Eventlogging's processors stopped working - https://phabricator.wikimedia.org/T200630 (10Milimetric) a:03Milimetric [16:21:28] 10Analytics, 10Analytics-Kanban: Set a timeout for regex parsing in the Eventlogging processors - https://phabricator.wikimedia.org/T200760 (10Milimetric) a:03Milimetric [16:23:01] joal: got it. in that case, is there any way to have hive be less annoying with its output? running every query through a dozen greps to filter out unnecessary output seems…sub-optimal [16:23:22] bearloga: indeed!!! [16:23:39] bearloga: can you send me the query? I think the logs are related to parquet... [16:25:44] 10Analytics, 10Analytics-Kanban: Quantify volume of traffic on piwik with DNT header set - https://phabricator.wikimedia.org/T199928 (10Milimetric) p:05Triage>03High [16:25:55] 10Analytics, 10Analytics-Kanban: [EL sanitization] Write and productionize script to drop partitions older than 90 days in events database - https://phabricator.wikimedia.org/T199836 (10Milimetric) p:05Triage>03High [16:26:12] * elukey off! 
[16:26:28] wait nuria you said you'd babysit me :) [16:27:29] (03CR) 10Milimetric: Set the active filter correctly on breakdowns mount (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468027 (https://phabricator.wikimedia.org/T206822) (owner: 10Fdans) [16:27:41] (03CR) 10Milimetric: "/me just trying to ruin Fran's Friday :)" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468027 (https://phabricator.wikimedia.org/T206822) (owner: 10Fdans) [16:28:27] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Services (watching): Prototype in node intake service - https://phabricator.wikimedia.org/T206815 (10Ottomata) Proceeding! https://github.com/ottomata/eventbus Need to move to gerrit. [16:29:14] Yay! oozie job partitioning webrequests ! [16:30:19] 10Analytics, 10Beta-Cluster-Infrastructure: Puppet broken on deployment-kafka-jumbo-1 - https://phabricator.wikimedia.org/T207489 (10Krenair) a:03Krenair Found this in the prefix config for deployment-kafka-jumbo: `profile::kafka::broker::monitoring::replica_maxlag_warning: '1000'`, changed it to remove the... [16:31:45] 10Analytics, 10Beta-Cluster-Infrastructure: Puppet broken on deployment-kafka-jumbo-1 - https://phabricator.wikimedia.org/T207489 (10Ottomata) Thanks! [16:32:57] 10Analytics, 10EventBus, 10Growth-Team, 10MediaWiki-Watchlist, and 6 others: Clear watchlist on enwiki only removes 50 items at a time - https://phabricator.wikimedia.org/T207329 (10Pchelolo) @kostajh If you have time for that it would be perfect. I admit, I don't have any idea how to test this. Thank you... [16:33:03] (03PS6) 10Joal: Update DataFrameToHive for dynamic partitions [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/465202 (https://phabricator.wikimedia.org/T164020) [16:33:06] milimetric: yes! 
[16:33:06] (03PS7) 10Joal: Add webrequest_subset_tags transform function [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/465206 (https://phabricator.wikimedia.org/T164020) [16:33:08] (03PS4) 10Joal: Add WebrequestSubsetPartitioner spark job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/468322 (https://phabricator.wikimedia.org/T164020) [16:33:08] bc? [16:33:14] 10Analytics, 10Beta-Cluster-Infrastructure: Puppet broken on deployment-kafka-jumbo-1 - https://phabricator.wikimedia.org/T207489 (10Krenair) 05Open>03Resolved puppet runs again [16:33:27] milimetric: batcave? [16:33:36] yep, I'm there [16:34:12] (03PS5) 10Joal: Add oozie job partitioning webrequest subset [analytics/refinery] - 10https://gerrit.wikimedia.org/r/357814 (https://phabricator.wikimedia.org/T164020) [16:35:06] 10Analytics, 10Beta-Cluster-Infrastructure: Puppet broken on deployment-kafka-jumbo-1 - https://phabricator.wikimedia.org/T207489 (10Krenair) (I think -2 was also affected but seems fine now) [16:35:48] 10Analytics, 10EventBus, 10Growth-Team, 10MediaWiki-Watchlist, and 6 others: Clear watchlist on enwiki only removes 50 items at a time - https://phabricator.wikimedia.org/T207329 (10kostajh) It's easy enough for me to see if running "clear watchlist" on my enwiki account works :) @Etonkovidova may want to... 
[16:36:33] (03CR) 10Joal: [V: 031] "Tested on cluster" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/357814 (https://phabricator.wikimedia.org/T164020) (owner: 10Joal) [16:38:07] joal: I pinged chelsyx to send you the query she was getting all the extra output from that made her want to switch to beeline [16:38:42] ok bearloga - depending on the hour and knowing I'll be off Monday, it might only be Tuesday that I look at it :) [16:39:14] no problem joal :) [16:44:04] (03PS6) 10Fdans: Set the active filter correctly on breakdowns mount [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468027 (https://phabricator.wikimedia.org/T206822) [16:49:01] ottomata: do you know where/why intellij doesn't find the org.wikimedia.analytics.schemas package? [16:49:40] milimetric: just to be sure - those are the ones in the refinery-camus project, right? [16:49:42] in the refinery.camus.coders? [16:49:45] yea [16:50:08] milimetric: It's because the code needs to be generated manually first [16:50:42] milimetric: IIRC the easiest is to build through maven CLI and look in the target folder for the generated classes [16:50:52] uh... is there a better way to do that? [16:51:16] so that it's automatic? [16:51:33] milimetric: not that I know [16:51:47] milimetric: maven does it, but I don't think intellij knows how to [16:53:53] milimetric: the java files can be found on stat1004:/home/joal/generated.tgz [16:55:33] the java files show up in IntelliJ, under src/generated [16:55:54] but it looks like you have to configure IntelliJ to recognize those as a "sources" folder? [16:56:00] (doing that now, will let you know) [16:59:54] milimetric: that sounds right... 
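[Editor's note: bearloga's complaint above is about piping every hive query through "a dozen greps" to strip log chatter. One way to consolidate that is a single filter over the CLI's stderr, sketched below. The noise patterns are hypothetical examples; joal suspects the real chatter comes from the parquet libraries, so they would need tuning against the actual log lines.]

```python
# Sketch: run `hive -S -e <query>` and drop noisy stderr lines with one
# combined regex instead of a chain of greps. Patterns are illustrative.
import re
import subprocess
import sys

NOISE = re.compile(
    r'^(WARN|INFO)\b'          # generic log-level chatter
    r'|parquet\.hadoop'        # hypothetical parquet logger prefix
    r'|^Logging initialized'   # hive startup banner
)

def run_hive(query: str) -> str:
    """Return the query's stdout; forward only non-noise stderr lines."""
    proc = subprocess.run(
        ['hive', '-S', '-e', query],
        capture_output=True, text=True, check=True,
    )
    for line in proc.stderr.splitlines():
        if not NOISE.search(line):
            sys.stderr.write(line + '\n')   # keep real warnings visible
    return proc.stdout
```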
[17:00:41] it worked ok to fix the build in IntelliJ, now it's giving me errors when I run all tests because it can't find some files like access_method_test_data.csv [17:00:57] there are also a lot of warnings, I'm going to spend some time and clean this up and update the README [17:03:44] milimetric: check that in your system ./refinery-core/target/test-classes/access_method_test_data.csv has read permissions for all [17:04:29] hm, good thought, but yeah it's got r for all [17:05:54] 10Analytics, 10Wikipedia-iOS-App-Backlog, 10iOS-app-Bugs, 10iOS-app-feature-Analytics, 10iOS-app-v6.1-Narwhal-On-A-Bumper-Car: Many errors on "MobileWikiAppiOSSearch" and "MobileWikiAppiOSUserHistory" - https://phabricator.wikimedia.org/T207424 (10chelsyx) Thank you @NHarateh_WMF ! [17:09:15] (03CR) 10Ottomata: [C: 031] Fix bug in EventLoggingToDruid, add time measures as dimensions (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/468550 (https://phabricator.wikimedia.org/T206342) (owner: 10Mforns) [17:15:38] a-team, I'm not feeling well this evening, got a strong cold, will stop for today... [17:16:34] byeee [17:19:30] no cold for me but stop nonetheless :) Have a good weekend team [17:19:30] joal: does your intellij have the same problem with these csv files? I changed one test to /home/milimetric/projects/refinery-source/refinery-core/src/test/resources/pageview_test_data.csv instead of src/test/resources/pageview_test_data.csv and it passed [17:19:39] oh nvm, good night joal [17:19:46] I have time milimetric :) [17:20:27] tests pass on command line but not in intellij, all because of these path issues, but that doesn't sound like a fun Friday night, joal, you should go [17:20:29] milimetric: are the test/resources folders recognized as resources in your projects? 
[17:20:37] I'll double check [17:21:31] they were recognized as Test Resources, changed to Resources to see [17:21:40] Test resources is same for me [17:22:00] yeah, java.io.FileNotFoundException: src/test/resources/pageview_test_data.csv (No such file or directory) [17:22:12] milimetric: in Paths tab, do you have "Use module compile output path" ? [17:22:34] joal: you know, nvm, I'm gonna blow out my .idea settings completely and start clean and document the steps I need to make it work in README [17:22:58] milimetric: this will for sure be helpful!! [17:23:12] ok, will do [17:23:39] have a nice weekend man [17:23:39] Gone for now then :) [17:23:46] ThYou too [17:23:53] Thanks, you too ... [18:20:40] (03CR) 10Nuria: [C: 032] Fix bug in EventLoggingToDruid, add time measures as dimensions [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/468550 (https://phabricator.wikimedia.org/T206342) (owner: 10Mforns) [18:20:52] (03CR) 10Nuria: [V: 032 C: 032] Fix bug in EventLoggingToDruid, add time measures as dimensions [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/468550 (https://phabricator.wikimedia.org/T206342) (owner: 10Mforns) [18:26:43] (03Merged) 10jenkins-bot: Fix bug in EventLoggingToDruid, add time measures as dimensions [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/468550 (https://phabricator.wikimedia.org/T206342) (owner: 10Mforns) [18:37:57] (03CR) 10Nuria: [V: 031 C: 032] Upgrade packages and commit package-lock to remove vulnerabilities [analytics/aqs] - 10https://gerrit.wikimedia.org/r/467733 (https://phabricator.wikimedia.org/T206474) (owner: 10Fdans) [18:43:06] 10Analytics, 10Fundraising-Backlog: Identify source of discrepancy between HUE query in Count of event.impression and druid queries via turnilo/superset - https://phabricator.wikimedia.org/T204396 (10Nuria) @AndyRussG I would look at EL data and see if any browser is notably missing from the events you have se... 
[18:56:27] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Services (watching): Prototype in node intake service - https://phabricator.wikimedia.org/T206815 (10Ottomata) Heya @Pchelolo. I'm feeling good about the general layout and architecture for this prototype. Would love to go over it with you and/or have... [19:31:52] (03PS4) 10Nuria: Memoizing results of state functions [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468205 (https://phabricator.wikimedia.org/T207352) [19:32:11] (03CR) 10Milimetric: [C: 032] Set the active filter correctly on breakdowns mount [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468027 (https://phabricator.wikimedia.org/T206822) (owner: 10Fdans) [19:33:00] (03CR) 10Nuria: "Please see 3 independent caches given 3 usages. Let me know if this is what you were thinking." [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468205 (https://phabricator.wikimedia.org/T207352) (owner: 10Nuria) [19:47:22] (03CR) 10Milimetric: [C: 04-1] Memoizing results of state functions (032 comments) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468205 (https://phabricator.wikimedia.org/T207352) (owner: 10Nuria) [21:35:17] (03PS1) 10Milimetric: [WIP] working on understanding and testing page history and quality [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/468678 [21:37:08] mmmmk, I feel like I'm starting to understand page history reconstruction again. So now I'm going to go away for the weekend and forget it all. Have a nice weekend everyone!! :) [21:55:36] 10Analytics, 10Analytics-Cluster, 10Contributors-Analysis, 10Product-Analytics: Hive join fails when using a HiveServer2 client - https://phabricator.wikimedia.org/T206279 (10Neil_P._Quinn_WMF) @fdans thank you! Is it worth pursuing @joal's suggestion? ("it could interesting to try to raise HiveServer2 ava... 
[23:19:31] (03PS5) 10Nuria: Memoizing results of state functions [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468205 (https://phabricator.wikimedia.org/T207352) [23:22:16] (03PS6) 10Nuria: Memoizing results of state functions [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468205 (https://phabricator.wikimedia.org/T207352) [23:22:41] (03CR) 10Nuria: Memoizing results of state functions (032 comments) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468205 (https://phabricator.wikimedia.org/T207352) (owner: 10Nuria)
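[Editor's note: Nuria's patch above memoizes three wikistats2 state functions with "3 independent caches given 3 usages". The real change is JavaScript; the sketch below shows the same pattern in Python with `functools.lru_cache`, where each decorated function automatically gets its own cache. The function names and return values are made up for illustration and do not come from the wikistats2 code.]

```python
# Sketch: three memoized state functions, each with an independent cache,
# so clearing or sizing one cache never affects the others.
# Names and values are hypothetical, not from wikistats2.
from functools import lru_cache

@lru_cache(maxsize=None)
def breakdowns(metric: str) -> tuple:
    return ('total',) + (('editor-type',) if metric == 'edits' else ())

@lru_cache(maxsize=None)
def time_range(metric: str) -> str:
    return 'monthly' if metric == 'edits' else 'daily'

@lru_cache(maxsize=None)
def area(metric: str) -> str:
    return 'contributing' if metric == 'edits' else 'reading'

breakdowns('edits')        # computed on first call
breakdowns('edits')        # served from breakdowns' own cache
time_range.cache_clear()   # clears only time_range's cache
```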