[01:46:19] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 46.67% of data above the critical threshold [30.0]
[01:50:37] just to check, is it a HR violation if I invite all the SAS programmers in for interviews solely so I can hurl holy water at them?
[01:55:58] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 100.00% of data above the critical threshold [30.0]
[05:36:16] Analytics, WMF-Product-Strategy: Provide metrics for WMF quarterly report on January-March 2015 - https://phabricator.wikimedia.org/T97344#1287106 (Tbayer) Obtained all the data (except the save latency from T97378) and calculated quarterly averages and trends: - [[https://docs.google.com/spreadsheets/d/...
[07:01:57] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 100.00% of data above the critical threshold [30.0]
[07:03:37] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[07:10:08] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 46.67% of data above the critical threshold [30.0]
[07:13:28] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[08:37:46] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 40.00% of data above the critical threshold [30.0]
[09:07:37] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[09:30:47] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 40.00% of data above the critical threshold [30.0]
[13:10:58] milimetric: you have a minute for me ?
[13:11:17] oops: Moooooorning all :)
[13:11:37] mogge
[13:22:22] Nikerabbit, sorry?
[13:22:31] hey joal|night :)
[13:23:09] Ironholds: mogge, huomenta, morning, what's your tea of coffee?
[13:26:07] Nikerabbit, "nicotine"
[13:26:45] Nikerabbit: I'd go for yoga :-P
[13:29:42] is there ETA for the event logging batching fix?
[13:30:15] mforns handles it, I can't really say
[13:40:47] um, hello folks who have done this wikimania registration thing
[13:40:51] what nights do I need to check for hotel?
[13:43:37] milimetric: you know?
[13:43:50] do I just need the wed - sun?
[13:44:11] baaa, i will wait til after standup :)
[13:48:38] Ironholds: yt? q about your fluorine change
[13:49:27] ottomata, yep!
[13:49:31] did I not post the patch?
[13:49:41] or is it just a "can we not use a distinct directory"?
[13:54:45] you did, but there is more to it than that
[13:54:52] but first
[13:54:58] there is no /srv/aggregate-datasets on fluorine
[13:55:11] oh god
[13:55:13] * Ironholds headdesk
[13:55:22] it's /a/aggregate-datasets and I forgot to symlink, eh?
[13:55:32] ahhh
[13:55:33] nono
[13:55:34] ok
[13:55:47] no, that is just inconsistency on the part of older servers
[13:55:47] puppet gives me a headache but nobody else on t'team knew what to do either
[13:55:51] ohh, gotcha
[13:55:52] haha, well i am back!
[13:55:55] yay!
[13:56:00] I missed the ottomata!
[13:56:04] for at least a half day today, then back back next week :)
[13:56:05] so jaa
[13:56:14] um, also, there is no rsync daemon on fluorine
[13:56:18] so you can't do the ::srv ... thing
[13:56:41] Ironholds: i'm confused too though
[13:56:46] what are /a/public-datasets on fluorine anyway?
[13:56:54] there's no...
[13:56:57] that data does exist
[13:56:58] * Ironholds reaches for the whisky
[13:56:58] sorry
[13:57:00] that directory
[13:57:01] exists
[13:57:04] but there's nothing in it
[13:57:04] yeah
[13:57:07] not yet!
[13:57:14] I wanted the rsync before I started cronjobbing things
[13:57:15] how's it going to get there?
[13:57:29] a widdle python script on fluorine will run once a day and write to it?
[13:57:38] ok, from what source?
[13:57:42] /a/mw-log ?
[13:57:47] stuff in there?
[13:57:57] parse those files, then generate public-datasets data from it?
[13:58:57] the search logs, actually
[13:58:58] but yes
[14:00:28] shouldn't we just copy those logs over to stat1002 or stat1003 and do the analysis there?
[14:00:36] like we do for other things?
[14:01:18] also, what search logs?
[14:01:21] do you know the log file?
[14:01:27] oh, CirrusSearch.log?
[14:01:30] CirrusSearchRequests.log?
[14:01:54] man, we should really put these into kafka
[14:02:43] ottomata, please, dear god, not kafka
[14:02:50] oh, wait. kafka.
[14:02:51] kafka is good
[14:02:54] ha
[14:02:54] just not HDFS :d
[14:03:01] ja kafka > hdfs, hdfs is secondary
[14:03:04] kafka makes everything easier
[14:03:08] but yes, I would totally be down to copy them across to stat1002
[14:03:13] that would be a helluvalot easier
[14:03:19] so, it's about 2.5K / second on the Requests.log
[14:03:33] but (1) they're like, 36GB a day and (2) it's more of a pita
[14:03:45] like, it was my first suggestion, but engineers in search thought it was a pain, basically
[14:03:50] if you want to do it I am down :D
[14:04:27] i would like to do it, but we shouldn't make it a blocker
[14:04:32] i'd have to look into php clients
[14:04:37] which is something i want to do for eventlogging anyway
[14:04:40] that'll be fun! :)
[14:04:41] but, not yet.
[14:04:42] ok
[14:04:51] ja we can copy this log file over to stat1002
[14:04:54] it is rotated daily?
[14:05:07] oh pssh
[14:05:14] compressed they are around 3G / day
[14:05:17] maybe more
[14:05:20] 3-5G
[14:05:26] ja, let's do that
[14:08:13] ottomata, okay!
[14:08:17] perfect :D
[14:08:32] want me to note that and abandon the patch?
[14:09:06] ja, i'm on it now
[14:11:39] danke schoen!
[14:15:06] Ironholds: did they only recently start collecting these?
[14:15:15] the RequestLogs?
[14:15:18] I think so? It's from the new system
[14:15:32] aye, it only goes back a few weeks. cool.
[14:15:38] man, why didn't they put this in Kafka?!
[14:15:43] where was I when they were talking about thins?!?!
[14:15:48] this*
[14:15:59] this is like +1 more thing to turn off
[14:16:01] this is using udp2log
[14:17:38] it is? ew
[14:17:57] pop into -search when people wake up and poke nik and chad, I guess? if you wanna switch it over fully
[14:20:43] oh, ha, didn't know that was a room
[14:20:45] so many rooms!
[14:22:39] Ironholds: https://gerrit.wikimedia.org/r/#/c/211123/
[14:24:49] clever!
[14:25:28] (Abandoned) OliverKeyes: (WIP) project class/variant extraction UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/188588 (owner: OliverKeyes)
[14:32:01] ottomata: I did Tuesday night too 'cause they said they'd cover 6 nights
[14:32:05] So Tue - Sun
[14:32:14] checking out Monday morning
[14:33:20] Nikerabbit: soon-ish, but we don't like deploying on Fridays
[14:33:32] early next week
[14:33:52] joal|night: yes, I've got minutes, but my IRC refuses to ping me
[14:34:03] No prob :)
[14:34:13] About article tiltes
[14:34:20] tell me when we can batcave :)
[14:35:03] ok, I'll be there in 2 minutes
[14:35:12] milimetric: thx, can you also add that to the task for FYI for everyone ;)
[14:56:35] ottomata, how do i delete data from hive? i already dropped the table, but the data is still there, e.g. /mnt/hdfs/user/hive/warehouse/yurik.db/515_03_2014_12_02
[14:57:04] hm, i would think if it was not an external table dropping it would delete it
[14:57:11] can you just remove it from hdfs then?
[14:57:18] hdfs dfs -rm -r ...
[14:57:35] ottomata, i don't want to accidentally delete something else :)
[14:57:57] what's the exact command for the above dir - /mnt/hdfs/user/hive/warehouse/yurik.db/515_03_2014_12_02
[14:59:14] hdfs dfs -ls -d /user/hive/warehouse/yurik.db/515_03_2014_12_02
[14:59:16] oops
[14:59:22] hdfs dfs -rm -r /user/hive/warehouse/yurik.db/515_03_2014_12_02
[14:59:25] yurik: ^
[14:59:41] * yurik doesn't like those oopses...
[15:00:48] ha, that oops was because i pasted you the -ls command
[15:03:13] yeah, i see ;) thanks!
[15:04:36] Analytics-EventLogging, operations: Allow eventlogging ZeroMQ traffic to be consumed inside of the Analytics VLAN. - https://phabricator.wikimedia.org/T99246#1288016 (Ottomata) NEW
[15:05:58] milimetric: I did some code reading and looking at the recent changes made in EL. Where do I start on this? I saw your puppet patch had some comments by Joseph. Is that still WIP?
[15:09:01] Ironholds: waiting on this now: https://phabricator.wikimedia.org/T99245
[15:09:27] yaay
[15:20:05] Analytics, MediaWiki-General-or-Unknown, Services, Wikidata, and 4 others: Reliable publish / subscribe event bus - https://phabricator.wikimedia.org/T84923#1288059 (GWicke)
[15:20:45] Analytics, MediaWiki-General-or-Unknown, Services, Wikidata, and 4 others: Reliable publish / subscribe event bus - https://phabricator.wikimedia.org/T84923#933968 (GWicke)
[15:21:43] Analytics, MediaWiki-General-or-Unknown, Services, Wikidata, and 4 others: Reliable publish / subscribe event bus - https://phabricator.wikimedia.org/T84923#1288062 (mobrovac)
[15:33:14] Analytics-EventLogging, Analytics-Kanban: ContentTranslationError event logging table is not receiving new events - https://phabricator.wikimedia.org/T98842#1288078 (Milimetric) Just FYI: we're trying to deploy the new patch that makes this problem better early next week.
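Note on the HDFS cleanup exchange above (14:56 to 15:03): the path yurik quoted is under /mnt/hdfs, while the commands pasted back use the same path without that prefix, and the -ls -d command came before the -rm -r one. A minimal Python sketch of that list-then-remove pattern follows. It assumes /mnt/hdfs is a FUSE mount of the HDFS root, and the function names and the min_depth guard are hypothetical, not anything used in the channel; it only builds the command strings and does not run them.

```python
import posixpath
import shlex

# Assumption: HDFS is FUSE-mounted at this location on the client host.
FUSE_MOUNT = "/mnt/hdfs"


def to_hdfs_path(local_path, mount=FUSE_MOUNT):
    """Strip the FUSE mount prefix so the path can be passed to `hdfs dfs`."""
    norm = posixpath.normpath(local_path)
    if not norm.startswith(mount + "/"):
        raise ValueError(f"{local_path!r} is not under {mount!r}")
    return norm[len(mount):]


def removal_commands(local_path, min_depth=5):
    """Build the list-then-remove command pair for a single directory.

    min_depth is a hypothetical safety guard: it refuses paths with fewer
    components than a table directory inside a Hive database would have.
    """
    hdfs_path = to_hdfs_path(local_path)
    depth = len([part for part in hdfs_path.split("/") if part])
    if depth < min_depth:
        raise ValueError(f"refusing to remove shallow path {hdfs_path!r}")
    quoted = shlex.quote(hdfs_path)
    # Listing first lets a human confirm the target before deleting it.
    return [f"hdfs dfs -ls -d {quoted}", f"hdfs dfs -rm -r {quoted}"]


if __name__ == "__main__":
    for cmd in removal_commands(
        "/mnt/hdfs/user/hive/warehouse/yurik.db/515_03_2014_12_02"
    ):
        print(cmd)
```

Printing the commands rather than executing them keeps the confirmation step manual, which matches the concern raised in the channel about accidentally deleting something else.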
[15:43:15] Analytics-Kanban, Analytics-Wikimetrics, Community-Wikimetrics, Patch-For-Review: Story: user wants to be able to re-run a failed report more easily {dove} [13 pts] - https://phabricator.wikimedia.org/T88610#1288116 (mforns) Open>Resolved
[15:43:42] Analytics-Kanban, Analytics-Wikimetrics, Patch-For-Review: Compact Wikimetrics' old report files {dove} - https://phabricator.wikimedia.org/T95756#1288125 (mforns) Open>Resolved
[16:00:27] Analytics, Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Add support for X-WMF-UUID method of transmitting app install ID to apps uniques reports [5 pts] {hawk} - https://phabricator.wikimedia.org/T96926#1288192 (madhuvishy) Open>Resolved This change is done and deployed.
[16:01:27] Analytics-Engineering, Analytics-Kanban, Analytics-Wikimetrics, Patch-For-Review: "Validate Again" functionality is broken - https://phabricator.wikimedia.org/T78339#1288198 (madhuvishy) Open>Resolved This change is done and was deployed.
[16:07:30] Analytics-Cluster, Analytics-Kanban, Easy: Build component for Oozie jobs to sends e-mails - https://phabricator.wikimedia.org/T88433#1288212 (madhuvishy) Current progress on this task is here - https://gerrit.wikimedia.org/r/#/c/210632/. Since Oozie doesn't let you attach files in it's email-action, a...
[16:50:08] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[17:48:53] milimetric: hey
[17:49:06] hi madhuvishy
[17:49:29] wanna chat now?
[17:49:34] Analytics-EventLogging, operations: Allow eventlogging ZeroMQ traffic to be consumed inside of the Analytics VLAN. - https://phabricator.wikimedia.org/T99246#1288492 (akosiaris) Open>Resolved a:akosiaris done. I just punched holes for the mentioned ports. Principle of least priviledge and all. Ch...
[17:50:33] madhuvishy: yeah, to the batcave!
[18:03:08] mforns_brb: if you're still around, we were wondering if you knew where the mobile session scala code was
[18:13:22] Analytics, Analytics-Kanban: Onboard Madhu - https://phabricator.wikimedia.org/T92985#1288544 (madhuvishy) Open>Resolved All of these are done :D
[18:31:23] Analytics-Cluster, Fundraising Sprint Kraftwerk, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1288620 (AndyRussG)
[18:32:31] milimetric, sorry was out for a bit, yes: https://gerrit.wikimedia.org/r/#/c/199935/
[18:33:01] thx mforns, madhu is thinking about grabbing it, but she'll probably do the mobile app sessions
[18:33:47] milimetric, isn't this the mobile app sessions code?
[18:34:07] oops, got them confused
[18:34:22] she's looking at the app sessions vs. stacked bar chart prototype you did
[18:34:36] milimetric, oh, ok!
[18:35:20] I'm telling you, she's just going to steal all our work, we'll have to all become product owners to get her more stuff at this rate
[18:35:21] :)
[18:38:21] Analytics-Cluster, Fundraising Sprint Kraftwerk, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1288632 (AndyRussG) a:AndyRussG
[18:43:40] milimetric, xD yeah
[18:47:21] milimetric: mforns lol
[21:10:41] milimetric: i think i'm gonna start with the app session metrics task. it has scala and feels more shiny.
[21:11:11] cool :)
[21:11:38] then I can't help much, but I'm glad to listen if you wanna talk problems out
[21:12:02] I'm about to sign off today, but I'll stay on IRC
[21:12:18] milimetric: yeah no problem, i doubt i'll get much done today too
[21:39:10] Analytics, WMF-Product-Strategy: Provide metrics for WMF quarterly report on January-March 2015 - https://phabricator.wikimedia.org/T97344#1289193 (Tbayer) For the number of new (Wikipedia) articles, I continued to use the approach from last quarter, relying on Emausbot (see also T97476): [[https://meta....
[21:43:18] Analytics, WMF-Product-Strategy: Provide metrics for WMF quarterly report on January-March 2015 - https://phabricator.wikimedia.org/T97344#1289201 (Tbayer) PS: the last calculation on the number of new articles might be affected by [[https://meta.wikimedia.org/wiki/Article_counts_revisited/2015-03-29_cha...
[22:04:33] Analytics, Research-and-Data: Referrer data for en:Glitter for shareafact test - https://phabricator.wikimedia.org/T93270#1289286 (leila) a:leila>None
[22:35:18] what the hell is a java.lang.reflect.InvocationTargetException when it's at home?