[06:29:45] Analytics, Analytics-Wikistats: [Wikistats 2] Bug in time-range selector on detail page - https://phabricator.wikimedia.org/T200497 (sahil505)
[06:40:00] Analytics, Analytics-Wikistats: [Wikistats 2] Provide metric area headings in the 'Explore Topics' dropdown - https://phabricator.wikimedia.org/T200498 (sahil505)
[06:48:39] Analytics, Analytics-Wikistats: Use line charts when breaking down a column chart in Wikistats2 - https://phabricator.wikimedia.org/T189200 (sahil505) @mforns & I were discussing this in our weekly review meeting that showing a popup just over the Change Chart selector dropdown saying switch to 'Line Cha...
[08:07:52] Analytics-Legal, WMF-Legal, Wikidata: Solve legal uncertainty of Wikidata - https://phabricator.wikimedia.org/T193728 (Pintoch) Etalab (who runs the open data portal of the French government) have released a statement (in French) concerning the attribution requirement of their "licence ouverte", conf...
[09:03:13] Analytics, User-Elukey: Add a safe failover for analytics1003 - https://phabricator.wikimedia.org/T198093 (elukey) @jcrespo any feedback for this task? I am really sorry to keep pinging for a non planned request, I'd like to get a sense if it is possible or not and think about alternative solutions. I ca...
[10:46:57] * elukey lunch + errand
[11:42:49] !log Restart mediawiki-history-denormalize oozie job after deploy
[11:42:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:44:15] Analytics, EventBus, Wikimedia-Stream: EventStreams consumer backpressure for slow HTTP clients - https://phabricator.wikimedia.org/T191207 (mobrovac)
[11:44:30] Analytics, Analytics-Kanban, EventBus, Operations, and 6 others: EventStreams accumulates too much memory on SCB nodes in CODFW - https://phabricator.wikimedia.org/T199813 (mobrovac)
[12:06:08] Analytics, User-Elukey: Add a safe failover for analytics1003 - https://phabricator.wikimedia.org/T198093 (jcrespo) I think there is a confusion here, I apologize. What I am trying to say is that I don't think that for an internal analytics database service is ok to have a proxy outside of the network th...
[12:48:27] (PS1) Joal: Update mediawiki_history_reduced datasource name [analytics/refinery] - https://gerrit.wikimedia.org/r/448504
[12:54:07] Analytics, User-Elukey: Add a safe failover for analytics1003 - https://phabricator.wikimedia.org/T198093 (elukey) Thanks for the answer! So I agree that having proxies outside the analytics network that route data back into it is not a great idea, my question about moving the analytics1003's db away fro...
[13:04:08] Analytics, User-Elukey: Add a safe failover for analytics1003 - https://phabricator.wikimedia.org/T198093 (jcrespo) Please consider asking for those extra hw resources, you shouldn't assume those cannot be available (although of course, you shouldn't assume those are either). But they for sure won't if y...
[14:22:58] milimetric: Hello :)
[14:23:15] milimetric: would you agree for us to merge the AQS change in article-count ?
[14:23:51] joal: yeah, it was good either way for me
[14:24:10] milimetric: With elukey approval, we could merge and deploy this change today?
[14:24:44] yeah, seems safe to me
[14:25:40] * joal waits eagerly for elukey answer :)
[14:25:45] --verbose
[14:25:48] :)
[14:25:50] :)
[14:26:24] elukey: I have provided a patch in the way AQS queries druid for one metric (https://gerrit.wikimedia.org/r/c/analytics/aqs/+/447624)
[14:26:48] elukey: milimetric has approved that, and I'd like to deploy it before moving in holidays (meaning today)
[14:27:11] elukey: obviously there is no reason to rush, just me willing to close stuff before leaving
[14:27:54] Also milimetric, I updated https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2/Data_Quality, adding stats on total article count.
[14:28:08] so in theory this change could wait monday, but it is true that you are going on vacation so it might be good to do it now
[14:28:45] * joal sends wikithanks to elukey for his breaking-rule kindness :)
[14:28:47] has it been tested carefully? I mean, any risk of issues in prod?
[14:29:00] this is my only concern
[14:29:03] if we are good, +1
[14:29:18] but please do it asap so we'll have a couple of hours before you log off for some time :)
[14:29:27] elukey: I have not tested it as much as I can - will do that now, and if my tests make me happy, I'll tell you and proceed
[14:48:54] elukey: I confirm my tests
[14:49:03] elukey: merge + deploy?
[14:49:42] elukey: my tests have been running AQS locally with the new code, querying druid through ssh tunnel - results were positive
[14:50:29] sure
[14:51:00] (CR) Joal: [V: 2 C: 2] "Merging for deploy" [analytics/aqs] - https://gerrit.wikimedia.org/r/447624 (https://phabricator.wikimedia.org/T200272) (owner: Joal)
[14:57:01] (PS1) Fdans: Add druid snapshot deletion script [analytics/refinery] - https://gerrit.wikimedia.org/r/448551 (https://phabricator.wikimedia.org/T197889)
[14:58:47] (PS2) Fdans: Add druid snapshot deletion script [analytics/refinery] - https://gerrit.wikimedia.org/r/448551 (https://phabricator.wikimedia.org/T197889)
[14:59:10] Analytics, Analytics-Kanban, Patch-For-Review: Drop mediawiki history old snapshots from druid public cluster - https://phabricator.wikimedia.org/T197889 (fdans)
[15:01:26] standupppp
[15:02:50] (PS1) Joal: Update aqs to 7dba063 [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/448554
[15:02:59] joal: standuppp
[15:04:58] milimetric: my apologies for having forgotten you were off today :(
[15:09:15] no problem at all
[15:18:03] (CR) Joal: [V: 2 C: 2] "Merging for deploy" [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/448554 (owner: Joal)
[15:18:48] !log Deploying AQS with scap
[15:18:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:30:28] joal, elukey, any suggestions for where to put the EL salt file in HDFS?
[15:31:05] mforns: /user/eventlogging ?
[15:31:17] joal, yeeea, makes sense
[15:32:11] mmmm I don't love the location
[15:32:29] that user does not exist
[15:32:40] can we use something that it is more straightforward ? I am afraid that we could delete it
[15:32:40] elukey: I suggested that based on other patterns following the same structure (guess who: /user/ooie
[15:33:00] /user/oozie sorry
[15:33:04] yes yes
[15:34:05] or /user/hive for instance
[15:34:20] But I'm happy with other suggestions
[15:36:34] mmm /user/spark ?
[15:37:24] nono I was wondering if /user was the right one since we might forget and clean it up
[15:37:41] but as joal was saying we already have other things like oozie, hue, etc..
[15:37:42] we could create /etc ?
[15:38:51] the other alternative that I can propose, since we are going to store sensitive stuff in there, is something like /wmf/eventlogging
[15:39:16] elukey, mforns: /wmf/conf/...
[15:39:23] or /wmf/etc
[15:39:51] wmf/etc seems strange no? I'd like something that clearly states what we are storing in
[15:40:07] like /wmf/eventlogging/sanitization or similar, not sure
[15:40:18] the master of names will get back on monday mforns :)
[15:40:25] :D
[15:40:29] yea :]
[15:40:42] ok, will wait, I'm not blocked by that
[15:40:51] thankssss guys
[15:44:14] fdans: I did a rapid over the patch druid deletion
[15:44:21] fdans: there are some issues :)
[15:44:32] fdans: two main things are:
[15:44:50] 1 - We want to delete full datasources, not segments in intervals
[15:45:21] 2 - Full deletion is tricky (see https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid#Delete_segments_from_deep_storage)
[15:47:48] fdans: does what I say make sense?
[15:49:34] joal: I’m not sure i understand why do we want to drop data sources and not intervals, since the goal is to get rid of old snapshots
[15:50:09] (PS5) Mforns: Add ability to salt and hash to eventlogging sanitization [analytics/refinery/source] - https://gerrit.wikimedia.org/r/446592 (https://phabricator.wikimedia.org/T198426)
[15:50:21] fdans: snapshots are datasources, each of them contains a lot of segments (by monthly intervals)
[15:52:18] Analytics, Analytics-EventLogging, Analytics-Kanban: [EL Sanitization] Set up salt creation and rotation - https://phabricator.wikimedia.org/T199899 (mforns) a:mforns
[15:53:59] joal: if you have time - do you have a pre-canned spark query that you use when reading SequenceFiles to avoid serialization issues?
[15:54:57] elukey: will be there in a min
[15:55:23] even next month, not really urgent, I am building a newbie guide :D
[15:56:07] elukey: reading sequenceFiles in spark involves RDD, not dataframes
[15:56:28] And, it also involves knowing enough of what they contain
[15:56:36] meaning, the types of the key and values
[15:57:56] elukey: makes sense?
[15:59:02] sure, but I tried to use RDD with [Int,String] or the correspondent Hadoop types to read one of the EL sequence files stored in madhu's /user dir (just as test) and I keep getting serialization issues for the Hadoop types
[16:00:07] elukey: Ahhh, I get it
[16:00:31] elukey: you need to use hadoop-compatible types: XXWritable
[16:01:08] For instance, reading raw webrequest - SequenceFile[LongWritable, Text] (Text is the StringWritable)
[16:01:40] elukey: --^
[16:02:21] elukey: https://hadoop.apache.org/docs/r2.6.4/api/org/apache/hadoop/io/WritableComparable.html -- See All Known Implementing Classes:
[16:03:17] yeah I tried but I get
[16:03:18] object not serializable (class: org.apache.hadoop.io.LongWritable, value: 1442876718000)
[16:03:29] that might make sense but not to me :D
[16:04:00] (just used LongWritable, Text as you suggested, then tried test.take(10).foreach(println)
[16:04:21] Ah ?
[16:04:23] hm
[16:04:32] * elukey waits for the "Ah but you are a complete n00b!"
[16:04:45] * joal summons andrews hmmmmmm power
[16:05:20] all right I'll keep trying in my spare time, seems to be a stupid thing that I was doing
[16:05:22] elukey: can you paste your lines somewhere so that I review that?
[16:22:54] (we figured it out, the missing bit was map(t => (t._1.get(), t._2.toString())))
[16:27:47] Analytics, Analytics-Kanban, EventBus, Operations, and 5 others: EventStreams accumulates too much memory on SCB nodes in CODFW - https://phabricator.wikimedia.org/T199813 (mobrovac) Open>Resolved This has finally been resolved for good. Here's a summary/post-mortem for clarity and poster...
[16:33:15] * elukey off!!
[16:33:16] byyyeeee
[16:33:35] joal: talk to you in a bit less than a month!
[16:33:50] (I'll start my holidays next saturday)
[16:33:58] o/ o/ o/
[17:42:40] Have a good weekend folks, I'll see you week after next :)
[20:25:53] Analytics, Analytics-Kanban: Measure traffic for new wikimedia foundation site - https://phabricator.wikimedia.org/T188419 (Varnent) Just wanted to verify that you are receiving data from the site. :)
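
Note for the newbie guide mentioned at 15:55, on the SequenceFile exchange (15:53 to 16:22). A minimal sketch of the fix, not the exact code that was pasted: read with Hadoop Writable types, then map to plain Scala values before calling take or collect. LongWritable and Text are not java.io.Serializable, which is why take(10) failed on the raw RDD. The object name, the temp path and the sample records are hypothetical and only there to make the sketch self-contained; it assumes Spark 2.x or later with a local master.

```scala
import java.nio.file.Files

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.sql.SparkSession

object ReadSequenceFileSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-sequence-file-sketch")
      .master("local[2]") // assumption: local run, not the cluster
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical sample data, written to a temp dir as SequenceFile[LongWritable, Text].
    val path = Files.createTempDirectory("seqfile-sketch").resolve("data").toUri.toString
    sc.parallelize(Seq((1442876718000L, "first event"), (1442876719000L, "second event")))
      .saveAsSequenceFile(path)

    // Read back with Hadoop-compatible key/value types.
    val raw = sc.sequenceFile(path, classOf[LongWritable], classOf[Text])

    // raw.take(10) would fail with "object not serializable", because the
    // Writable objects cannot be shipped to the driver. Convert them to plain
    // Long/String on the executors first (the "missing bit" from the log).
    val records = raw.map(t => (t._1.get(), t._2.toString))

    records.take(10).foreach(println)
    spark.stop()
  }
}
```

The map is worth keeping even when nothing is sent to the driver: Hadoop's record reader reuses the same Writable instance for every record, so caching or sorting the raw RDD without copying the values first can give repeated records.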