[01:07:51] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3911730 (10IKhitron) > Yes. {icon frown-o} [07:23:55] wow joal Druid with java-7 works with hadoop java-8 then?? [07:24:22] if so we are basically ready to write down the upgrade procedure!!! \o/ [07:56:10] joal, in the meantime I am trying to understand what you did for the banner impression spark+druid work [07:56:18] https://github.com/druid-io/tranquility/blob/master/docs/spark.md looks nice [07:57:46] so at a very high level, we poll kafka periodically and filter data for banner impression. That data is in RDDs that are sent to Druid using tranquillity beams [07:58:07] is it vaguely close to reality? [07:59:17] for netflow data I'd need to create a new BannerImpressionsStream.scala like object, that gets executed periodically via yarn [07:59:45] better, it keeps being executed by yarn since it is a spark streaming job [08:01:01] in the end we'll have netflow data in druid [08:08:46] 10Analytics, 10Analytics-EventLogging, 10User-Elukey: Verify duplicate entry warnings logged by the m4 mysql consumer - https://phabricator.wikimedia.org/T185291#3912027 (10elukey) p:05Triage>03Normal [08:13:18] elukey: Good morning :) [08:13:26] elukey: You have understood everything :) [08:15:45] joal: morning! I feel a tiny bit less ignorant then :) [08:16:17] elukey: I can give you more details if you wish, but you have the core :) [08:16:46] elukey: code is in refinery-job-spark-2.1 package [08:20:06] joal: sure! How do you test these things though? spark-shell? [08:20:17] or launching them via yarn [08:20:18] ?? 
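[Editor's note] The high-level description above — poll Kafka, filter each micro-batch for banner impressions, send the surviving RDDs to Druid through Tranquility beams — can be sketched in dependency-free Scala. This is a simulation of the job's shape only: the real implementation is the BannerImpressionsStream job in the refinery-job-spark-2.1 package, the batches come from Spark Streaming, and the `"BannerLoader"` filter key plus the sink signature here are hypothetical stand-ins.

```scala
// Minimal sketch of the streaming shape described above: filter each
// batch for banner impressions, hand survivors to a sink (played by a
// Tranquility beam in the real job; this filter key is a placeholder).
object BannerStreamSketch {
  def processBatch(batch: Seq[String], sendToDruid: Seq[String] => Unit): Unit = {
    val impressions = batch.filter(_.contains("BannerLoader"))
    if (impressions.nonEmpty) sendToDruid(impressions)
  }
}
```

In the real job this runs continuously under YARN, once per streaming interval, which is why it "keeps being executed by yarn" rather than being scheduled periodically.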
[08:20:33] elukey: I use spark shell to test my logic yes [08:20:47] elukey: But I don't send to druid via spark-shell [08:21:01] 10Analytics, 10Analytics-EventLogging, 10User-Elukey: Verify duplicate entry warnings logged by the m4 mysql consumer - https://phabricator.wikimedia.org/T185291#3912044 (10elukey) One thing that I noticed is that when a burst of warning happens the following is registered around the same time in syslog: ``... [08:21:51] elukey: also I'm super surprised about druid-j7 and hadoop j8, but hey, that works :0 [08:21:54] :) [08:22:00] \o/ \o/ [08:22:28] elukey: shall we spend time reviewing and modifying the upgrade procedure? [08:22:58] yep, I'd ask you for say 30 mins to 1) get caffeinated properly 2) write down a draft that we can discuss [08:23:00] elukey: We could ask Andrew to proof-read it later if so [08:23:16] +1 for more coffee - Was awake late yesterday :) [08:23:24] super :) [09:00:54] ok joal created https://etherpad.wikimedia.org/p/analytics-hadoop-java8 [09:02:49] elukey: ok - How do we proceed? [09:04:54] we can review it in bc if you want [09:05:01] Sounds good :) [09:25:54] 10Analytics, 10TCB-Team, 10Documentation: Where should keys used for stats in an extension be documented? - https://phabricator.wikimedia.org/T185111#3912129 (10Addshore) >>! In T185111#3909240, @Legoktm wrote: > Just use `README` or `docs/metrics.txt` or wherever you want? :) docs/metrics.txt was sort of a... [09:42:48] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Upgrade Analytics Cluster to Java 8 - https://phabricator.wikimedia.org/T166248#3912159 (10elukey) Draft of the upgrade plan in https://etherpad.wikimedia.org/p/analytics-hadoop-java8 [11:08:34] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Upgrade Analytics Cluster to Java 8 - https://phabricator.wikimedia.org/T166248#3912334 (10elukey) For Spark I believe that update-java-alternatives is enough to force it to pick up java8. 
I found this in one of the scripts called... [11:08:36] joal: --^ [11:10:18] so spark-env.sh for spark2 seems to be installed by the package, meanwhile spark-env.sh is deployed via puppet (cdh module) [11:10:55] the way in which JAVA_HOME is retrieved seems sane enough for spark/spark2, I'd just update-java-alternatives for them [11:11:02] but let me know if you think otherwise [11:11:17] (now I also found out why java8 in labs is picked up by default for spark) [11:17:46] helloooooooo [11:18:02] o/ [11:19:38] works for me elukey :) [11:19:54] <3 [11:42:07] 10Analytics, 10Analytics-EventLogging, 10User-Elukey: Verify duplicate entry warnings logged by the m4 mysql consumer - https://phabricator.wikimedia.org/T185291#3912427 (10mforns) > root@eventlog1001:/srv/log/upstart# grep eventlogging/processor /var/log/syslog.1 | grep -c terminated > 276 There are 12 pro... [11:44:45] mforns: I am ignorant about the relationship betweens processors and consumers, but how can a processor disconnect event cause issues in the mysql consumer? [11:44:56] (I mean causing the duplicate warning) [11:45:06] elukey, hmmm good question... [11:45:37] maybe the processor is sending duplicates to kafka for the consumer to read [11:45:46] *processors are [11:47:04] so, when the processor restarts, maybe there are some kafka offsets that are not updated [11:47:27] and some events are reprocessed, and sent again to valid-mixed? [11:49:52] where is the uuid generated in EL? [11:50:02] I remember that Andrew showed it to me but can't find it [11:50:32] https://github.com/wikimedia/eventlogging/blob/dabceb173f79d9036b576c8d9249c3a3768e3be3/eventlogging/compat.py#L101 right? [11:51:02] mmm probably not [12:02:53] but as far as I can tell the processors should be the ones generating the uuid in the capsule right? 
[12:05:15] https://github.com/wikimedia/eventlogging/blob/master/eventlogging/parse.py#L71 [12:06:24] so the uuid is recvFrom + vk_seq_id + vk_dt [12:06:56] the processor does capsule_uuid [12:07:02] err event['uuid'] = capsule_uuid(event) [12:07:21] so yeah, repeated events are probably the root cause [12:07:47] but in this case the consumer should read, for some reason, the same event two times [12:10:03] 10Analytics, 10Analytics-EventLogging, 10User-Elukey: Verify duplicate entry warnings logged by the m4 mysql consumer - https://phabricator.wikimedia.org/T185291#3912489 (10elukey) So the processor does `event['uuid'] = capsule_uuid(event)` that is defined like this: ``` def capsule_uuid(capsule): """Ge... [12:11:18] mforns: if this is true, the uuid of the m4 consumer warnings would need to be present duplicates in valid-mixed [12:11:30] sorry elukey, reading [12:12:16] elukey, yes, that makes sense [12:13:17] valid-mix or schema? [12:13:36] hmmm... not sure, but I'd say valid-mixed [12:13:52] checking [12:15:21] elukey, yes, the mysql consumer consumer from eventlogging-valid-mixed [12:15:59] see: /etc/eventlogging.d/consumers/mysql-m4-master-00 [12:16:20] *consumes [12:16:39] ahh okok [12:16:55] well if there is a duplicate we'll get it also in the per schema one [12:16:58] for sure [12:17:10] aha, yes, should be [12:17:33] so I am trying something like [12:17:34] kafkacat -C -b kafka1023.eqiad.wmnet -t eventlogging_Print -p0 -o beginning | grep --color 9c635730293d562d9c6b483355007919 [12:18:14] yesss duplicates! [12:18:16] bingo! [12:18:19] elukey, from wich box do you execute this? [12:18:23] :D [12:18:32] stat1004 [12:18:58] cool [12:19:26] now, we only need to discover why consumers are dying... [12:19:57] you think there's memory leaks? 
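[Editor's note] The crux of the capsule_uuid exchange above is that the uuid is a deterministic hash of recvFrom + sequence id + timestamp, not a random id: reprocessing the same raw event after a processor restart yields the exact same uuid, which is what surfaces as duplicate-entry warnings at the MySQL consumer. A sketch of that property, using Java's name-based UUIDs — the real implementation lives in eventlogging/parse.py in Python and differs in its hashing details:

```scala
import java.nio.charset.StandardCharsets
import java.util.UUID

object CapsuleUuidSketch {
  // Deterministic uuid over the same fields the processor uses
  // (recvFrom + seq id + dt). Identical input => identical uuid,
  // so replayed Kafka offsets collide on the uuid key downstream.
  def capsuleUuid(recvFrom: String, seqId: Long, dt: String): String =
    UUID.nameUUIDFromBytes(
      s"$recvFrom$seqId$dt".getBytes(StandardCharsets.UTF_8)
    ).toString.replace("-", "")
}
```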
[12:20:57] 10Analytics, 10Analytics-EventLogging, 10User-Elukey: Verify duplicate entry warnings logged by the m4 mysql consumer - https://phabricator.wikimedia.org/T185291#3912499 (10elukey) Just tested the use case in the description on stat1004 with: ``` kafkacat -C -b kafka1023.eqiad.wmnet -t eventlogging_Print -p... [12:21:10] 10Analytics-EventLogging, 10Analytics-Kanban, 10User-Elukey: Verify duplicate entry warnings logged by the m4 mysql consumer - https://phabricator.wikimedia.org/T185291#3912500 (10elukey) [12:21:35] 10Analytics-EventLogging, 10Analytics-Kanban, 10User-Elukey: Verify duplicate entry warnings logged by the m4 mysql consumer - https://phabricator.wikimedia.org/T185291#3912027 (10elukey) a:03elukey [12:22:16] mforns: mmm it maybe is something related to the configuration of the consumer (that should be using kafka-python iirc). [12:22:24] maybe a timeout or something similar? [12:22:59] ah.. would it make sense to expire every 60 minutes? [12:29:41] yep yep, resource consumption seems super fine [12:32:20] maybe consumer_timeout_ms? [12:32:33] going to lunch and then I'll check, let me know if you find anything! [12:32:41] thanks a lot for the brain bounce mforns [12:32:42] ok [12:32:44] <3 [12:33:00] buon appetito [12:33:19] ahahah grazie mille! [13:12:10] 10Analytics-Kanban, 10Operations, 10monitoring, 10netops, 10User-Elukey: Pull netflow data in realtime from Kafka via Tranquillity/Spark - https://phabricator.wikimedia.org/T181036#3912575 (10faidon) Thanks for working on this task, very much appreciated! My idea was for "tag" to be used for our differe... 
[13:46:06] (03PS6) 10Joal: [WIP] Add spark code for wikidata dumps processing [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/346726 [13:51:11] (03PS1) 10Joal: Clean refinery-job from BannerImpressionStream job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/405285 [13:51:53] (03CR) 10Ottomata: [C: 031] Clean refinery-job from BannerImpressionStream job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/405285 (owner: 10Joal) [13:58:14] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Upgrade Analytics Cluster to Java 8 - https://phabricator.wikimedia.org/T166248#3912641 (10Ottomata) Wow nice etherpad plan, <3 [14:00:12] 10Analytics-EventLogging, 10Analytics-Kanban, 10User-Elukey: Verify duplicate entry warnings logged by the m4 mysql consumer - https://phabricator.wikimedia.org/T185291#3912643 (10Ottomata) Hm, this sounds right to me. The question now is why is the processor process restarting? [14:08:17] 10Analytics-Tech-community-metrics: "Authors With No Activity During Last 6 Months" on C_Gerrit_Demo lists some users who have been active in the last 6 months - https://phabricator.wikimedia.org/T185316#3912657 (10Aklapper) p:05Triage>03Normal [14:09:35] 10Analytics-Tech-community-metrics: "Authors With No Activity During Last 6 Months" on C_Gerrit_Demo lists some users who have been active in the last 6 months - https://phabricator.wikimedia.org/T185316#3912657 (10Aklapper) [14:42:49] hey I'm around, but getting ready for my trip [14:42:57] ping me if you need me, and I'll be in the meetings [14:43:22] deployed wikistats last night and finished our outline for the Research, Analytics, Machine Learning session: https://phabricator.wikimedia.org/T183320 [14:43:36] do comment on that if you're interested [14:49:08] ack!! [14:51:17] Gone to school - See you at standup a-team [14:59:08] milimetric: is it ok to sacrifice some geo resolution in the name of performance? 
[14:59:27] instead of geometries with a 50m resolution, using a topojson of 110m [15:04:39] elukey, can you help me with eventlogging beta scap deploy? it errors for me [15:06:12] mforns: sure! [15:06:17] what does it say? [15:06:22] one sec [15:06:37] got logged out of ssh [15:07:53] elukey, it says: Host key verification failed [15:09:05] milimetric: yt? [15:09:59] mforns: can I try to deploy? [15:10:12] elukey, sure [15:10:22] elukey, you wat to bc? [15:12:57] mforns: in deployment-tin the branch was "otto" and not "master", was it intended? [15:13:16] whaaa [15:13:17] no sorry [15:13:21] elukey, when I first came, it was otto, then I switched to master, and tried to deploy with no lucj [15:13:36] if i left an otto branch [15:13:38] so I went back to otto, to see if there was sth that I had overseen [15:13:40] the past me apologizes [15:13:56] xD, no prob, the error was on master [15:13:58] ahhhahah no worries! I asked because didn't know if you guys were doing an experiment [15:14:09] but it tries to deploy to eventlog1001! [15:14:18] is there any beta environment set ? [15:14:33] yes, the command I run was: scap deploy -e beta [15:14:42] perfect [15:15:04] connection to deployment-eventlog02.deployment-prep.eqiad.wmflabs failed and future stages will not be attempted for this target right ? 
[15:15:23] yesss [15:15:39] before that one it says: Host key verification failed [15:15:45] yeah in beta that thing is weird [15:15:55] in prod we use the keyholder to access to the ssh key [15:16:08] but I remember that in beta is all messed up (or used to be) [15:16:19] lemme grab a task with some notes from previous me :D [15:16:39] :] [15:21:40] hey ottomata [15:21:42] what's up [15:22:26] fdans: I am totally in favor of that, if it was up to me, the world map would be on a voronoi diagram with the tension having some significance :) [15:23:00] * fdans destroys Liechtenstein [15:25:57] mforns: I thought I had it fixed but then I have now a different error [15:26:00] lovel [15:26:01] lovely [15:26:05] oh [15:29:09] hey milimetric just thinking about this json refine purge callback thing, but i think i have something to try... [15:29:36] its tricky because the code is meant to take a input json path and then write it to a hive table [15:29:46] so, i have to insert in there somewhere a way to do extra writing [15:30:04] but i think i have something... [15:30:50] is it Scala? [15:31:03] sorry I haven’t seen the code ever [15:33:08] mforns: finally fixed! [15:33:31] you should be able to deploy now, I just done it [15:33:38] elukey, thanks a loooot [15:36:06] joal: [15:36:24] off to the eye doctor a-team [15:36:29] i have an Option[(DataFrame, HivePartition) => DataFrame] [15:36:33] i want to match it [15:36:34] byeee fdans see you in SF! [15:36:41] how do I match a anon function? [15:36:55] fn match { [15:36:56] None => ... [15:36:56] Some(???) => ... [15:36:56] } [15:38:29] elukey, it vvvorked, thanks [15:38:58] super [15:41:32] ottomata, maybe Some(f):Option[(DataFrame, HivePartition) => DataFrame] ?? [15:41:55] Some(f)! 
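[Editor's note] mforns's `Some(f)` answer above is the whole trick: pattern matching binds the wrapped anonymous function to a name, and it is then called like any other value — no type annotation needed, since the type is inferred from the Option. A standalone sketch, with `Frame` and `Partition` as hypothetical stand-ins for Spark's DataFrame and refinery's HivePartition:

```scala
case class Frame(rows: Seq[String])
case class Partition(name: String)

object TransformMatch {
  def refine(
      df: Frame,
      partition: Partition,
      transformFn: Option[(Frame, Partition) => Frame]
  ): Frame = transformFn match {
    // Some(f) binds the wrapped function to f; its type is inferred
    // from the Option's type parameter, so f(df, partition) just works.
    case Some(f) => f(df, partition)
    case None    => df // no transform configured: pass through unchanged
  }
}
```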
hmm will try that [15:44:14] 10Analytics, 10Code-Stewardship-Reviews, 10Wikimedia-IRC-RC-Server: IRC RecentChanges feed: code stewardship request - https://phabricator.wikimedia.org/T185319#3912816 (10faidon) [15:47:01] 10Analytics, 10Code-Stewardship-Reviews, 10Operations, 10Wikimedia-IRC-RC-Server: IRC RecentChanges feed: code stewardship request - https://phabricator.wikimedia.org/T185319#3912831 (10faidon) [15:53:13] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Upgrade Analytics Cluster to Java 8 - https://phabricator.wikimedia.org/T166248#3912839 (10elukey) Andrew and Joseph completed a test in labs to verify that Druid running on Java 7 would still work fine with Hadoop running java 8,... [15:59:43] mforns: can I ask you a question about https://github.com/wikimedia/analytics-refinery/blob/master/oozie/pageview/druid/daily/generate_daily_druid_pageviews.hql ? [15:59:57] sure! let me read [16:00:08] I am just curious about loading data to druid [16:00:12] aha [16:00:23] but I can't get why in the end there is a drop table [16:00:41] I mean, isn't the tmp table needed since it has just been created? :D [16:00:51] I am surely missing a trivial thing since I am super ignorant [16:01:07] no no, I think the table is an external one, [16:01:31] so once the data is inserted, hive puts it in the expected format under the given LOCATION [16:01:47] when the table gets dropped, the data is still there to be consumed by druid [16:02:02] but only the metadata is dropped by hive [16:02:04] ahhhhhhhhh [16:02:22] and I think oozie should delete the data once druid has loaded it. [16:02:39] makes sense, thanks! 
[16:02:50] np :] [16:23:11] (03CR) 10Mforns: [V: 032 C: 032] "Ok, self-merging after 2 +1's" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/404662 (https://phabricator.wikimedia.org/T185100) (owner: 10Mforns) [16:32:18] scala is so annoyyying [16:32:19] [ERROR] /srv/home/otto/refinery-source/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/JsonRefine.scala:38: error: Implementation restriction: case classes cannot have more than 22 parameters. [16:32:27] what if i want to have more than 22 cli options?! [16:32:34] hahahahha [16:34:38] ah scala 2.11 doesn't have this limt for case classes [16:34:39] phew [16:38:44] ottomata: Heya - have you found your function matching problem? [16:39:37] quick question - for the netflow data topic, how do we import data in hdfs/hive? Camus ? [16:39:51] joal getting a little stuck on something else, but will prob come back to it [16:40:01] elukey: i don't think we do yet, do we? [16:40:04] but we could, and ya, camus [16:40:10] what' the format in kafka? [16:40:11] json? [16:41:07] ottomata: not yet, I am just investigating to figure out what to do :) [16:41:49] the format should be something like [16:42:07] {"tag": 0, "as_dst": 0, "as_path": "", "peer_as_dst": 0, "stamp_inserted": "2018-01-18 18:43:00", "stamp_updated": "2018-01-18 18:45:02", "packets": 56805000, "bytes": 43872019000} [16:42:15] currently not working properly, but the idea is this one [16:43:45] so we'd need to set up camus to import, then a hive table on top (?) and finally hourly/daily druid indexing? [16:44:26] yup! 
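[Editor's note] The sample netflow record above maps onto a flat schema, modeled here as a case class the way a Hive table (or a Refine job) would see it. Field names come straight from the JSON sample; the numeric types are guesses from the sample values (note `bytes` overflows Int, hence Long):

```scala
// One row of the netflow topic, sketched from the sample JSON above.
// stamp_inserted / stamp_updated stay strings here; in Hive they would
// likely become timestamps driving the hourly/daily partitions that the
// camus import -> hive table -> druid indexing pipeline needs.
case class NetflowRecord(
  tag: Long,
  asDst: Long,
  asPath: String,
  peerAsDst: Long,
  stampInserted: String,
  stampUpdated: String,
  packets: Long,
  bytes: Long
)
```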
[16:44:55] if you don't want to deal with the hive step...you could sparkstreaming/tranquility it into druid [16:45:04] running home forstadnup, bbiab [16:46:22] ah yes, but it would be nice to have all in hive as well [16:50:24] elukey, ottomata[m]: Something else about having camus/hive in addition to stream is ensuring lambda-architecture style - streaming issues are overwrittne by batch [16:51:45] joal: one thing that I'd like to make sure is that they need streaming, since I *think* that they would be fine even with hourly stats [16:52:03] elukey: fine by me then :) [16:52:31] I mean, I am looking forward to write the streaming job but not looking forward to support it if not needed :D [16:52:49] makes sense elukey :) [16:53:37] joal: the other interesting thing is that netflow data can have (potentially) different formats if I got it correctly [16:53:52] hm? [16:54:27] in this case we are going to collect BGP AS paths basically, so how many Autonomous Systems are traversed for outbound traffic for example [16:54:47] Arzhel explained it very well in "Traffic does: Our AS <-> One of our ISP/peering AS <-> some middleman AS <-> some more middleman AS <-> destination AS" [16:55:05] but there might be other info to collect about the network [16:55:26] that might have a different json format (need to verify it) because completely separate domains [16:55:49] so my idea would be to check if pmacct can be configured with different topics for each type of data [16:56:01] rather than sending everything and do the filtering on our side [16:57:30] not sure, need to verify, asking in the task :) [16:58:32] yep from the pmacct config it seems possible to separate traffic by kafka topics [16:58:35] good [17:00:14] 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review: Remove EL capsule from meta and add it to codebase - https://phabricator.wikimedia.org/T179836#3913139 (10mforns) We still need to: - Merge and deploy the gerrit change - Mark 
https://meta.wikimedia.org/wiki/Schema:EventCapsule as deprecated -... [17:00:24] ping ottomata elukey mforns [17:01:04] 10Analytics-Kanban, 10Pageviews-API, 10Patch-For-Review: Use country ISO codes instead of country names in top by country pageviews - https://phabricator.wikimedia.org/T184911#3913144 (10Nuria) [17:03:51] 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Wikistats Bug: Menu to select projects doesn't work (sometimes?) - https://phabricator.wikimedia.org/T179530#3913146 (10mforns) [17:03:56] 10Analytics-Kanban, 10Patch-For-Review: Incorporate data from the GeoIP2 ISP database to webrequest - https://phabricator.wikimedia.org/T167907#3913147 (10JAllemandou) [17:04:05] 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review: Remove EL capsule from meta and add it to codebase - https://phabricator.wikimedia.org/T179836#3913148 (10mforns) [17:04:28] 10Analytics-Cluster, 10Analytics-Kanban: Filter local IPs before checking for geo info - https://phabricator.wikimedia.org/T160822#3913153 (10JAllemandou) [17:04:30] 10Analytics-Kanban, 10Patch-For-Review: Incorporate data from the GeoIP2 ISP database to webrequest - https://phabricator.wikimedia.org/T167907#3349143 (10JAllemandou) [17:08:12] 10Analytics-Kanban, 10Analytics-Wikistats: Add clear definitions to all metrics, along with links to Research: pages - https://phabricator.wikimedia.org/T183261#3913159 (10Milimetric) [17:08:22] 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Link to 'more info' doesn't always work - https://phabricator.wikimedia.org/T183188#3913160 (10Milimetric) [17:08:35] 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Wrong y-axis labels on wikistats graph - https://phabricator.wikimedia.org/T184138#3913161 (10Milimetric) [17:13:04] 10Analytics-Kanban: Wikiselector Perf issues on Chrome - https://phabricator.wikimedia.org/T185334#3913172 (10Nuria) [17:48:53] 10Analytics-Kanban, 10Operations, 10monitoring, 10netops, 10User-Elukey: 
Pull netflow data in realtime from Kafka via Tranquillity/Spark - https://phabricator.wikimedia.org/T181036#3913277 (10elukey) Thanks for the explanation! I have another question: is netflow data ending up in the `neflow` kafka to... [17:53:52] 10Analytics-Kanban, 10Operations, 10monitoring, 10netops, 10User-Elukey: Pull netflow data in realtime from Kafka via Tranquillity/Spark - https://phabricator.wikimedia.org/T181036#3777449 (10Ottomata) HMMM. If this is JSON data, and the schema is consistent, we could use JSONRefine to build the table,... [17:56:27] 10Analytics-Kanban, 10Operations, 10monitoring, 10netops, 10User-Elukey: Pull netflow data in realtime from Kafka via Tranquillity/Spark - https://phabricator.wikimedia.org/T181036#3913296 (10elukey) >>! In T181036#3913291, @Ottomata wrote: > HMMM. If this is JSON data, and the schema is consistent, we... [18:00:52] * elukey off! [18:03:39] Bye elukey - have a good weekend :) [18:09:17] hey joal et al, congrats on the clickstream release. It’s making the rounds of the interwebs. [18:10:20] I got a query from Moritz Stefaner – one of my dataviz heroes – he asked me if there are any plans to include frwiki in the list of projects [18:12:09] Hi DarTar, Thank YOU for the writing ! 
TRhis actually is the reason for which I stopped working in research :) [18:12:31] DarTar: French could easily be added - Just need a reason :) [18:12:36] lol [18:13:40] joal: from Moritz: “I could really use something like this for the French wikipedia (am doing something about france / paris for an upcoming exhibition and am still fishing for ideas…)” [18:14:37] it’s a one-off request so IDK that’s enough as a reason, but Moritz’s work tends to get a ton of traction and result in many people getting interested in the data and methods he uses [18:15:12] happy to put you guys in touch if it’s useful [18:34:18] ok joal another scala q for you [18:34:54] about class loading [18:35:01] i want to do somethin glike [18:35:19] --transformFunction 'org.wikimedia.analytics.refinery.job.deduplicateEventLogging' [18:35:20] or whatever [18:35:26] i think i can figure out how to do this in java [18:35:31] but i think i'd prefer in scala like [18:35:50] object deduplicateEventLogging { def apply(...) { ... 
} } [18:35:51] then [18:36:19] val transformFn = scala.load.object.whatever("org.wikimedia.analytics.refinery.job.deduplicateEventLogging") [18:36:25] and then i could call transformFn() [18:36:33] and have org.wikimedia.analytics.refinery.job.deduplicateEventLogging.apply be called [18:36:41] haven't yet figured out scala.load.object.whatever [18:36:59] i'm trying something like https://stackoverflow.com/a/17016515/555565 [18:37:03] but haven't yet got it working [18:37:17] getting scala.reflect.internal.MissingRequirementError: object org.wikimedia.analytics.refinery.core.SparkJsonToHive.testFn in JavaMirror [18:43:38] WAIT< maybe it does work [18:43:40] hmmm [18:54:16] 10Analytics, 10Analytics-EventLogging: Archive and drop the MobileOptionsTracking EventLogging MySQL table - https://phabricator.wikimedia.org/T185339#3913393 (10phuedx) [19:27:57] ottomata: Sorry was gone [19:28:25] ottomata: I've never done reflection in scala but I'll have a look :) [19:30:18] 10Analytics, 10Code-Stewardship-Reviews, 10Operations, 10Tools, 10Wikimedia-IRC-RC-Server: IRC RecentChanges feed: code stewardship request - https://phabricator.wikimedia.org/T185319#3913473 (10bd808) [19:31:51] 10Analytics-Kanban, 10Operations, 10monitoring, 10netops, 10User-Elukey: Pull netflow data in realtime from Kafka via Tranquillity/Spark - https://phabricator.wikimedia.org/T181036#3913475 (10ayounsi) Empty logs was due to BGP being disabled between pmacct and the router. It's now back: ``` {"tag": 0, "... 
[19:34:23] joal i got something [19:34:24] but its weird [19:34:47] baseically this https://stackoverflow.com/a/32699571/555565 [19:37:54] 10Analytics, 10Analytics-Wikistats: Wikistats 2: New Pages split by editor type wrongly claims no anonymous users create pages - https://phabricator.wikimedia.org/T185342#3913500 (10Jdforrester-WMF) p:05Triage>03Low [19:41:35] DOHHH not quite [19:41:35] hmmmm [19:41:38] hmmm [20:10:01] (03PS2) 10Joal: Add optional datasource to druid loading workflow [analytics/refinery] - 10https://gerrit.wikimedia.org/r/405053 [20:15:34] 10Analytics-Kanban: Add french wiki to clickstream generation - https://phabricator.wikimedia.org/T185344#3913593 (10JAllemandou) [20:15:35] wahhhhhh [20:15:35] java.lang.ClassCastException: scala.reflect.runtime.JavaMirrors$JavaMirror$JavaVanillaMethodMirror cannot be cast to scala.Function2 [20:16:16] 10Analytics-Kanban: Add wikis to clickstream generation - https://phabricator.wikimedia.org/T185344#3913604 (10JAllemandou) a:03JAllemandou [20:16:22] 10Analytics-Tech-community-metrics: One account (in "gerrit_top_developers" widget) counted as two accounts (in "gerrit_main_numbers" widget) - https://phabricator.wikimedia.org/T184741#3913608 (10Aklapper) Another example: https://wikimedia.biterg.io/goto/5c35a039b8e89180d3b5cfd2311d4de4 at the top says "58 new... 
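[Editor's note] The ClassCastException above is the catch with this approach: a MethodMirror is invokable, but it is not a `scala.Function2`, so casting the mirror itself fails. It has to be wrapped in a function value, with the cast applied to each invocation's result instead. A self-contained sketch of looking up a method by name and wrapping the mirror (the `Transforms`/`addSuffix` names are hypothetical, not refinery code):

```scala
import scala.reflect.runtime.{universe => ru}

object Transforms {
  def addSuffix(s: String, suffix: String): String = s + suffix
}

object ReflectSketch {
  // Reflect on the singleton instance, find the method symbol by name,
  // and wrap the resulting MethodMirror in a plain Function2. The cast
  // happens on the *result* of each call, not on the mirror -- casting
  // the mirror to scala.Function2 is exactly what throws above.
  def loadFn(name: String): (String, String) => String = {
    val mirror = ru.runtimeMirror(getClass.getClassLoader)
    val im     = mirror.reflect(Transforms)
    val sym    = im.symbol.toType.decl(ru.TermName(name)).asMethod
    val mm     = im.reflectMethod(sym)
    (a: String, b: String) => mm(a, b).asInstanceOf[String]
  }
}
```

As joal points out later in the conversation, nothing here type-checks that the named method really has this signature; a wrong name or arity only fails at runtime, which is the argument for the trait-based alternative.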
[20:16:38] (03PS1) 10Joal: Add five more wikipedias clickstreams [analytics/refinery] - 10https://gerrit.wikimedia.org/r/405363 (https://phabricator.wikimedia.org/T185344) [20:17:09] DarTar: --^ [20:17:54] w00t [20:18:42] that’s freaking amazing joal [20:19:00] (03CR) 10Ottomata: [C: 031] ":D" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/405363 (https://phabricator.wikimedia.org/T185344) (owner: 10Joal) [20:19:03] you may want to respond to Moritz directly with the good news :) [20:19:37] DarTar: I hope we'll have that computed next month - depends on deploy/restart schedule correlated with all-hands :) [20:19:38] so will the first dump for these additional languages be available at the beginning of February (for January) or in March for February? [20:20:10] got it, so tentatively the former may happen, right? [20:20:25] correct DarTar - Hopefully so (no deploy next week :) [20:20:34] that’s awesome [20:20:40] Thanks for fast review ottomata :) [20:20:55] do you want to let Moritz know or shall I do it? I just sent you a line of intro [20:21:28] ottomata: other idea for your thing: hardcoded string as CLI parameter, and a Map[String, T => U] to get your function [20:21:35] ottomata: not as sexy though [20:21:49] ? [20:21:57] DarTar: I'll let him know - I have not seen your message [20:22:36] joal: perfect [20:22:39] ottomata: trying to get it clearer through better understanding on my side [20:22:55] You want to apply a function based on a CLI parameter, right? 
[20:22:59] ya want to do like [20:23:10] --transformFn 'org.bla.bla.function' [20:23:20] and expct that org.bla.bla.function is in classpath [20:24:05] the only way i know in scala to define a static 'function' is to put it in an object [20:24:20] so I was just going to make it be some object apply() method [20:24:23] and call the 'object' the function [20:24:32] i think i've got something, but it is super weird [20:24:43] i got [20:24:58] scala> f1 [20:24:58] res4: reflect.runtime.universe.MethodMirror = method mirror for org.wikimedia.analytics.refinery.core.tFn.apply [20:25:02] and i can call f1 [20:25:04] which is cool! [20:25:07] but its super funky [20:25:22] i was trying to do asInstanceOf[(...) => x] [20:25:28] to get it as the expected function type [20:25:36] but i can't go from MethodMirror [20:28:34] ottomata: I think what I dislike with approach is type-erasure [20:28:53] ? [20:28:59] ottomata: In scala types are static - therefore removed from compiled bytecode [20:29:35] ottomata: Which means that the function you instanciate "at runtime" has not been checked for type [20:30:01] I know, I'm paranoid and functional-integrist :D [20:30:53] Oh by the way DarTar - Have you seen my message about my previous answer being kicked out from research list? [20:31:19] aye [20:31:22] i can't say I like my function joal [20:31:25] but this has to be possible, no? [20:31:43] ottomata: It surely is, minus type-checking ;) [20:31:49] oh yes, joal I did. I actually delegated ownership of the list to Leila and Diego but forgot you wrote to me directly [20:32:00] I don’t understand what happened, let me check again [20:32:03] Arf DarTar - Didn't know that [20:32:03] how else would you pass things like principal.builder.class=CustomizedPrincipalBuilderClass [20:32:04] to kafka [20:32:07] unless i guess its always java? 
[20:32:37] ottomata: scala will do it since it;s java as well [20:33:37] ottomata: And actually, it's the same issue for java - In fact, I think I don't like reflection :D [20:34:43] ok ottomata - In scala cleanest way I can think of would be to create a trait defining the function signature you wish, have the class appling it using that trait definition, and then enforce passing an objet that extends that trait [20:39:20] joal: fixed, check your inbox (I also forwarded your reply to the list) [20:39:41] Many thanks Dario [20:39:44] hm, i could be ok with that, you need to make an object that extends my trait with an apply function? [20:39:45] DarTar: --^ [20:39:46] np [20:39:50] and then you have some way of making this ok? [20:39:56] and finding the object by string name? [20:40:38] ottomata: if you use a trait, whatever method name would do [20:40:55] ottomata: I'm googling around to see if I manage to do it [20:40:57] aye, but apply would be nice, since it is kinda standard [20:41:00] i guess. [20:41:02] ok [21:00:36] joal: scala is so annoayyying [21:00:40] got a min for bc? [21:02:45] ottomata: sure [21:03:32] k in bc joal [22:20:09] 10Analytics-Data-Quality: Vet reliability of the response_size field for data analysis purposes - https://phabricator.wikimedia.org/T185350#3913824 (10Tbayer) Here is an example to illustrate item 3., for [[https://commons.wikimedia.org/wiki/File:Those_Love_Pangs_1914_CHARLIE_CHAPLIN_CHESTER_CONKLIN_Mack_Sennett... [22:28:07] 10Analytics-Data-Quality: Vet reliability of the response_size field for data analysis purposes - https://phabricator.wikimedia.org/T185350#3913845 (10Nuria) Can we have an example that is not a media type? it is likely that for media downloaded in chunks the field doesn't reflect file size, it should however re... 
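[Editor's note] joal's two suggestions above — a `Map[String, T => U]` keyed by the CLI string, and a trait pinning down the function signature — combine naturally into one type-checked lookup that sidesteps reflection and its type erasure entirely. A sketch with hypothetical names (`Frame` standing in for DataFrame):

```scala
case class Frame(rows: Seq[String])

// The trait fixes the signature at compile time, so every registered
// transform is type-checked -- the guarantee runtime reflection loses.
trait TransformFunction {
  def apply(df: Frame): Frame
}

object DeduplicateEventLogging extends TransformFunction {
  def apply(df: Frame): Frame = Frame(df.rows.distinct)
}

object TransformRegistry {
  private val registry: Map[String, TransformFunction] = Map(
    "deduplicateEventLogging" -> DeduplicateEventLogging
  )

  // Resolve a --transformFunction CLI string to its implementation,
  // failing fast on unknown names instead of deep inside the job.
  def resolve(name: String): TransformFunction =
    registry.getOrElse(name, sys.error(s"unknown transform function: $name"))
}
```

The trade-off versus reflection: new transforms must be registered in the map rather than merely being on the classpath, but unknown names and wrong signatures are caught at startup or compile time.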
[22:38:12] 10Analytics-Data-Quality, 10Operations, 10Traffic: Vet reliability of the response_size field for data analysis purposes - https://phabricator.wikimedia.org/T185350#3913867 (10Nuria) [22:56:38] 10Analytics-Data-Quality, 10Operations, 10Traffic: Vet reliability of the response_size field for data analysis purposes - https://phabricator.wikimedia.org/T185350#3913932 (10Tbayer) And here is a detailed list of the queries from the first and third client in T185350#3913824 (169 and 174 of them, respectiv... [23:42:04] 10Analytics: Estimate how long a new Dashiki Layout for Qualtrics Survey data would take - https://phabricator.wikimedia.org/T184627#3914003 (10mcruzWMF) Yes! {F12668913} @Nuria @Milimetric @egalvezwmf uploaded mock ups here! [23:43:42] 10Analytics-Kanban: Wikiselector Perf issues on Chrome - https://phabricator.wikimedia.org/T185334#3914005 (10Nuria) {F12668914} Repro: 1) reload 2) click on menu 3) type enwiki You cannot see what you are typing (browser is busy) and once you do it takes about 7 secs to render the serarh list [23:43:48] 10Analytics-Kanban: Wikiselector Perf issues on Chrome - https://phabricator.wikimedia.org/T185334#3914006 (10Nuria) [23:48:43] 10Analytics-Kanban: Wikiselector Perf issues on Chrome - https://phabricator.wikimedia.org/T185334#3914009 (10Nuria) After looking at this for a while the only obvious issue that i can find has to do with the amount of nodes we are adding to the DOM at once. We have available 500 options each of which has two s... [23:49:28] 10Analytics-Kanban: Wikiselector Perf issues on Chrome - https://phabricator.wikimedia.org/T185334#3914010 (10Nuria) {F12668927} [23:52:15] 10Analytics: Estimate how long a new Dashiki Layout for Qualtrics Survey data would take - https://phabricator.wikimedia.org/T184627#3914024 (10Nuria) Ok, thank you. Those are quite specific for this survey rather than being a generic take on how to display survey data. On our end we really to think about survey... 
[23:54:07] 10Analytics-Kanban: Wikiselector Perf issues on Chrome - https://phabricator.wikimedia.org/T185334#3914032 (10Nuria) Adding a lookup for subtitle, that is " {{r{subtitle]}}" brings rendering time up to 7 secs.