[08:05:11] Morning team! [08:16:29] Analytics-Kanban, EventBus, Pywikibot-core: EventStreams doesnt find any messages anymore - https://phabricator.wikimedia.org/T184713#3896185 (Xqt) I have requests 2.7.0 for my production environment; I get no response from sseclient because `next(self.resp_iterator)` is empty there. For the developm... [08:36:29] (PS1) Joal: Update pageviews top and by-country response def [analytics/aqs] - https://gerrit.wikimedia.org/r/403890 (https://phabricator.wikimedia.org/T184541) [08:36:46] Analytics-Kanban, EventBus, Pywikibot-core: EventStreams doesnt find any messages anymore - https://phabricator.wikimedia.org/T184713#3896200 (Xqt) >>! In T184713#3893484, @Ottomata wrote: > Ah, I did deploy EventStreams yesterday for T171011. I don't know exactly what caused this change, but I thin... [08:37:02] Analytics-Kanban, RESTBase-API, Patch-For-Review, Services (watching): Update AQS pageview-top definition - https://phabricator.wikimedia.org/T184541#3896201 (JAllemandou) Also submitted a PR to restbase: https://github.com/wikimedia/restbase/pull/941 [08:37:15] Analytics-Kanban, RESTBase-API, Patch-For-Review, Services (watching): Update AQS pageview-top definition - https://phabricator.wikimedia.org/T184541#3896202 (JAllemandou) a: JAllemandou [08:54:59] (PS1) Joal: Add script for webrequest dataloss flase-positives [analytics/refinery] - https://gerrit.wikimedia.org/r/403891 [09:00:48] hola [09:07:45] Hi :) [09:07:47] !log reboot analytics1063->65 for kernel updates [09:07:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [09:08:02] so joal the labs cluster should be running java 8 :) [09:08:13] elukey: it refines data :) [09:08:21] elukey: slowly, but surely [09:08:42] for some weird reason that I don't know (probably a horrible bash inclusion) hive/spark/etc..
clients are all using java 8 [09:09:05] I straced hive (client) and it indeed reads hadoop-env.sh [09:09:13] where JAVA_HOME is set [09:09:39] I'll try to figure out if this is expected or coincidence, buuut in the meantime we can do our tests [09:09:53] rollback will be simple: set JAVA_HOME in puppet, and that's it [09:10:05] Awesome elukey :) [09:10:16] https://gerrit.wikimedia.org/r/#/c/403701/ if you want to check [09:10:24] (no-op for the moment) [09:12:09] elukey: if you have a minute while following your reboots: https://gerrit.wikimedia.org/r/403891 [09:12:52] elukey: It's a copy of the one we had in the oncall page, but the page has now been updated: https://wikitech.wikimedia.org/w/index.php?title=Analytics%2FTeam%2FOncall&type=revision&diff=1780311&oldid=1780238 [09:13:27] (CR) Elukey: [C: 1] "A great +1 :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/403891 (owner: Joal) [09:14:02] Yay ! [09:14:16] (CR) Joal: [V: 2 C: 2] "Merging for next deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/403891 (owner: Joal) [09:25:22] so joal, as far as I've understood, it seems that the labs cluster is working well with java 8, right? [09:25:59] elukey: triple checking spark now, but YES! [09:26:30] very nice [09:28:14] elukey: looks like spark is not connected to hive, but seems more a problem of the cluster than java8 [09:29:48] elukey: 18/01/12 09:27:35 WARN ObjectStore: Version information not found in metastore.
hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0 [09:30:11] mmmm [09:30:14] yeah seems so [09:30:21] elukey: but reading parquet works super nice [09:32:11] elukey: here are the requests sent by ottomata to our test cluster: df.groupBy("uri_path").count().collect() [09:32:14] res6: Array[org.apache.spark.sql.Row] = Array([/frog,410], [/halibut,360], [/apple,425], [/banana,396], [/donkey,369], [/,373], [/emu,384], [/cricket,368], [/giraffe,433]) [09:32:18] :D [09:32:31] hahahah [09:32:57] elukey: spark2 tested - testing spark-1 [09:33:42] elukey: spark2 runs with java8, but it looks like spark1 runs with j7 [09:33:50] Using Scala version 2.10.5 (OpenJDK 64-Bit Server VM, Java 1.7.0_151) [09:36:14] Ohhh - Interesting - spark1 works well with hive but runs with j7, spark2 doesn't work well with hive and runs j8 [09:40:04] so spark-shell vs spark2-shell right? [09:40:23] correct elukey [09:42:24] joal: what do you mean that spark1 works well with hive but not spark2? [09:42:46] spark.sql (in spark2), sqlContext.sql (in spark1) [09:43:03] I can make a query against the hive metastore in s1, while it fails in s2 [09:44:34] any specific error? [09:45:05] I have no idea if they source any config file for JAVA_HOME [09:45:27] WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0 [09:45:31] in s2 [09:46:20] wait I get the following for spark 1 [09:46:21] 18/01/12 09:41:21 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.1.0 [09:46:25] but nothing for spark2 [09:46:35] on hadoop-worker-2 [09:46:35] This was for s2 [09:47:11] ?
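[Editor's note] The ObjectStore warning quoted above is Hive reporting that schema verification is off, so it silently records a schema version instead of failing. A hive-site.xml fragment that would turn this into a hard check (illustrative: this is the standard Hive property, not necessarily how the labs cluster is configured):

```xml
<!-- hive-site.xml: with verification enabled, a missing or mismatched
     metastore schema version becomes an error instead of the WARN above. -->
<property>
  <name>hive.metastore.schema.verification</name>
  <value>true</value>
</property>
```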
[09:47:15] Weird [09:47:16] yeah [09:48:11] on hadoop-worker-1, I launched spark-shell --master yarn [09:48:24] it tells me: Using Scala version 2.10.5 (OpenJDK 64-Bit Server VM, Java 1.7.0_151) [09:49:25] ah no I haven't used --master yarn [09:49:25] Which means spark1 is still using j7 [09:49:38] so this might be the issue, let me retry [09:49:49] elukey: maybe, but I don't really think so [09:50:07] no I mean for the inconsistency in our results, they are flipped [09:50:19] anyhow, for spark1 it is a matter of setting JAVA_HOME properly [09:50:21] elukey: :( [09:51:30] just tested export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre before spark-shell, it uses j8 as expected [09:51:45] hive for some reason sources hadoop-env.sh, and in there j8 is set [09:52:28] this is not a huge deal though, java 7 will eventually be removed from the cluster [09:52:43] so any client will pick up j8 [09:52:58] testing spark/hive connection with j8 and s1 [09:53:19] elukey: Got the same message you had about the metastore connection issue when using j8 for spark1 [09:53:33] so the metastore connection seems related to java version [09:54:57] Actually the warning message didn't prevent a request from being successful - spark1 successfully connected to the hive metastore to run a query [09:54:58] what user do you use for spark-shell? Yours?
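[Editor's note] The per-session override tested above, written out as a sketch. The JVM path is the one quoted in the chat and will differ per host; the spark-shell launch is left as a comment so the snippet stands alone:

```shell
# Point clients at Java 8 for this session only; hadoop-env.sh (via
# puppet) remains the durable place to set this, and the rollback knob.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
echo "clients launched from this shell will use: $JAVA_HOME/bin/java"
# spark-shell --master yarn   # would now report Java 1.8 in its banner
```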
[09:55:01] using java8 [09:55:06] yessir [09:55:23] because I get some access denied exceptions, mmm [09:56:12] Permission denied: user=elukey, access=WRITE, inode="/user":hdfs:hadoop:drwxr-xr-x [09:56:42] elukey: you haven't created your home folder in hdfs /user/elukey with rights to yourself [09:56:50] yep doing it :D [09:57:08] elukey: Home, sweet home :) [09:58:05] elukey: At least I get coherent results: spark2 with either j7 or j8 doesn't want to connect to the hive metastore (no error at launch, but error when trying a query) [09:58:26] elukey: while it works in spark1, with either j7 or j8 [10:00:01] I was expecting some weirdness joal, it was too easy :D [10:00:08] hehe :D [10:00:24] thing to ponder: I don't know if this was an issue before j8 or not [10:04:30] Analytics-Kanban, Analytics-Wikistats: Beta Release: Resiliency, Rollback and Deployment of Data - https://phabricator.wikimedia.org/T177965#3676320 (JAllemandou) Plenty of possible different ways here. Listing the two that make the most sense to me: - Oozie style: add steps to the oozie mediawiki-reduced... [10:04:49] Analytics-Kanban, Analytics-Wikistats: Beta Release: Resiliency, Rollback and Deployment of Data - https://phabricator.wikimedia.org/T177965#3896289 (JAllemandou) a: JAllemandou [10:06:47] ok joal now I am confused :D [10:07:58] elukey: I have the feeling it's a metastore-version thing, I think [10:09:39] joal: what happens now on the prod cluster then? [10:09:49] does spark2-shell fail?
[10:09:53] elukey: it works [10:10:02] eqi stat21004 [10:10:04] oops [10:11:38] so you are saying that this was an issue on the labs cluster before java 8 [10:11:46] ok now I got it [10:12:10] elukey: I'm saying I actually don't know if it was an issue before j8 - I didn't test spark before :( [10:12:30] yep yep I didn't get the part of "context==labs" [10:12:32] :) [10:12:40] Arf sorry - should be more explicit :) [10:12:52] nono I need coffee, grabbing some :) [10:15:10] (no coffee in the co-working, nuuuuooooo) [10:15:37] :( [10:33:32] !log reboot analytics1066->69 for kernel updates [10:33:38] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [10:33:39] finally the last one :D [10:33:42] *ones [10:37:58] the last ones of the hadoop cluster :-) [10:47:01] moritzm: comee oooonnn let me enjoy these temporary moments of happiness :D [10:47:29] :D [10:52:57] ok :-) [10:57:20] ok all the nodes (except an1003) have the new kernel [10:57:30] I need to schedule maintenance for all the stat boxes and an1003 [10:59:00] thanks. stat* boxes are already upgraded, BTW [10:59:09] super [10:59:35] moritzm: how about druid nodes and kafka[12]00[123] ? [11:00:11] and also eventlog1001 (but it runs trusty so not sure if the kernel is ready) [11:01:11] druid*, kafka[12]00[123], aqs* and eventlog all have the fixed kernels installed (the one for trusty has been released now) [11:01:24] very nice! [11:01:25] they messed up their 4.4 builds, but that doesn't apply to trusty [11:02:34] I'm grinding through some other clusters, but can also help with other analytics reboots on Monday/Tuesday [11:04:58] I'll let you know if I need help, but it should be ok.. thanks! [11:11:12] ok! [11:37:28] Hey elukey - Where can I find the network.pp file (I'd like to update IP addresses in our refinery-source codebase) [11:39:26] joal: I may need a bit more info.. what IP do you need to update and where in puppet?
(sorry to ask but I don't have a lot of context) [11:40:36] elukey: we reference internal IPs here: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/IpUtil.java [11:40:41] elukey: I'd like to update them [11:41:15] elukey: I found puppet:/manifest/realm.pp -- But it seems to contain only ipv4 values, no labs nor v6 ones [11:46:40] Analytics-Kanban, User-Elukey: Fix outstanding bugs preventing the use of prometheus jmx agent for Hive/Oozie - https://phabricator.wikimedia.org/T184794#3896456 (elukey) [11:48:12] Analytics-Kanban, Patch-For-Review, User-Elukey: Add the prometheus jmx exporter to all the Hadoop daemons - https://phabricator.wikimedia.org/T177458#3659718 (elukey) I opened https://phabricator.wikimedia.org/T184794 to track down and fix Oozie/Hive bugs, I am inclined to close this task since: 1)... [11:49:53] mmmm I don't really like this file, we might need to find a better solution [11:50:07] elukey: I'd love to [11:51:36] so network::constants has moved but is still in puppet, a bit different from the version that we use though [11:52:36] https://github.com/wikimedia/puppet/blob/production/modules/network/manifests/constants.pp [11:55:02] Analytics-Kanban, User-Elukey: Fix outstanding bugs preventing the use of prometheus jmx agent for Hive/Oozie - https://phabricator.wikimedia.org/T184794#3896474 (elukey) a: elukey [11:56:50] Bazinga elukey!!
Many thanks :) [11:57:33] joal: one possible solution to the issue would be for puppet to create a file with the ips that you need [11:57:41] and then that class would pick them up [11:57:53] elukey: that'd be super awesome [11:58:39] elukey: I'd need two files: one with our external IPs (v4 and v6), one with our labs-internal IPs [11:59:35] Analytics-Kanban, User-Elukey: Move AQS Cassandra daemons to use the Prometheus JMX agent - https://phabricator.wikimedia.org/T184795#3896483 (elukey) [12:00:12] Analytics-Kanban, User-Elukey: Add the prometheus jmx agent to AQS Cassandra - https://phabricator.wikimedia.org/T184795#3896483 (elukey) [12:34:53] (CR) Mforns: Add core class and job to import EL hive tables to Druid (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/386882 (https://phabricator.wikimedia.org/T166414) (owner: Mforns) [12:57:50] Analytics-Kanban: Incorporate data from the GeoIP2 ISP database to webrequest - https://phabricator.wikimedia.org/T167907#3896604 (JAllemandou) [12:58:15] Analytics-Kanban: Incorporate data from the GeoIP2 ISP database to webrequest - https://phabricator.wikimedia.org/T167907#3349143 (JAllemandou) a: JAllemandou [12:58:33] (PS1) Joal: Refactor geo-coding function and add ISP [analytics/refinery/source] - https://gerrit.wikimedia.org/r/403916 (https://phabricator.wikimedia.org/T167907) [13:02:50] !log Rerun webrequest-load-wf-upload-2018-1-12-9 [13:02:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:03:34] !log Rerun webrequest-load-wf-text-2018-1-12-9 [13:03:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:21:38] thanks! [13:43:10] joal: fine if I reboot druid1004? [13:43:22] depooling it first from pybal/lvs [14:01:05] * elukey coffee!
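[Editor's note] A minimal sketch of the puppet-generated-file idea discussed above: IpUtil.java hardcodes ranges today, but if puppet wrote them out, a consumer would only need CIDR parsing and membership checks. The file format is an assumption and the ranges are inlined (and illustrative) so the example is self-contained; Python's stdlib `ipaddress` is used for brevity, though refinery-source itself is Java:

```python
import ipaddress

# Hypothetical contents of a puppet-managed file, one CIDR per line.
RAW_RANGES = """
10.0.0.0/8
2620:0:860::/46
"""

NETWORKS = [ipaddress.ip_network(line) for line in RAW_RANGES.split() if line]

def is_internal(ip_str):
    """True if the address falls inside any configured range.
    Mixed v4/v6 comparisons are safe: membership is simply False."""
    ip = ipaddress.ip_address(ip_str)
    return any(ip in net for net in NETWORKS)
```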
[14:25:31] grrrrrrrrr [14:25:40] something's still wrong with this interlanguage job [14:25:47] it's still not picking up all the data [14:25:51] * milimetric hates oozie [14:26:05] (CR) Fdans: [V: 2 C: 2] "Looks good to me!" (1 comment) [analytics/aqs] - https://gerrit.wikimedia.org/r/403890 (https://phabricator.wikimedia.org/T184541) (owner: Joal) [14:39:49] Analytics-Kanban: Sqoop cu_changes table for geowiki - https://phabricator.wikimedia.org/T184759#3896816 (Milimetric) [14:40:04] anyone wanna brain-bounce on this oozie problem? [14:40:10] I don't get it [14:40:24] I am not sure I'd be of any help :( [14:42:38] milimetric, I can try, give me 5 mins to change rooms [14:45:02] joal: testing a query to get iso code and country name in hive :) [14:45:35] milimetric: I'll go to grab Lino soon, I'll have time post-standup if not yet fixed [14:45:45] thx joal [14:46:03] np [14:46:08] I'll look at it with mforns [15:05:10] so spark2 seems the only thing that we are not able to run in the labs cluster, the rest works fine with java 8 [15:05:32] (spark2 gets weird also with java 7 in labs so something is probably wrong in there) [15:14:38] mforns: nuria_ joal ohhh damn, I was under the impression that projectview_hourly stored ISO numbers (like Spain => 724) [15:14:54] but it stores alpha codes like Spain => ES [15:15:03] so it's human readable anyway [15:15:12] no need to include full country names [15:15:42] Analytics-Kanban, Patch-For-Review, User-Elukey: Add the prometheus jmx exporter to all the Hadoop daemons - https://phabricator.wikimedia.org/T177458#3896880 (elukey) [15:19:48] Analytics-Kanban, User-Elukey: Fix outstanding bugs preventing the use of prometheus jmx agent for Hive/Oozie - https://phabricator.wikimedia.org/T184794#3896893 (elukey) In https://oozie.apache.org/docs/4.1.0/AG_Install.html -> `Advanced/Custom Environment Settings` I can't see any CATALINA_OPTS listed...
[15:25:32] nuria_: this was the trick I forgot yesterday: SET hive.mapred.mode = nonstrict; [15:25:37] (to make repair work) [15:25:40] (it works fine after that) [15:46:48] milimetric: nice [15:55:39] fdans: any disagreements on https://gerrit.wikimedia.org/r/#/c/402466/? [15:55:56] did you already do the deploy yesterday without it? [15:59:16] (CR) Faidon Liambotis: "Thank you *so* much for doing this! I don't have anything valuable to contribute, other than nitpicking: MaxMind capitalizes both Ms in th" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/403916 (https://phabricator.wikimedia.org/T167907) (owner: Joal) [16:02:44] 👋 I'd like to upload some data to https://analytics.wikimedia.org/datasets/archive/public-datasets/all/mwrefs/. Can anyone give directions on how to do so? [16:15:01] Analytics-Kanban, User-Elukey: Fix outstanding bugs preventing the use of prometheus jmx agent for Hive/Oozie - https://phabricator.wikimedia.org/T184794#3896976 (elukey) Finally found the root cause. Each time that oozied.sh does start/stop from the init.d script it starts with a clean environment. Th... [16:15:26] milimetric: I hate oozie too --^ [16:15:29] :D [16:17:02] to be fair, your reasons are much more legitimate, elukey :) [16:17:32] those bash scripts are... [16:17:47] .... [16:17:47] ... [16:33:26] (CR) Nuria: Refactor geo-coding function and add ISP (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/403916 (https://phabricator.wikimedia.org/T167907) (owner: Joal) [16:54:36] (PS1) Milimetric: Correct the column order [analytics/refinery] - https://gerrit.wikimedia.org/r/403946 [16:54:47] (CR) Milimetric: [V: 2 C: 2] Correct the column order [analytics/refinery] - https://gerrit.wikimedia.org/r/403946 (owner: Milimetric) [17:00:02] a-team I'll be a few minutes late to standup, sorry [17:01:32] ping ottomata[m] standup today?
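[Editor's note] The nonstrict trick from the top of this exchange, spelled out as a sketch. Assumptions: the "repair" being unblocked is Hive's partition repair (MSCK REPAIR TABLE), and the table name is purely illustrative:

```sql
-- Strict mode rejects certain partition-related operations and queries;
-- relaxing it for the session lets the repair run, as noted in the chat.
SET hive.mapred.mode = nonstrict;
MSCK REPAIR TABLE wmf.projectview_hourly;  -- illustrative table name
```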
[17:07:58] Analytics-Kanban: Sqoop cu_changes table for geowiki - https://phabricator.wikimedia.org/T184759#3897156 (Nuria) [17:35:44] Analytics-Kanban, User-Elukey: Fix outstanding bugs preventing the use of prometheus jmx agent for Hive/Oozie - https://phabricator.wikimedia.org/T184794#3897276 (elukey) The problem seems to be in the oozie debian package itself: ``` elukey@hadoop-coordinator-1:~/oozie-4.1.0+cdh5.10.0+389/debian$ grep... [17:40:36] Analytics-Kanban: Sqoop cu_changes table for geowiki - https://phabricator.wikimedia.org/T184759#3897285 (Milimetric) Some thoughts from post-standup: * snapshot partition name doesn't apply to this use case, change it to like temporary or something like that * sqoop only one month of data * after processin... [17:41:14] gonna go eat lunch, bbl [17:50:51] mforns: there? [17:58:35] Analytics-Kanban, Analytics-Wikistats: Beta Release: Resiliency, Rollback and Deployment of Data - https://phabricator.wikimedia.org/T177965#3897355 (Nuria) +1 to @mforns comment Let's talk about this on our next tasking meeting. I think the best option is the 1st one, so we test validity of data close... [18:00:35] Analytics-Kanban, User-Elukey: Fix outstanding bugs preventing the use of prometheus jmx agent for Hive/Oozie - https://phabricator.wikimedia.org/T184794#3897369 (elukey) Tried to open https://community.cloudera.com/t5/CDH-Manual-Installation/Oozie-duplicates-CATALINA-OPTS-variables-in-oozie-env-sh/m-p/6... [18:02:24] Analytics-Tech-community-metrics, Developer-Relations (Jan-Mar-2018): Explain difference in number of repositories when trying to manually exclude imported third party repositories - https://phabricator.wikimedia.org/T184420#3897372 (Aklapper) Uhm, maybe my mind played a trick: What if those repos had no... [18:02:49] * elukey off! [18:04:08] nuria_, yes! 'sup?
[18:04:56] mforns: for the EL to druid i am just going to look at it a bit more and maybe separate the indexation code from the rest so it can be used for other spark classes? [18:05:08] mforns: does that seem OK? [18:05:20] nuria_, the indexation code is already separate no? [18:05:53] there is the generic DataFrameToDruid module, [18:06:05] that can be used by any client [18:06:33] no? what do you mean otherwise? [18:06:43] mforns: ah i see, the idea was that it's a base class to be used by all spark jobs? [18:06:49] yes [18:07:16] and the EventLoggingToDruid is the specific one, that handles the EL case [18:07:54] and passes a DataFrame to DataFrameToDruid [18:08:31] when I started this task, Andrew and I discussed and decided on this architecture [18:10:42] ok i see [18:10:57] the bigger part of EventLoggingToDruid is parameter parsing [18:11:11] and also formatting the specific EL case into something generic, meaning: [18:11:48] - identifying dimensions and metrics, given a schema convention [18:12:07] - specifying which EL standard fields are to be blacklisted [18:12:20] Analytics-Tech-community-metrics, Developer-Relations (Jan-Mar-2018): Explain difference in number of repositories when trying to manually exclude imported third party repositories - https://phabricator.wikimedia.org/T184420#3897398 (Aklapper) p: High>Low This. There is still something fishy, but...
[18:12:39] - and flattening the EventCapsule and other nested fields [18:17:36] nuria_, although flattening is a pretty much generic thing, that could be included in the core DataFrameToDruid, I decided to move it out into EventLoggingToDruid, because blacklisting and metric/dimension designation are highly coupled with flattening, and those need to happen in the specific EventLoggingToDruid [18:32:21] ok will look at it a bit more to see if i have any useful suggestions, i moved the template to a resource file on my last patch but that was a 2 liner [18:33:28] yes, template in another file makes sense [19:27:10] nuria_: Before starting to change, I double checked MaxMind database sizes - City is 130M, Country is 3.5M - I think this is the reason why we originally chose to provide the country out of that specific database [19:31:51] Analytics-Kanban, Analytics-Wikistats: Beta Release: Resiliency, Rollback and Deployment of Data - https://phabricator.wikimedia.org/T177965#3897645 (JAllemandou) >>! In T177965#3897355, @Nuria wrote: > I think warming up of cache should happen after in the AQS deployment step of this data. Given we p... [19:57:45] (PS2) Joal: Refactor geo-coding function and add ISP [analytics/refinery/source] - https://gerrit.wikimedia.org/r/403916 (https://phabricator.wikimedia.org/T167907) [20:00:55] (CR) Joal: "@Nuria: I on purpose kept the MaxMindCountryCode class, allowing to get country with a way smaller amount of data loaded than if using Max" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/403916 (https://phabricator.wikimedia.org/T167907) (owner: Joal) [22:45:48] Hey, folks. Quick question: https://www.mediawiki.org/wiki/Extension:EventLogging/Guide seems to suggest that arrays are not valid data types. Is that true?
[22:58:19] 10Analytics, 10MediaWiki-Releasing: Create dashboard showing MediaWiki tarball download statistics - https://phabricator.wikimedia.org/T119772#3898072 (10demon) p:05Triage>03Normal