[05:24:50] (03CR) 10Krinkle: [C: 04-1] "Minor but important nit: We should use the MediaWiki.* namespace for metrics send from outside MediaWiki. This namespace is prepended by M" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/392703 (https://phabricator.wikimedia.org/T176785) (owner: 10Joal) [05:25:16] (03CR) 10Krinkle: [C: 04-1] "Correction: s/We should use/We should not use/" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/392703 (https://phabricator.wikimedia.org/T176785) (owner: 10Joal) [05:27:13] 10Analytics-Kanban, 10Patch-For-Review, 10Services (watching): Add action api counts to graphite-restbase job - https://phabricator.wikimedia.org/T176785#3780217 (10Krinkle) >>! In T176785#3777250, @JAllemandou wrote: > The existing metric is named `restbase.requests.varnish_requests`, and we plan to go with... [08:58:33] * elukey afk for a bit! [11:34:59] the first run of the eventlogging cleaner as cron is executing now \o/ [11:42:40] Yay elukey :) [12:08:03] 10Analytics-Kanban, 10Patch-For-Review, 10Services (watching): Add action api counts to graphite-restbase job - https://phabricator.wikimedia.org/T176785#3780901 (10JAllemandou) @Krinkle: I'm happy to use a less functional approach, and have for instance `analytics.varnish_requests.restbase` and `analytics.v... [12:18:48] 10Analytics-Kanban, 10Patch-For-Review, 10Services (watching): Add action api counts to graphite-restbase job - https://phabricator.wikimedia.org/T176785#3780948 (10Pchelolo) > Only concern is that it changes the existing metric for restbase. @mobrovac and @Pchelolo, is that a big deal? That will not be des... 
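The naming thread above (metrics sent from outside MediaWiki should not live under the `MediaWiki.*` namespace, since MediaWiki's own stats pipeline prepends that prefix automatically) can be sketched with a small hypothetical statsd helper. The function names, the `analytics` prefix choice, and the host are illustrative only, not the refinery job's actual code; the line format itself is the standard statsd counter format:

```python
import socket

# Hypothetical helper illustrating the namespace discussion above: metrics
# emitted from outside MediaWiki get their own top-level prefix (here
# "analytics") instead of "MediaWiki.*". A statsd counter line looks like
# "<metric.path>:<value>|c".

def counter_line(source: str, metric: str, value: int, prefix: str = "analytics") -> str:
    """Build a statsd counter line like 'analytics.varnish_requests.restbase:42|c'."""
    return f"{prefix}.{metric}.{source}:{value}|c"

def send_counter(line: str, host: str = "statsd.eqiad.wmnet", port: int = 8125) -> None:
    """Fire-and-forget UDP send, as statsd clients typically do (host is illustrative)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(line.encode("utf-8"), (host, port))
    sock.close()

if __name__ == "__main__":
    print(counter_line("restbase", "varnish_requests", 42))
```

Under this scheme the restbase and action-API counts discussed later in the log would land side by side, e.g. `analytics.varnish_requests.restbase` and `analytics.varnish_requests.mw_api`.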
[12:24:07] (03CR) 10Addshore: [C: 04-1] "*agrees with Krinkle*" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/392703 (https://phabricator.wikimedia.org/T176785) (owner: 10Joal) [12:51:51] * elukey coffee [12:56:36] Taking a b [12:56:41] break a-team [14:12:27] 10Analytics-Tech-community-metrics, 10Developer-Relations (Oct-Dec 2017): Advertise wikimedia.biterg.io more widely in the Wikimedia community - https://phabricator.wikimedia.org/T179820#3781299 (10Aklapper) [14:41:11] joal: whenever you are back I'd like to ask you some questions about how to restart druid daemons on druid100[123] [14:41:18] (to enable prometheus metrics) [14:54:47] 10Analytics-EventLogging, 10Analytics-Kanban: Purge refined JSON data after 90 days - https://phabricator.wikimedia.org/T181064#3781379 (10Ottomata) Naw, tbayer.popups isn’t gonna be touched. [15:04:44] 10Analytics, 10Operations, 10ops-eqiad, 10User-Elukey: Check analytics1037 power supply status - https://phabricator.wikimedia.org/T179192#3781423 (10elukey) ping :) [15:25:18] ottomata: o/ [15:27:49] Heya elukey - In da cave for ops sync [15:28:02] joining in a minute [15:31:19] coming! 
[15:50:19] 10Analytics-Kanban: Procure hardware to refresh jupyter notebooks - https://phabricator.wikimedia.org/T175603#3781547 (10Ottomata) a:03Ottomata [15:52:21] 10Analytics, 10Analytics-Cluster: Upgrade Analytics Cluster to Java 8 - https://phabricator.wikimedia.org/T166248#3781548 (10Ottomata) a:05Ottomata>03elukey [15:52:48] 10Analytics-Cluster, 10Analytics-Kanban, 10User-Elukey: Upgrade Analytics Cluster to Java 8 - https://phabricator.wikimedia.org/T166248#3781549 (10elukey) [15:52:50] 10Analytics-Cluster, 10Analytics-Kanban, 10User-Elukey: Upgrade Analytics Cluster to Java 8 - https://phabricator.wikimedia.org/T166248#3781550 (10Ottomata) [16:08:07] 10Analytics, 10Operations, 10ops-eqiad, 10User-Elukey: Check analytics1037 power supply status - https://phabricator.wikimedia.org/T179192#3781597 (10RobH) I think replacing bad powersupplies on out of warranty servers is likely a waste of money (as other parts will also go bad with older systems), however... [16:09:18] !log restarting eventlogging services on eventlog1001 [16:09:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:13:30] 10Analytics, 10Analytics-Cluster, 10Spike: SPIKE: experimenting with importing Revision history from XML dumps into an easier to use format, Avro - https://phabricator.wikimedia.org/T78405#844429 (10JAllemandou) Some code exists for converting XML dump in hadoop: https://gerrit.wikimedia.org/r/#/c/361440/ It... [16:18:49] 10Analytics, 10Analytics-Cluster, 10Spike: SPIKE: experimenting with importing Revision history from XML dumps into an easier to use format, Avro - https://phabricator.wikimedia.org/T78405#3781633 (10JAllemandou) @ggellerman : I think we can close this task, just want to confirm with you. 
[16:20:15] 10Analytics, 10Analytics-Dashiki: Browser-major numbers represented as a percentage (when they're not) on the desktop tabular view - https://phabricator.wikimedia.org/T181164#3781638 (10Jdforrester-WMF) [16:47:04] joal: druid restarts completed! [17:02:46] 10Analytics-Kanban, 10Patch-For-Review, 10Services (watching): Add action api counts to graphite-restbase job - https://phabricator.wikimedia.org/T176785#3781820 (10JAllemandou) @Krinkle and @Addshore: After looking once more in graphite, would `mw.api.varnish_requests` be ok for you guys? This would allow u... [17:06:28] 10Analytics-Kanban, 10Patch-For-Review, 10Services (watching): Add action api counts to graphite-restbase job - https://phabricator.wikimedia.org/T176785#3781829 (10Addshore) As far as I am aware, for those with access to the graphite box it is fairly trivial to move and merge metrics. "analytics.varnish_re... [17:07:05] elukey: apparently puppet linter makes you do class { '::standard': } etc. for non profile classes [17:07:11] ¯\_(ツ)_/¯ [17:08:02] yeah I suspected that :( [17:24:43] 10Analytics-Kanban, 10Patch-For-Review, 10Services (watching): Add action api counts to graphite-restbase job - https://phabricator.wikimedia.org/T176785#3781893 (10Anomie) I'm a bit confused here. Is this task about adding a count for the action API, turning an existing restbase-only count into a restbase+a... [17:25:39] 10Analytics, 10Analytics-Cluster, 10Spike: SPIKE: experimenting with importing Revision history from XML dumps into an easier to use format, Avro - https://phabricator.wikimedia.org/T78405#3781894 (10ggellerman) @JAllemandou Thanks for asking! However, I defer to @Ottomata on this question. 
[17:27:46] 10Analytics, 10Analytics-Cluster, 10Spike: SPIKE: experimenting with importing Revision history from XML dumps into an easier to use format, Avro - https://phabricator.wikimedia.org/T78405#3781924 (10Ottomata) 05Open>03Resolved a:03Ottomata [17:34:46] 10Analytics-Kanban, 10Patch-For-Review, 10Services (watching): Add action api counts to graphite-restbase job - https://phabricator.wikimedia.org/T176785#3781971 (10Pchelolo) >>! In T176785#3781893, @Anomie wrote: > I'm a bit confused here. Is this task about adding a count for the action API, turning an exi... [17:35:49] ottomata: oooooh, statistics::explorer [17:36:14] shhhh thank luca for that name [17:36:21] these stat classes are sooo baad [17:36:23] very hard to work with [17:36:27] priority to fix is very low [17:36:29] yaaaarrr [17:37:14] :D [17:37:25] I was trying to find an appropriate explorer meme, but apparently they are all rubbish [17:37:27] good name! [17:45:54] * elukey off! [18:42:19] 10Analytics, 10WMDE-Analytics-Engineering, 10Patch-For-Review, 10User-GoranSMilovanovic: A statbox to update the WDCM system - https://phabricator.wikimedia.org/T181094#3782283 (10Ottomata) OK! I've installed R and a mysql .conf pw file on stat1004. If you are in the analytics-privatedata-users group on... [18:42:44] 10Analytics-Kanban, 10WMDE-Analytics-Engineering, 10Patch-For-Review, 10User-GoranSMilovanovic: A statbox to update the WDCM system - https://phabricator.wikimedia.org/T181094#3782284 (10Ottomata) [18:43:12] 10Analytics, 10Analytics-Cluster, 10WMDE-Analytics-Engineering: Install R on stat1004?
- https://phabricator.wikimedia.org/T174890#3782286 (10Ottomata) [18:43:15] 10Analytics-Kanban, 10WMDE-Analytics-Engineering, 10Patch-For-Review, 10User-GoranSMilovanovic: A statbox to update the WDCM system - https://phabricator.wikimedia.org/T181094#3779313 (10Ottomata) [18:46:40] 10Analytics-Kanban, 10Patch-For-Review, 10Services (watching): Add action api counts to graphite-restbase job - https://phabricator.wikimedia.org/T176785#3782292 (10JAllemandou) @Pchelolo is correct, idea is to have both restbase and mw-action-api hourly varnish-requests counts in graphite. [18:48:28] A-team, with your approval, I'll take friday afternoon to go to a meetup on warp10 (http://www.warp10.io/) [18:49:05] warp10 is developed 10 minutes from home, and seems very powerful when it comes to huge volumes of timeseries [18:49:21] (OVH uses it for its metrics AFAIK) [19:08:01] 10Analytics-Kanban, 10Operations, 10hardware-requests: eqiad: (2) hardware access request for jupyter notebook refresh (SWAP) - https://phabricator.wikimedia.org/T175603#3782420 (10Ottomata) p:05Triage>03Normal [19:12:10] 10Analytics-Kanban, 10Operations, 10hardware-requests: eqiad: (2) hardware access request for jupyter notebook refresh (SWAP) - https://phabricator.wikimedia.org/T175603#3782444 (10RobH) A few questions: * Do you have a minimum clock speed you need on the CPU, or just whatever the best price point is? ** We... [19:16:18] nuria_: , yt? [19:16:28] ottomata: yessir [19:16:35] notebook hw refresh [19:16:40] am assuming we don't need much disk space [19:16:48] but maybe we do? do we want to support folks saving large datasets there? [19:20:47] ottomata: i think that will backfire as we cannot really support processing for n people for large datasets [19:20:51] ottomata: right? [19:21:05] ottomata: so probably best to store and process large datasets in hadoop [19:21:23] agree, but the default storage if we don't ask is 200G [19:21:25] which maybe is enough?
[19:21:34] just don't want to get to a spot where people fill up the disks [19:21:46] i think we can afford larger HDDs if we wanted them [19:21:48] ottomata: how large are notebooks? let me see, i have only done a couple [19:22:34] yeah, i guess it's better not to have large files there, that would encourage people to do big data processing in the notebook process itself, which is not what we want [19:22:45] ottomata: ya that will go kaput [19:22:49] joal, going to warp10 seems cool to me! :] [19:22:53] notebooks are tiny [19:23:11] ottomata: mine is not even 500k [19:23:20] ottomata: so i'd say 200G is pelnty [19:23:23] *plenty [19:23:54] Oh by the way ottomata -- I've tested zeppelin for spark - it's really great :) [19:24:00] oh ya? [19:24:23] ottomata: if we manage to have spark working in jupyter, no need, but if we don't, zeppelin would be good [19:24:25] ok nuria, we'll go with the 200Gs [19:24:41] 10Analytics-Kanban, 10Operations, 10hardware-requests: eqiad: (2) hardware access request for jupyter notebook refresh (SWAP) - https://phabricator.wikimedia.org/T175603#3782479 (10Ottomata) > Do you have a minimum clock speed you need on the CPU, or just whatever the best price point is? No, but as always,... [19:25:41] Thanks mforns for the +1 :) I also like the idea of learning some on that thing [19:26:30] joal: sounds maybe a little like Would this benefit from SSDs? [19:26:31] oop [19:26:34] like [19:26:35] https://www.timescale.com/ [19:27:24] nuria_: other things. superset. this one is really tricky. [19:27:41] deb package would be very hard, as there are a lot of non debianized dependencies [19:27:45] ottomata: i have it on my list of goals-to-talk, why is it tricky [19:27:52] ah sorry [19:27:53] python deployment is not easy [19:28:06] maybe i should see how halfak|Lunch is doing his ores stuff [19:28:08] and try to emulate [19:28:09] ottomata: and making it all (package and deps) [19:28:17] one debian?
[19:28:19] joe has a proposal for python packaging deployment in the works [19:28:22] ottomata: from what I have read on warp10 and what I see on timescale, they are a bit different ) [19:28:31] nuria_: not really sure how to do that without virtualenv... [19:28:36] hmm [19:28:38] maybe... [19:28:47] ottomata: fat-jar style [19:28:57] yeah, but pythonpaths are weirder than node's ones. [19:29:04] it might be possible [19:29:23] ottomata: ya python paths not so hot [19:29:40] ottomata: this sounds like a good candidate for tasking [19:29:46] hmm https://wikitech.wikimedia.org/wiki/ORES/Deployment [19:30:04] joal: i just double checked, i seem to be able to run pyspark just fine on SWAP [19:30:37] ottomata: that is a node style deployment for python right? [19:30:45] ottomata: they are deploying built source [19:30:51] ebernhardson: cooooool, connecting to yarn though? [19:30:55] ebernhardson: , can you do [19:30:58] pyspark --master yarn [19:30:58] ? [19:30:59] ebernhardson: do you mind sharing your command line? [19:31:16] joal: no command line, just straight jupyter. one sec [19:32:14] ottomata: or maybe i am not getting it [19:32:33] ottomata: but they are deploying python like we deploy aqs [19:32:37] ottomata: eight? [19:32:42] ottomata: sorry, right? [19:32:52] joal: hmm, so https://phabricator.wikimedia.org/P6368 but i just double checked and that's running spark locally not against yarn. Can probably make it work against yarn though checking [19:33:03] (but still able to access hdfs and such) [19:35:08] nuria_: no, because virtualenv paths are hardcoded [19:35:12] you can't just move virtualenvs around [19:35:21] they use a virtualenv to create a wheel (?) and then deploy that (?) [19:35:21] i think [19:35:23] i'm going to try [19:35:38] ok [19:35:55] Who is looking at ORES config?
[19:36:38] halfak: me [19:36:42] we want to deploy a python thing [19:36:50] not our code [19:36:55] just a service that we need to package and deploy [19:36:59] but there are a lot of python deps [19:37:05] so .deb package is not looking fun [19:37:11] i don't know much about wheels... [19:39:55] joal: hmm, trying to run against yarn gets stuck in the Accepted stage in yarn :S i'll have to spend more time with it to figure out why, but spark local works which might be enough depending on what people are doing. [19:40:13] aye, i think i remember there being some problem like that [19:40:16] don't remember why atm though [19:40:21] Thanks ebernhardson for the check :) [19:40:37] ebernhardson: How do you try to launch spark within yarn in notebook? [19:40:46] joal: conf.setMaster('yarn-client') [19:40:52] (before creating the SparkContext) [19:40:57] Right [19:41:01] Makes sense [19:47:52] (03PS18) 10Mforns: [WIP] Add scala-spark core class and job to import data sets to Druid [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/386882 (https://phabricator.wikimedia.org/T166414) [19:48:43] ebernhardson: May I kill your non-running pyspark-apps ? [19:48:56] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Add scala-spark core class and job to import data sets to Druid [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/386882 (https://phabricator.wikimedia.org/T166414) (owner: 10Mforns) [19:50:44] (03PS19) 10Mforns: [WIP] Add scala-spark core class and job to import data sets to Druid [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/386882 (https://phabricator.wikimedia.org/T166414) [19:50:53] wow so halfak, do you recreate a new virtualenv from deployed wheels every time you deploy? [19:51:02] just looked at [19:54:40] joal: yes definitely [19:55:12] ebernhardson: They die by themselves :) [19:56:12] hmm, maybe i can make a .deb package from these wheels.
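The pyspark-in-Jupyter setup ebernhardson describes above (set the master to `yarn-client` on a `SparkConf` before creating the `SparkContext`) can be sketched as follows. The executor sizing is invented, `make_yarn_context` is a hypothetical wrapper, and actually running it needs a pyspark install plus a reachable cluster. Notably, in yarn-client mode the driver runs inside the notebook process and executors must connect back to it, which is consistent with the job sitting in YARN's "Accepted" state when a firewall blocks that path (as noted later in the log, T170496):

```python
# Settings sketch for driving Spark-on-YARN from a notebook. Only
# "spark.master" comes from the log above; the sizing values are
# illustrative assumptions, not recommendations.
YARN_CLIENT_SETTINGS = {
    "spark.master": "yarn-client",   # driver stays local, executors run on YARN
    "spark.executor.memory": "2g",   # illustrative
    "spark.executor.instances": "4", # illustrative
}

def make_yarn_context(app_name="notebook"):
    """Hypothetical wrapper; requires pyspark and a reachable YARN cluster."""
    from pyspark import SparkConf, SparkContext
    conf = SparkConf().setAppName(app_name)
    for key, value in YARN_CLIENT_SETTINGS.items():
        conf = conf.set(key, value)
    return SparkContext(conf=conf)
```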
[19:56:36] hmm, no, [19:56:37] hm no [20:00:20] ottomata: the virtualenv sorta gets recreated each time. You can also look at https://github.com/wikimedia/labs-striker-deploy and https://github.com/wikimedia/search-MjoLniR-deploy as other python deployments that were also based on ORES work [20:00:37] ottomata, that's right. [20:00:58] the venv recreation doesn't take that long with wheels. [20:01:01] They are pre-compiled. [20:01:09] aye [20:01:11] It's just a matter of decompressing a bunch of zip files. [20:01:20] wheel == zip with a specific structure [20:01:32] is there a reason you don't commit and deploy the virtualenv itself? if you can guarantee that it will be deployed to the exact same path? [20:01:35] where it was built? [20:09:29] halfak: ^? [20:22:48] mforns: interesting findings after dumps computation 1 [20:23:01] ahaaaa [20:23:11] mforns: batcave for a minute? [20:23:12] what did you find? [20:23:15] sure [20:40:34] Hi Joal, Are you around? Seeking your advice on a few things... [20:40:44] Hi Shilad ! Here I am :) [20:40:50] Shilad: batcave, or IRC? [20:40:59] IRC okay, I think! [20:41:03] okey [20:41:26] I wonder Shilad which is better: okay or okey - probably a matter of accents :) [20:41:50] Hah! Q1: I am on the very last Spark job (Hooray!) that converts pageview session logs to concepts. [20:42:13] Option 1: Enhance session hive schema to store both local page ids and wikidata concepts. [20:42:53] Option 2: Mirror the language-specific schema with a duplicate where page ids are replaced by entity ids. [20:43:09] I like option 1, even though it makes the pageview processing more complex. [20:43:14] Does that seem reasonable? [20:43:34] hm - I need to think a bit [20:43:43] Better in batcave?
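halfak's point above that a wheel is just "a zip with a specific structure", and that installing is mostly decompression, can be illustrated with the standard library. The package below is made up, and the archive is merely wheel-shaped rather than a genuinely installable wheel (a real one comes from `pip wheel`):

```python
import io
import zipfile

# Build a wheel-shaped archive in memory. A real wheel carries the package's
# files plus a <name>-<version>.dist-info/ directory with METADATA, WHEEL,
# and RECORD entries; "mypkg" and its contents are invented for illustration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as whl:
    whl.writestr("mypkg/__init__.py", "__version__ = '0.1'\n")
    whl.writestr("mypkg-0.1.dist-info/METADATA",
                 "Metadata-Version: 2.1\nName: mypkg\nVersion: 0.1\n")
    whl.writestr("mypkg-0.1.dist-info/WHEEL",
                 "Wheel-Version: 1.0\nRoot-Is-Purelib: true\n")
    whl.writestr("mypkg-0.1.dist-info/RECORD", "")  # real wheels list file+hash per entry

# Installing is "just a matter of decompressing": pip unpacks these entries
# into site-packages, which is why recreating a venv from pre-built wheels is
# fast -- no compilation step is needed at deploy time.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as whl:
    names = whl.namelist()
print(names)
```

This is also why wheels sidestep the hardcoded-virtualenv-path problem mentioned above: the venv is recreated in place on the target host and the wheels are simply unpacked into it.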
[20:43:48] Shilad: I think that intuitively I prefer option 2, but I can't really say why [20:44:16] Shilad: possibly because it makes each of the datasets easier to use [20:44:39] Oh, and also I think because you won't have concepts while building sessions, would you? [20:44:49] I could have them, I think. [20:44:58] Ah, different matter then [20:45:37] I would probably do a broadcast join, and it's a big datastructure (1.5GB by my accounting). [20:45:45] In option 1 we would have a table with 3 columns? [timestamp, page_ids, concepts], where page_ids and concepts are arrays of the same size? [20:46:52] Right now it is: pages ARRAY [20:47:04] I would do what you are saying... add concept ids. [20:47:22] and it would be the same size. [20:48:04] Shilad: or put the concept_id as part of the struct in array, could make sense as well [20:48:24] Right! Sorry, that's what I was thinking. [20:48:30] :) [20:48:38] Ok, I think that's a good idea :) [20:49:11] Great! Do you think a 1.5GB datastructure is too brutal to broadcast? [20:49:12] Shilad: It'll push our users of sessions to think of them in terms of wikidata concepts in addition to page_ids :) [20:50:07] Shilad: I've not done that much broadcasting, so I don't have strong opinion here - I think it's big, but if using executor settings wisely (4 cores and 16G RAM for instance, could work like a charm) [20:51:05] This is what I'm thinking too... I'm not sure about broadcast semantics. Do you know if Spark broadcasts once per partition, or once per executor? I presume it's smart enough to only do once per partition. [20:51:16] Sorry... I mean once per executor. [20:51:29] So if there are 1000 partitions but only 32 executors, it's only 32 broadcasts. [20:51:58] Shilad: I'm pretty sure it does it the way you suge [20:52:01] suggest sorry [20:52:17] Then it shouldn't be too bad. [20:52:57] FYI, I have actually built the sitelinks table: shilad.sitelinks if you want to take a look.
Has page information before and after redirects, and also the wikidata entities. [20:53:26] That's really great Shilad :) [20:53:30] Okay, last question: This is a big CL! Would it be better to pull all of it out into its own project? [20:53:56] Shilad: I'm struggling for the moment with an issue that takes most of my time, but I'll be glad to help moving this to production when I'm done :) [20:54:29] Thanks. I understand. When that happens, do you still think it's best for it to stay in analytics-refinery despite the size? [20:55:07] Shilad: I like it staying in refinery, it's where all that type of code belongs [20:55:42] Shilad: I suggest having 2 packages: pageviewsessions, wikidataentities [20:55:48] does it make sense to you? [20:56:02] Also, I'm terrible with names, we could probably do better [20:57:29] Yes! I did that but called them slightly different things: pagesessions and sitelinks [20:57:43] Totally happy to change them if you prefer pageviewsessions and wikidataentities [20:57:48] do you mind adding wikidata.sitelinks to the first one ? [20:57:56] last one sorry [20:57:59] late here ;) [20:58:55] Absolutely. And my questions are done :) Thanks for your help on this! [20:59:07] Sorry for intruding on your evening. [20:59:12] Shilad: no problem at all, thank you again ! [20:59:27] Shilad: I'm actually still on my issue taking most of my time, nothing to worry about ;) [20:59:54] Hah. That does not sound fun. But maybe it is?! [21:00:07] Have a good night! [21:00:43] Shilad: it's long and complex, but not that bad - just too long :) Have a good end of day too - [21:06:51] mforns, ottomata --> IT WORKED !!!!
0.02% diff for all times by faking correct reconstruction with 2017-09 snapshot [21:07:04] wooohooo [21:07:25] Ok - Now that I have it, it's just a matter of correctly patching and asking Erik about his discrepancy [21:07:30] I'm really happy :) [21:07:40] you rock joal nice job [21:09:50] * joal goes to bed, moving like that http://reactiongifs.me/wp-content/uploads/2014/01/jimmy-fallon-elmo-happy-dance-saturday-night-live.gif [21:11:48] Bye a-team, see you tomorrow ! [21:14:18] byyye [21:14:23] joal AMAAAZING [21:17:35] (03PS1) 10Ottomata: Deploy repo for superset (python) [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/392933 (https://phabricator.wikimedia.org/T166689) [21:20:06] fwiw i double checked why notebook1001 couldn't connect to yarn for spark, pretty sure it's the same firewall that prevented 1005: https://phabricator.wikimedia.org/T170496 [21:23:35] ebernhar|lunch: pyspark working in jupyter is really fantastic news :) [21:23:46] ebernhar|lunch: Many many thanks :) [21:24:29] joal: np! [21:28:52] ottomata: maybe if we have some time, we could test that as well: https://toree.apache.org [21:29:44] hm [21:32:17] joal jaja ! many thanks for your work [21:32:43] nuria_: I plan to document my findings tomorrow in the ticket [21:33:07] joal: now some might be in order [21:33:14] huhu [21:39:12] (03CR) 10Nuria: Spark job to create page ids viewed in each session with both language specific ids and wikidata data ids. (031 comment) [analytics/refinery/source] (nav-vectors) - 10https://gerrit.wikimedia.org/r/383761 (https://phabricator.wikimedia.org/T174796) (owner: 10Shilad Sen) [21:43:45] 10Analytics, 10Phabricator: Create phabricator space for tickets with legal restrictions - https://phabricator.wikimedia.org/T174675#3782735 (10Nuria) The space should include the following members: members of analytics team @mforns @Ottomata @Milimetric @JAllemandou @elukey @fdans plus @Nuria plus CTO: @VCole...
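The schema Shilad and joal converge on above (a `concept_id` field inside each page struct, filled from the sitelinks table via a broadcast join) can be simulated in plain Python. In Spark the lookup table would be shipped via `sc.broadcast(...)` once per executor, as they discuss; here it is just a dict, and all ids and values below are invented:

```python
# Plain-Python simulation of the broadcast-join enrichment discussed above.
# The "broadcast" side maps language-specific page ids to wikidata concept ids
# (the shilad.sitelinks role); each session row holds an array of page structs.
# All ids/timestamps are illustrative.

sitelinks = {
    1001: "Q42",
    1002: "Q84",
}

sessions = [
    {"timestamp": "2017-11-22T20:00:00", "pages": [{"page_id": 1001}, {"page_id": 1002}]},
    {"timestamp": "2017-11-22T20:05:00", "pages": [{"page_id": 1001}, {"page_id": 9999}]},
]

def enrich(session, lookup):
    """Option 1 from the discussion: one table, with concept_id added to each page struct."""
    return {
        "timestamp": session["timestamp"],
        "pages": [
            {**page, "concept_id": lookup.get(page["page_id"])}  # None when no sitelink exists
            for page in session["pages"]
        ],
    }

enriched = [enrich(s, sitelinks) for s in sessions]
```

Since the broadcast happens once per executor rather than once per partition, a job with 1000 partitions on 32 executors ships the 1.5GB structure 32 times, which is what makes the approach plausible with generous executor memory.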
[22:01:50] (03CR) 10Ottomata: "Eric, cause I'm new at this python wheels thing...what do you think?" [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/392933 (https://phabricator.wikimedia.org/T166689) (owner: 10Ottomata) [22:06:36] (03CR) 10Nuria: Update cassandra load jobs to local quorum write (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/392624 (owner: 10Joal) [22:11:38] 10Analytics, 10DBA, 10Operations, 10Patch-For-Review, 10User-Elukey: Decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3782789 (10Nuria) @Nettrom Right, is not only that dashboard but all the ones that are feed data via reportupdater that needed the new configura... [22:22:59] (03CR) 10EBernhardson: "overall this looks reasonable. A couple suggestions but not really required. You might also want to see the RFC giuseppe is working on: ht" (033 comments) [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/392933 (https://phabricator.wikimedia.org/T166689) (owner: 10Ottomata) [22:33:18] (03CR) 10EBernhardson: Deploy repo for superset (python) (031 comment) [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/392933 (https://phabricator.wikimedia.org/T166689) (owner: 10Ottomata) [23:15:38] (03CR) 10Nuria: Grow RestbaseMetrics spark job to count MW API (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/392700 (https://phabricator.wikimedia.org/T176785) (owner: 10Joal) [23:48:13] (03PS20) 10Mforns: [WIP] Add scala-spark core class and job to import data sets to Druid [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/386882 (https://phabricator.wikimedia.org/T166414) [23:52:36] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Add scala-spark core class and job to import data sets to Druid [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/386882 (https://phabricator.wikimedia.org/T166414) (owner: 10Mforns)