[04:29:12] (CR) Krinkle: "Thanks!" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/409714 (https://phabricator.wikimedia.org/T187010) (owner: Nuria)
[08:48:48] hello people
[08:49:08] just checked eventlogging and all the processors stopped dying after the deployment \o/
[10:26:39] * elukey coffee! brb
[10:54:08] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review, User-Elukey: Verify duplicate entry warnings logged by the m4 mysql consumer - https://phabricator.wikimedia.org/T185291#3985053 (elukey)
[10:55:04] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review, User-Elukey: Verify duplicate entry warnings logged by the m4 mysql consumer - https://phabricator.wikimedia.org/T185291#3912027 (elukey) After the deployment I can't see any more duplicate errors in the m4-consumer log and no trace of p...
[12:00:01] * elukey lunch!
[12:10:17] (PS10) Joal: Upgrade scala to 2.11.7 and Spark to 2.1.1 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/348207
[15:03:02] first version of https://grafana-admin.wikimedia.org/dashboard/db/kafka-consumer-lag ready!
[15:03:16] (without the -admin)
[15:04:11] ours would be https://grafana-admin.wikimedia.org/dashboard/db/kafka-consumer-lag?orgId=1&from=now-6h&to=now&var-datasource=eqiad%20prometheus%2Fanalytics&var-cluster=eqiad
[15:44:24] Analytics, TCB-Team, Two-Column-Edit-Conflict-Merge, WMDE-Analytics-Engineering, and 5 others: How often are new editors involved in edit conflicts - https://phabricator.wikimedia.org/T182008#3986038 (GoranSMilovanovic) - This will be solved from the [[ https://meta.wikimedia.org/wiki/Schema:Edit...
[15:52:08] Analytics, Discovery, Wikidata, Wikidata-Query-Service, Wikimedia-Stream: Increase kafka event retention to 14 or 21 days - https://phabricator.wikimedia.org/T187296#3986057 (Ottomata) Is there a reason we want to do this on main instead of jumbo? Stas will be consuming from jumbo, since it...
[16:01:14] fdans: GET IN THE STANDUP YOU BETTER WATCH OUT I'M IN CHARGE NOW
[16:01:28] oh no
[16:01:56] (PS11) Joal: Upgrade scala to 2.11.7 and Spark to 2.1.1 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/348207
[16:02:30] ottomata: you are frozen
[16:04:59] (CR) jerkins-bot: [V: -1] Upgrade scala to 2.11.7 and Spark to 2.1.1 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/348207 (owner: Joal)
[16:12:34] ottomata: Mwarf :( Any idea on how to make jenkins give more memory to that docker? https://integration.wikimedia.org/ci/job/analytics-refinery-maven-java8-docker/5/console
[16:14:48] (PS1) Mforns: [WIP] Add EL and whitelist sanitization [analytics/refinery/source] - https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064)
[16:32:25] joal: no, ask hashar?
[16:49:22] Analytics-Kanban, ChangeProp, EventBus, Patch-For-Review, and 2 others: Export burrow metrics to prometheus - https://phabricator.wikimedia.org/T180442#3758088 (elukey) So summary: - we have https://grafana.wikimedia.org/dashboard/db/kafka-consumer-lag up and running, but main-codfw is not showi...
[16:54:13] ottomata: here is the code to make presto work in slider:
[16:54:20] ottomata: https://github.com/prestodb/presto-yarn
[16:56:39] Analytics, Analytics-Wikistats: data missmatch for number of editors - https://phabricator.wikimedia.org/T187806#3986460 (Lydia_Pintscher)
[17:01:09] Analytics, Analytics-Wikistats: data missmatch for number of editors - https://phabricator.wikimedia.org/T187806#3986460 (JAllemandou) @Lydia: You should split by editor type. The editors you are talking about are, I think, what we call in Wikistats 2 `registered-users editors`. Please let us know if I'm wrong!
[17:29:18] Analytics, Analytics-Wikistats: Data mismatch for number of Wikidata editors - https://phabricator.wikimedia.org/T187806#3986662 (Nemo_bis)
[17:38:09] Analytics, Analytics-Wikistats: Data mismatch for number of Wikidata editors - https://phabricator.wikimedia.org/T187806#3986460 (Nemo_bis) Note that the only official number for "active editors" is the 5+ edits/month editors, which is not (easily) provided by WikiStats v2. To get a (theoretically) comp...
[17:40:27] (PS12) Joal: Upgrade scala to 2.11.7 and Spark to 2.1.1 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/348207
[17:45:20] Analytics-Kanban, User-Elukey: Reduction of stat1005's disk space usage - https://phabricator.wikimedia.org/T186776#3986748 (elukey)
[17:47:09] ottomata: thoughts about where/how to put the stat1002-a dir in hdfs?
[17:47:20] (from stat1005)
[17:48:09] /wmf/data/archive/backup ?
[17:49:48] could be it, yes
[17:50:53] ahh and using something like copyFromLocal I can avoid compressing the dir, right?
[17:51:02] (we have only 300GB left)
[17:56:58] Analytics, InteractionTimeline, Anti-Harassment (AHT Sprint 15): Measure how many unique people visit the Timeline - https://phabricator.wikimedia.org/T187374#3986831 (dbarratt) @Nuria Could you create a new project/account for the #interactiontimeline ? If you send me the JS code I should be able to...
[18:15:44] ottomata: something like sudo -u hdfs -copyFromLocal /srv/stat1002-a /wmf/data/archive/backup/stat_hosts ?
[18:16:21] hdfs might not be able to read stat1002-a's files though
[18:21:07] ?
[18:21:30] elukey maybe /wmf/data/archive/backup/misc/stat1002-a (and then a README in stat1002-a)
[18:21:33] ?
[18:23:30] ok I am not going to argue names, but is the command correct? Never done it
[18:25:22] looks ok i think!
[18:25:24] can't hurt to try
[18:26:12] sure
[18:27:48] started, it might take a whie
[18:27:49] *while
[18:31:23] got some perms denied
[18:31:49] ohh interesting, right, probably because the hdfs user can't read everything in /srv
[18:31:51] ohHH that's what you mean
[18:31:53] sorry, didn't understand
[18:31:54] yeah
[18:32:10] well nothing super big, I'll amend whatever gets perms denied
[18:32:11] yeah, elukey you could put it in /user/root/ in hdfs first
[18:32:12] with sudo
[18:32:20] and then hdfs dfs -mv into backup/ after
[18:32:47] /wmf/data/archive/backup/misc/stat1002-a/discovery/golden-dev/test_2017-03-24_17:39:10.log.md is not a valid DFS filename
[18:32:50] ah lol
[18:33:55] all right, will check tomorrow morning.. ottomata if you have time would you mind taking a look? So we'll remove /srv/stat1002-a asap
[18:34:13] task is https://phabricator.wikimedia.org/T186776
[18:34:20] otherwise I'll try to work on it tomorrow
[18:35:01] ok
[18:35:08] take a look to see if it is done?
[18:35:10] you mean elukey?
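For context on the copy discussed above: the plain `-copyFromLocal` run failed both because the hdfs user cannot read every file under /srv and because HDFS rejects filenames containing `:` (the `test_2017-03-24_17:39:10.log.md` error). A minimal Scala sketch of a more forgiving copy using Hadoop's FileSystem client API follows; the source and destination paths match the chat, but the object name, the `:` to `_` substitution, and the skip-on-error behaviour are illustrative assumptions, not what was actually run.

```scala
import java.nio.file.{Files, Paths}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.collection.JavaConverters._

// Sketch: copy a local directory into HDFS file by file, skipping unreadable
// files and sanitizing names that HDFS will not accept (':' is not allowed).
object BackupToHdfs {
  def main(args: Array[String]): Unit = {
    val localRoot = Paths.get("/srv/stat1002-a")                 // source on stat1005
    val hdfsRoot  = "/wmf/data/archive/backup/misc/stat1002-a"   // destination discussed above

    val fs = FileSystem.get(new Configuration())

    Files.walk(localRoot).iterator().asScala
      .filter(p => Files.isRegularFile(p))
      .foreach { src =>
        // Replace ':' so names like test_2017-03-24_17:39:10.log.md become valid DFS filenames.
        val rel  = localRoot.relativize(src).toString.replace(":", "_")
        val dest = new Path(s"$hdfsRoot/$rel")
        fs.mkdirs(dest.getParent)
        try fs.copyFromLocalFile(false, true, new Path(src.toString), dest)
        catch { case e: Exception => println(s"skipped $src: ${e.getMessage}") } // e.g. permission denied
      }
  }
}
```

Run as a user that can both read /srv and write to HDFS, this avoids the all-or-nothing behaviour that stopped the single copyFromLocal command.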
[18:35:21] nuria_ joal ottomata: hopefully I didn't cause any problems with https://phabricator.wikimedia.org/T184095#3986912 if I did I am so sorry
[18:36:56] ottomata: nono my copyFromLocal failed due to the weird name, so it should be restarted
[18:37:15] maybe excluding/renaming those weird filenames
[18:37:57] anyhow, going afk :)
[18:38:01] * elukey off!
[19:00:12] yo joal: a 3rd person asked me about clickstream for Wikidata, I am not sure how easy it is to generate something similar to the WP datasets or if you guys have thought of this
[19:08:21] DarTar: I have ideas around what this means, but this is very much not the "clickstream for wikidata" I have in mind - I'm thinking more of: clickstream cross-language (tagged by language) between wikidata entities
[19:08:32] bearloga: I'll have a look at your ticket
[19:08:42] bearloga: I'm pretty sure you didn't break anything
[19:08:48] bearloga: When creatin
[19:08:51] again sorry
[19:09:30] bearloga: When creating a table in hive, file format is an important improvement - I'm assuming you'll be interested in analytics-oriented queries over your data, therefore you should use parquet
[19:09:57] bearloga: We have a ticket to make parquet the default file format for hive table creation, but our hive version doesn't allow for it as of now
[19:13:45] bearloga: something else I'm thinking of: you extract distinct - wouldn't it be interesting to keep the number of actions?
[19:16:27] joal: thanks! I should modify the query and rerun. although I'm having a hard time tracking down the correct way to specify the parquet file format (in the CREATE TABLE statement vs SET???); and yeah! I should probably include a count of requests per hour
[19:16:46] bearloga: commenting on the ticket, give me a minute :)
[19:21:14] joal: thank you very much!!!
[19:21:53] bearloga: I'll add other comments in the ticket as well for the archives :) But prefer to write them to you in here first :)
[19:22:35] bearloga: I'm looking at the data in the table, and 1h of data is ~10Mb
[19:22:50] bearloga: There is no point in storing that by the hour
[19:23:02] bearloga: Let's partition by day :)
[19:27:32] joal: weird question and I suspect the answer is no but do you think it's possible to…dynamically?…partition the data? like, the query extracts a lyear/lmonth/lday which are localized versions via timezone. could we store data into partitions based on those?
[19:27:45] bearloga: feasible
[19:28:18] bearloga: but not super easy, because currently-existing partitions overlap newly created ones
[19:28:35] bearloga: Doing so would mean extracting most of the data at once
[19:28:40] bearloga: Feasible though
[19:29:53] bearloga: I'm currently waiting for my test of the modified code to finish, and will send the comment
[19:31:46] joal: I see. if I'm thinking about it correctly, each query run (if operating on an hour of data) would be sending data to & resizing #{timezones}+1 partitions (the +1 is for "Unknown") which would be hard to deal with
[19:32:21] bearloga: That's the idea
[19:33:06] bearloga: a small amount of data would be sent to TS+X / TS-X, making data not so efficient to work with
[19:34:35] (PS6) Ottomata: Add dataframe conversion to new schema function [analytics/refinery/source] (jsonrefine) - https://gerrit.wikimedia.org/r/410241
[19:34:44] (CR) Ottomata: Add dataframe conversion to new schema function (1 comment) [analytics/refinery/source] (jsonrefine) - https://gerrit.wikimedia.org/r/410241 (owner: Ottomata)
[19:34:56] bearloga: However if we work a full month (or 2) at once, we get correct partitions
[19:36:25] joal: month +/- 12 hours on each end :P
[19:36:38] joal re https://gerrit.wikimedia.org/r/#/c/411090/2/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/jsonrefine/SparkSQLHiveExtensions.scala
[19:36:43] i tried to reuse the convert function
[19:36:44] bearloga: indeed :)
[19:36:54] couldn't, because of the custom logic needed in the recursion
[19:37:46] it does seem like it should be possible though...
[19:38:37] (PS4) Ottomata: Clean up the normalize function and add a new makeNullable function [analytics/refinery/source] (jsonrefine) - https://gerrit.wikimedia.org/r/411090
[19:44:50] ottomata: https://gist.github.com/jobar/b121b0b0f5aa9fd3a7299c38b676f405
[19:45:56] wow that is simpler than i thought it would be
[19:45:58] magic man
[19:46:02] :D
[19:46:20] ottomata: Adding depth ;)
[19:46:43] into it
[19:47:15] (CR) jerkins-bot: [V: -1] Add dataframe conversion to new schema function [analytics/refinery/source] (jsonrefine) - https://gerrit.wikimedia.org/r/410241 (owner: Ottomata)
[19:47:50] bearloga: Comment sent !
[19:48:04] bearloga: please feel free to let me know if this stuff is not as expected ;)
[19:48:19] (CR) Ottomata: Refactor JsonRefine to use DataFrame converter (2 comments) [analytics/refinery/source] (jsonrefine) - https://gerrit.wikimedia.org/r/410942 (owner: Ottomata)
[19:49:21] joal: this is awesome!! thank you very much!
[19:49:30] bearloga: you're very welcome :)
[19:55:11] joal, trying to figure it out, but struct.convert((field, _) => field.makeNullable(nullable)) isn't quite working
[19:55:28] ottomata: heh
[19:55:41] type mismatch
[19:56:02] OH
[19:56:03] i know
[19:56:03] sorry
[19:56:14] ?
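On the Parquet question above (CREATE TABLE vs SET): the file format belongs in the table definition, not in a session setting, and joal's partition-by-day advice fits the same DDL. A hedged sketch via Spark SQL in Scala, assuming Spark 2's SparkSession with Hive support; the database, table, and column names are hypothetical, and the actual guidance is in the Phabricator comment joal sent, which is not reproduced in this log.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// STORED AS PARQUET goes in the CREATE TABLE statement itself (no SET is
// needed for the format); partitioning by day rather than hour keeps the
// partition count sane for ~10 MB of data per hour.
spark.sql("""
  CREATE TABLE IF NOT EXISTS bearloga.actions_daily (  -- hypothetical names
    action        STRING,
    action_count  BIGINT
  )
  PARTITIONED BY (year INT, month INT, day INT)
  STORED AS PARQUET
""")

// Dynamic partitioning lets one INSERT fill every day partition it touches,
// which is also what the timezone-partitioning idea above would rely on.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
  INSERT OVERWRITE TABLE bearloga.actions_daily PARTITION (year, month, day)
  SELECT action, COUNT(*) AS action_count, year, month, day
  FROM bearloga.actions_staging                         -- hypothetical source table
  GROUP BY action, year, month, day
""")
```

The two SQL statements themselves are plain HiveQL and should work the same from the Hive CLI; only the SparkSession wrapper is Spark-specific.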
[19:56:28] paste error, didn't change fn type in convert arg
[19:56:34] Ahhh
[19:56:36] ok :)
[19:57:40] i thought it was scala weirdness with multi params or something, never can figure that syntax out, so assumed it was wrong
[19:58:43] (PS4) Ottomata: Refactor JsonRefine to use DataFrame converter [analytics/refinery/source] (jsonrefine) - https://gerrit.wikimedia.org/r/410942
[20:00:37] wow too many patches joal, i added that on the latest one
[20:00:39] oh well
[20:01:25] joal, if you have a sec to look over real quick, maybe we can merge into this branch, and then I can work from there on spark 2-ifying
[20:03:32] ottomata: The one above looks good at first sight
[20:04:06] well, the 3 i mean
[20:04:18] let's merge all three of these patches, it's getting unwieldy reviewing 3 at once
[20:04:26] ottomata: agreed
[20:04:42] ottomata: I mean, there's been enough work put into that for it to be good enough I think :)
[20:05:08] There is one last nit I want to check after your merge, and it'll be as good as I can think of :)
[20:05:16] aye, especially for just a remote branch merge
[20:05:19] ok
[20:15:48] (CR) Ottomata: "Whatever is causing jenkins to barf is fixed in the next patch." [analytics/refinery/source] (jsonrefine) - https://gerrit.wikimedia.org/r/410241 (owner: Ottomata)
[20:21:32] joal: one last nit? :)
[20:22:26] ottomata: Using an accumulator instead of count to return your number - Let's wait for spark2 and the new API
[20:23:11] ok
[20:23:18] so gimme some sweet +1s :)
[20:24:04] (CR) Joal: [C: 1] "Let's finish this thing :)" [analytics/refinery/source] (jsonrefine) - https://gerrit.wikimedia.org/r/410241 (owner: Ottomata)
[20:25:18] (CR) Joal: [C: 1] "Finishing - Step 2" [analytics/refinery/source] (jsonrefine) - https://gerrit.wikimedia.org/r/410942 (owner: Ottomata)
[20:25:59] (CR) Joal: [C: 1] "Last one I think" [analytics/refinery/source] (jsonrefine) - https://gerrit.wikimedia.org/r/411090 (owner: Ottomata)
[20:26:21] ottomata: I think I have not seen the last thing with the depth
[20:33:55] ?
[20:34:04] joal: last thing with depth?
[20:34:07] oh
[20:34:38] ...
[20:34:55] where did it go...
[20:35:49] err i don't know
[20:35:51] ok...
[20:35:58] ottomata: sorry :(
[20:36:19] will repatch
[20:36:26] dunno what happened
[20:38:31] (PS5) Ottomata: Clean up the normalize function and add a new makeNullable function [analytics/refinery/source] (jsonrefine) - https://gerrit.wikimedia.org/r/411090
[20:38:34] ok added it joal ^
[20:39:01] (CR) Ottomata: [V: 2 C: 2] Add dataframe conversion to new schema function [analytics/refinery/source] (jsonrefine) - https://gerrit.wikimedia.org/r/410241 (owner: Ottomata)
[20:39:09] Thanks ottomata :)
[20:39:50] (CR) Ottomata: [V: 2 C: 2] Clean up the normalize function and add a new makeNullable function (1 comment) [analytics/refinery/source] (jsonrefine) - https://gerrit.wikimedia.org/r/411090 (owner: Ottomata)
[20:39:54] (PS6) Ottomata: Clean up the normalize function and add a new makeNullable function [analytics/refinery/source] (jsonrefine) - https://gerrit.wikimedia.org/r/411090
[20:39:56] (CR) Ottomata: [V: 2 C: 2] Clean up the normalize function and add a new makeNullable function [analytics/refinery/source] (jsonrefine) - https://gerrit.wikimedia.org/r/411090 (owner: Ottomata)
[20:40:08] ottomata: I'll take care of getting that into my spark-2 patch and test
[20:40:12] tomorrow
[20:40:46] oh!
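The convert / makeNullable helpers being merged above live in SparkSQLHiveExtensions.scala and in joal's gist, neither of which is reproduced in this log. A rough sketch of the shape being discussed (recursively rewriting a Spark StructType so that every field, nested ones included, becomes nullable); the names follow the chat, but the bodies below are illustrative, not the refinery-source implementation.

```scala
import org.apache.spark.sql.types.{ArrayType, DataType, StructField, StructType}

// Illustrative only: the real code is in the jsonrefine patches merged above.
object SchemaSketch {

  // Apply fn to every field, recursing into nested structs and arrays of
  // structs ("adding depth", as joal put it).
  def convert(struct: StructType, fn: (StructField, DataType) => StructField): StructType =
    StructType(struct.fields.map { field =>
      val recursed = field.dataType match {
        case s: StructType =>
          field.copy(dataType = convert(s, fn))
        case ArrayType(s: StructType, containsNull) =>
          field.copy(dataType = ArrayType(convert(s, fn), containsNull))
        case _ => field
      }
      fn(recursed, recursed.dataType)
    })

  def makeNullable(field: StructField, nullable: Boolean = true): StructField =
    field.copy(nullable = nullable)

  // Roughly the call from the chat, struct.convert((field, _) => field.makeNullable(nullable)),
  // written here without the implicit-class sugar.
  def makeAllNullable(struct: StructType): StructType =
    convert(struct, (field, _) => makeNullable(field))
}
```

The "type mismatch" in the exchange above is consistent with the function argument not matching convert's expected (StructField, DataType) => StructField signature, which is what the "didn't change fn type in convert arg" fix refers to.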
[20:40:47] ok
[20:44:03] ottomata: still one patch to merge, right?
[20:44:17] yeah, conflicts...
[20:44:20] Mwarf
[20:44:30] Ok - gone for tonight - Will test that tomorrow
[20:45:14] byeeee
[20:48:38] laters!
[20:53:28] (PS5) Ottomata: Refactor JsonRefine to use DataFrame converter [analytics/refinery/source] (jsonrefine) - https://gerrit.wikimedia.org/r/410942
[20:54:16] (CR) Ottomata: [V: 2 C: 2] Refactor JsonRefine to use DataFrame converter [analytics/refinery/source] (jsonrefine) - https://gerrit.wikimedia.org/r/410942 (owner: Ottomata)
[20:59:27] Analytics, Discovery, Wikidata, Wikidata-Query-Service, Wikimedia-Stream: Increase kafka event retention to 14 or 21 days - https://phabricator.wikimedia.org/T187296#3987360 (Smalyshev) I need it only on jumbo I think, that's where I'll be connecting.
[22:34:30] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Move webrequest varnishkafka and consumers to Kafka jumbo cluster. - https://phabricator.wikimedia.org/T185136#3987592 (Ottomata) Yargh, @elukey. kafkatee. Gonna be weird on oxygen, since we can't make kafkatee consume from multiple kafka clus...
[23:02:07] Analytics, TCB-Team, Two-Column-Edit-Conflict-Merge, WMDE-Analytics-Engineering, and 5 others: How often are new editors involved in edit conflicts - https://phabricator.wikimedia.org/T182008#3987638 (GoranSMilovanovic) - @addshore @Lea_WMDE @Tobi_WMDE_SW This will take some time. - **Why this...
[23:18:26] ottomata[m]: I'm cleaning up my home dir on stat1005 and for some reason I don't have permission to do anything with /home/bearloga/anaconda3, would you be able to delete that for me? please and thank you!