[00:23:00] <groceryheist>	 I'm trying to use python multiprocessing in SWAP
[00:23:32] <groceryheist>	 but I get OSError: [Errno 30] Read-only file system
[02:27:51] <wikibugs>	 10Analytics, 10Analytics-Kanban: How to remove outdated and not used repo? - https://phabricator.wikimedia.org/T207204 (10Milimetric) p:05Triage>03Normal a:03Milimetric
[02:28:20] <wikibugs>	 10Analytics, 10Analytics-Kanban: How to remove outdated and not used repo? - https://phabricator.wikimedia.org/T207204 (10Milimetric) I've got permissions to do whatever we need with it.  I've archived it for now.  I can delete unless anyone else objects.
[02:30:36] <wikibugs>	 10Analytics, 10Analytics-Kanban: How to remove outdated and not used repo? - https://phabricator.wikimedia.org/T207204 (10Legoktm) If the repository is obsolete, it should go through the #cleanup process.
[03:30:52] <wikibugs>	 (03PS1) 10Milimetric: Clean up and order table chart properly [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/467868 (https://phabricator.wikimedia.org/T199693)
[05:49:46] <elukey>	 groceryheist: o/
[05:49:49] <elukey>	 on what notebook?
[05:50:05] <elukey>	 I suspect 1003 right? I think that the srv partition was full
[05:50:25] <elukey>	 for the moment can you use 1004?
[05:50:29] <elukey>	 we are working to clean i up
[05:50:31] <elukey>	 *it
[05:55:37] <groceryheist>	 elukey: I'm using 1004
[05:56:55] <elukey>	 groceryheist: so I am not seeing any issue atm on the host, what file leads to that read only error?
[05:58:56] <groceryheist>	 it happens when I try to create a process pool
[05:59:09] <groceryheist>	 import multiprocessing as mp
[05:59:18] <groceryheist>	 p = mp.Pool(2)
[05:59:29] <groceryheist>	 in python
[06:03:36] <elukey>	 groceryheist: strange, the error seems to be related to writing to a read-only file system, are you sure?
[06:03:44] <elukey>	 I just tried those and they don't lead to failures
[06:04:22] <groceryheist>	 huh let me try again
[06:04:35] <groceryheist>	 i haven't tried since I first reported
[06:04:59] <groceryheist>	 still happening
[06:06:10] <elukey>	 groceryheist: so I tried with - python3 -> import etc.. -> p = mp.Pool(2)
[06:06:16] <groceryheist>	 when I Googled the error before it seemed related to venv stuff
[06:06:18] <elukey>	 is it the same thing that you do?
[06:06:23] <groceryheist>	 did I mess up my venv
[06:06:26] <groceryheist>	 ?
[06:06:27] <elukey>	 ah there you go, you are using a venv
[06:06:55] <groceryheist>	 yeah that's how SWAP was setup
[06:14:22] <groceryheist>	 I don't know how to fix it
[06:16:49] <elukey>	 just to understand - this was working before, but now after some mess up it doesn't right?
[06:18:29] <groceryheist>	 nope
[06:18:36] <groceryheist>	 this was the first time I tried
[06:18:39] <elukey>	 ahh okok
[06:18:54] <groceryheist>	 I wonder if it has to do with how the venv was set up
[06:19:20] <elukey>	 I'd quickly create a venv and retry again
[06:19:27] <elukey>	 on notebook1004
[06:19:29] <elukey>	 just to compare
[06:33:07] <groceryheist>	 same
[06:33:18] <groceryheist>	 I just did python3 -m venv test
[06:33:23] <groceryheist>	 source test/bin/activate
[06:33:34] <groceryheist>	 and then tried creating the Pool
[06:41:57] <elukey>	 I'll do some tests later on, I am wondering if the Pool's creation need to access some part of the file system that in the venv is read only
[06:45:12] <groceryheist>	 yeah that makes sense
[06:56:01] <joal>	 elukey: morning! Quick note before I get back to kids - Looks like refine for eventlogging has either stopped or something yesterday at hour 21 CET --> oozie virtual_pageview job is stuck
[06:56:42] <joal>	 elukey: for me it can be either a refinement problem, or a raw-data-checking-and-flagging problem (camus checker)
[07:06:06] <joal>	 Ok elukey - I think I have a lead -> Data flows in kafka (https://grafana.wikimedia.org/dashboard/db/eventlogging-schema?orgId=1&from=now-7d&to=now&var-schema=VirtualPageView), but is missing in HDFS not only for VirtualPageview(hdfs dfs -ls /wmf/data/raw/eventlogging/eventlogging_NavigationTiming/hourly/2018/10)
[07:06:53] <joal>	 Looked at camus logs --> It is stuck since yesterday in 2018-10-16T21:05:03 WARNING Not submitting camus job "camus-eventlogging", it is currently running.
[07:07:55] <elukey>	 joal: morning! I am debugging my new camus eventlogging job now (is failing for json exceptions..) so I was about to check that later on, there might be a stuck camus-eventlogging job
[07:07:58] <joal>	 Yesterday at ~7pm CEST the cluster got deployed, including https://gerrit.wikimedia.org/r/c/analytics/refinery/+/465471
[07:08:51] <joal>	 elukey: https://yarn.wikimedia.org/proxy/application_1539594093291_5977/mapreduce/job/job_1539594093291_5977
[07:09:55] <elukey>	 yes I know, but it is not called "camus-eventlogging"
[07:10:08] <joal>	 elukey: However that job has been running for 1h, not half a day
[07:10:31] <elukey>	 yeah it is the first run that keeps failing, the json message decoder is grumpy
[07:10:38] <elukey>	 I was about to send a patch
[07:11:07] <elukey>	 but maybeee the is_yarn_blabla looks at the start of the string?
[07:11:08] <joal>	 elukey: see refinery/python/util.py:is_yarn_application_running --> it uses grep - So it would catch camus-eventlogging-client-side
[07:11:15] <elukey>	 there you go
[07:11:16] <elukey>	 :)
[07:11:20] <elukey>	 same though as I had
[07:11:21] <elukey>	 sigh
[07:11:40] <joal>	 elukey: has one of your job been running all night?
[07:12:19] <elukey>	 I don't think so, but it runs every hour and since it is the fist one it takes a lot, and then it fails.. so I guess that it is clobbering the eventlogging refinement
[07:12:28] <elukey>	 I am going to kill the job, disable camus for the moment via cron
[07:13:07] <joal>	 elukey: ok :)
[07:13:25] <joal>	 elukey: keep testing manually instead of having an automated cron maybe ;)
[07:14:17] <elukey>	 joal: well yesterday I left with a map reduce job running and I though it was fine, then I saw the failures and it didn't seem a big trouble since the name of the yarn apps were different
[07:14:25] <elukey>	 (since it was also late in the evening)
[07:14:40] <elukey>	 grepping blindly for a name is not the best thing in the world
[07:14:45] <elukey>	 that thing needs to be fixed
[07:16:32] <elukey>	 anyhow, will try to fix this
[07:17:54] <joal>	 elukey: I agree with the grapping not being best, I also think non-tested automated-jobs are not great :)
[07:19:24] <elukey>	 joal: sure but it may happen to think that something would work and it doesn't, I am going to fix it as said earlier
[07:19:58] <elukey>	 having a job that may fail (even automated) should not be a big deal, and should not influence others
[07:20:01] <elukey>	 that's my reasoning
[07:20:17] <elukey>	 but as lesson learned, I'll never do it again :)
[07:23:05] <elukey>	 I am running the cron job manually since the next run is a *:05, and this hour's one was skipped
[07:23:16] <elukey>	 so refinement should be unblocked soon
[08:46:34] <wikibugs>	 (03PS1) 10Elukey: Add the config option 'camus.message.json.setlenient' [analytics/camus] - 10https://gerrit.wikimedia.org/r/467909 (https://phabricator.wikimedia.org/T206542)
[08:46:53] <elukey>	 Luca that writes java --> brace yourself, winter is coming
[08:51:17] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats: Wikistats Bug: "Anonymous Editor" is a broken link - https://phabricator.wikimedia.org/T206968 (10Zoranzoki21)
[09:20:46] <wikibugs>	 (03PS1) 10Elukey: util.py: improve is_yarn_application_running pattern match [analytics/refinery] - 10https://gerrit.wikimedia.org/r/467926 (https://phabricator.wikimedia.org/T206542)
[09:22:17] <wikibugs>	 (03PS2) 10Elukey: util.py: improve is_yarn_application_running pattern match [analytics/refinery] - 10https://gerrit.wikimedia.org/r/467926 (https://phabricator.wikimedia.org/T206542)
[09:45:20] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: JVM pauses cause Yarn master to failover - https://phabricator.wikimedia.org/T206943 (10elukey) I was about to send another code change for the GC, but then I took a look again to the logs in the description and realized that I've missed a...
[10:22:19] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: JVM pauses cause Yarn master to failover - https://phabricator.wikimedia.org/T206943 (10elukey) As reference, this is what https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/JvmPauseMonitor.html does:  > Class which sets up a...
[10:26:32] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: JVM pauses cause Yarn master to failover - https://phabricator.wikimedia.org/T206943 (10elukey) It happened also on the 16th, but didn't lead to any failover:  ``` 2018-10-16 22:24:26,205 INFO org.apache.hadoop.yarn.server.resourcemanager....
[10:29:42] * elukey lunch!
[11:29:36] <elukey>	 so this is really interesting
[11:29:56] <elukey>	 it seems that there is something else that forces the Yarn RM jvm to pause on an-master1001
[11:29:59] <elukey>	 that it is not the GC
[11:30:45] <elukey>	 in https://www.slideshare.net/HadoopSummit/operating-and-supporting-apache-hbase-best-practices-and-improvements (slide 15) there is a good example
[11:31:36] <elukey>	 and in that case it was a linux driver for the disk controller
[11:32:33] <joal>	 elukey: interesting elukey!
[11:37:17] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: JVM pauses cause Yarn master to failover - https://phabricator.wikimedia.org/T206943 (10elukey) As reference, https://www.slideshare.net/HadoopSummit/operating-and-supporting-apache-hbase-best-practices-and-improvements (slide 15) shows a...
[11:40:26] <elukey>	 joal: and it would kinda makes sense since we have been seeing this issue only after swapping the masters
[11:40:44] <elukey>	 but there is no trace in the logs/graphs/etc.. about anything hanging
[11:40:51] <elukey>	 except that log for hadoop
[11:41:09] <elukey>	 (the thread is incredibly simple and effective)
[11:41:24] <elukey>	 the main issue for us is that it can trip zk timeouts
[11:41:40] <elukey>	 so a band aid for the moment could be to simply increase the timeout
[12:36:43] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Refactor analytics cronjobs to alarm on failure reliably - https://phabricator.wikimedia.org/T172532 (10elukey)
[13:15:42] <elukey>	 going out for a run since during these evenings it seems that it always rains (now that I've said it I'll come back super wet for sure)
[13:15:48] <elukey>	 will be back in ~1h
[13:15:50] * elukey afk!
[13:39:38] <addshore>	 hi aa team :D
[13:39:47] <addshore>	 I'm curious, you work on a kanban style board right?
[13:39:56] <addshore>	 How do you / do you track your velocity?
[14:25:40] <wikibugs>	 (03CR) 10Ottomata: [C: 031] "Huh, interesting!  This isn't the way I'd do it (I'd probably just make a StringMessageDecoder class), buuuut, I really don't mind here si" [analytics/camus] - 10https://gerrit.wikimedia.org/r/467909 (https://phabricator.wikimedia.org/T206542) (owner: 10Elukey)
[14:27:56] <wikibugs>	 (03CR) 10Ottomata: [C: 031] util.py: improve is_yarn_application_running pattern match [analytics/refinery] - 10https://gerrit.wikimedia.org/r/467926 (https://phabricator.wikimedia.org/T206542) (owner: 10Elukey)
[14:27:59] <wikibugs>	 (03CR) 10Elukey: "Yes a cleaner way could be a StringMessageDecoder class, but looking into the JsonStringMessageDecoder's comments I got the impression tha" [analytics/camus] - 10https://gerrit.wikimedia.org/r/467909 (https://phabricator.wikimedia.org/T206542) (owner: 10Elukey)
[14:28:48] <elukey>	 thanks for the reviews ottomata :)
[14:29:10] <wikibugs>	 (03CR) 10Ottomata: [C: 031] "Naw its fine proceed!" [analytics/camus] - 10https://gerrit.wikimedia.org/r/467909 (https://phabricator.wikimedia.org/T206542) (owner: 10Elukey)
[14:31:01] <ottomata>	 yw!
[14:31:03] <ottomata>	 your turn elukey you gotta review my stuff!
[14:31:12] <ottomata>	 OH
[14:31:18] <ottomata>	 some maybe you weren't added as reviewr on...
[14:31:55] <ottomata>	 ok just one
[14:32:31] <ottomata>	 i got 4 reviews there for ya, all hadoop puppet stuff
[15:01:52] <ottomata>	 luca yeah, sucks about the proemthues label
[15:01:52] <ottomata>	 i think we might have to just accept a break in the graphs
[15:01:52] <ottomata>	 not really that big of a deal i think, time marches on
[15:01:52] <ottomata>	 graphs will fill back up after long enough :)
[16:56:02] <joal>	 future has just leapt closer .... https://twitter.com/gavinsblog/status/1052140235033862144
[16:56:02] <wikibugs>	 (03PS1) 10Elukey: Add StringMessageDecoder to the list of kafka coders [analytics/camus] (wmf) - 10https://gerrit.wikimedia.org/r/468016 (https://phabricator.wikimedia.org/T206542)
[16:56:05] <wikibugs>	 (03CR) 10Ottomata: Add StringMessageDecoder to the list of kafka coders (031 comment) [analytics/camus] (wmf) - 10https://gerrit.wikimedia.org/r/468016 (https://phabricator.wikimedia.org/T206542) (owner: 10Elukey)
[16:56:06] <wikibugs>	 (03CR) 10Elukey: Add StringMessageDecoder to the list of kafka coders (031 comment) [analytics/camus] (wmf) - 10https://gerrit.wikimedia.org/r/468016 (https://phabricator.wikimedia.org/T206542) (owner: 10Elukey)
[16:56:09] <elukey>	 ottomata: in theory we could simply transform the byte array to string, and return it
[16:56:09] <elukey>	 decoding it with UTF-8 seemed sane in general
[16:56:09] <elukey>	 (or applying some sort of encoding)
[16:56:10] <wikibugs>	 (03CR) 10Milimetric: [C: 032] Fix important vulnerabilities on Wikistats 2 [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/467730 (https://phabricator.wikimedia.org/T206474) (owner: 10Fdans)
[16:56:10] <wikibugs>	 (03PS1) 10Fdans: Sets the active filter correctly on breakdowns mount [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468027 (https://phabricator.wikimedia.org/T206822)
[16:58:47] <ottomata>	 elukey:  i jut wonder if we need to decode at all
[16:58:53] <ottomata>	 I think message.payload is just a byte array
[16:59:08] <ottomata>	 message.GetPayload
[16:59:18] <ottomata>	 so
[16:59:45] <ottomata>	 return new CamusWrapper<byte[]>(message.getPayload(), System.currentTimeMillis());
[16:59:46] <ottomata>	 ?
[16:59:48] <ottomata>	 mabye?
[17:00:21] <elukey>	 I would still think that a regular String, properly encoded, is way cleaner 
[17:00:58] <elukey>	 (and we also do it elsewhere and it works :P)
[17:01:25] <ottomata>	 right, but ultimately when it is written back by camus it is converted back to bytes ?  
[17:01:51] <ottomata>	 just seems like if we never use the string, why decode it into one?
[17:01:57] <ottomata>	 again elukey i am not picky here
[17:02:02] <ottomata>	 whatever works is fine with me
[17:03:35] <elukey>	 ok I see, the only thing that I may wonder now is if we want to know if a string can be decoded into utf-8 (or just as string) while importing data
[17:03:55] <elukey>	 like if there is a weird use case that we would like to know
[17:03:57] <elukey>	 but probably not
[17:05:56] <elukey>	 ottomata: where is the point in which it is converted back to bytes? (Just to understand the code)
[17:08:14] <ottomata>	 AHH elukey!  
[17:08:15] <ottomata>	 ok
[17:09:06] <ottomata>	 StringRecordWriterProver
[17:09:13] <ottomata>	 which does indeed need a CamusWrapper<String>
[17:09:41] <ottomata>	 so, we could make a new RecordWriter, but mehhhhhh
[17:09:41] <ottomata>	 let's just go with string
[17:09:41] <ottomata>	 this is too much
[17:09:41] <ottomata>	 String is fine
[17:19:35] <elukey>	 all righhtttt
[17:19:56] <wikibugs>	 (03PS2) 10Elukey: Add StringMessageDecoder to the list of kafka coders [analytics/camus] (wmf) - 10https://gerrit.wikimedia.org/r/468016 (https://phabricator.wikimedia.org/T206542)
[17:20:02] <elukey>	 mind to check the last code change? --^
[17:27:07] <wikibugs>	 (03CR) 10Ottomata: [C: 031] Add StringMessageDecoder to the list of kafka coders [analytics/camus] (wmf) - 10https://gerrit.wikimedia.org/r/468016 (https://phabricator.wikimedia.org/T206542) (owner: 10Elukey)
[17:39:41] <elukey>	 The hdfs balancer is now a timer!
[17:40:26] * elukey off!
[17:42:51] <wikibugs>	 (03CR) 10Milimetric: Sets the active filter correctly on breakdowns mount (034 comments) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468027 (https://phabricator.wikimedia.org/T206822) (owner: 10Fdans)
[17:44:23] <ottomata>	 nice
[17:45:55] <wikibugs>	 (03PS2) 10Fdans: Sets the active filter correctly on breakdowns mount [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468027 (https://phabricator.wikimedia.org/T206822)
[17:46:01] <wikibugs>	 (03CR) 10Fdans: Sets the active filter correctly on breakdowns mount (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468027 (https://phabricator.wikimedia.org/T206822) (owner: 10Fdans)
[17:46:33] <wikibugs>	 (03PS3) 10Fdans: Set the active filter correctly on breakdowns mount [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/468027 (https://phabricator.wikimedia.org/T206822)
[17:47:14] <wikibugs>	 10Analytics: Serve legacy code only to legacy browsers - https://phabricator.wikimedia.org/T207311 (10Milimetric)
[17:49:57] <wikibugs>	 10Analytics: Serve legacy code only to legacy browsers - https://phabricator.wikimedia.org/T207311 (10Milimetric) Hm, the only problem is detecting the UA before fetching the correct bundle.  We could do that client side but I bet it'll FOUC all nasty.  Maybe we could do it in Apache?!!
[17:50:34] <milimetric>	 biking home, be back in a bit
[17:59:26] <wikibugs>	 10Analytics, 10Analytics-Data-Quality, 10Contributors-Analysis, 10Product-Analytics, 10Growth-Team (Current Sprint): Resume refinement of edit events in Data Lake - https://phabricator.wikimedia.org/T202348 (10JTannerWMF)
[18:58:03] <wikibugs>	 (03PS5) 10Joal: Update DataFrameToHive for dynamic partitions [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/465202 (https://phabricator.wikimedia.org/T164020)
[18:58:05] <wikibugs>	 (03PS3) 10Joal: Add webrequest_subset_tags transform function [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/465206 (https://phabricator.wikimedia.org/T164020)
[18:58:20] <joal>	 Hey ottomata - I added doc to the transform-function --^
[18:58:31] <joal>	 For when ou have time and will to read :) Thanks
[19:02:57] <ottomata>	 NICE joal
[19:02:58] <ottomata>	 much clearer
[19:03:08] <wikibugs>	 (03CR) 10Ottomata: [C: 031] "Let's do this thang" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/465206 (https://phabricator.wikimedia.org/T164020) (owner: 10Joal)
[19:09:05] <wikibugs>	 (03CR) 10Joal: Add webrequest_subset_tags transform function (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/465206 (https://phabricator.wikimedia.org/T164020) (owner: 10Joal)
[19:09:16] <wikibugs>	 (03PS4) 10Joal: Add webrequest_subset_tags transform function [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/465206 (https://phabricator.wikimedia.org/T164020)
[19:28:49] <wikibugs>	 (03CR) 10Nuria: "I am not clear how would this be used in practice (seems that it could only be used through spark not hive) can we go over this code on gr" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/465206 (https://phabricator.wikimedia.org/T164020) (owner: 10Joal)
[19:28:56] <nuria>	 joal: yt?
[19:29:00] <joal>	 yes
[19:29:22] <ottomata>	 nuria:  that is a Refine transform function
[19:29:34] <ottomata>	 it is applied on a spark dataframe
[19:30:04] <ottomata>	 so joal can point a Refine job at webrequest, configure it to use this funciton, and he'll get the tagged webrequest_subset table
[19:30:30] <nuria>	 ottomata: right, but wasn't the purpose of the  tags idea to split webrequest to reduce the datasets for pointed queries (say look only at  the wdqs records) ? 
[19:30:33] <nuria>	 cc joal 
[19:30:58] <joal>	 nuria: batcave, for conciseness?
[19:31:02] <nuria>	 sure
[19:31:08] <joal>	 ottomata: --^?
[19:31:25] <ottomata>	 i'll skip you can explain!  
[19:31:37] <ottomata>	 unless you need me!
[19:31:46] <ottomata>	 nuria:  that is what that transform does
[19:31:58] <ottomata>	 it gives you a new dataframe with a new partition, the partition being the webrequest_subset tags
[19:32:02] <ottomata>	 like wdqs
[19:32:06] <ottomata>	 so you can do
[19:32:28] <ottomata>	 select from webrequset_subset where subset='wdqs'
[19:32:28] <ottomata>	 (right joal?)
[19:38:17] <wikibugs>	 10Analytics, 10Operations, 10netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (10Ottomata) p:05Triage>03High
[19:38:28] <wikibugs>	 10Analytics, 10Operations, 10netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (10Ottomata)
[19:49:14] <nuria>	 ottomata: ok, talked to joal
[19:49:39] <joal>	 ottomata: sorry couldn't respond, I was talking :)
[19:49:40] <ottomata>	 :)
[19:49:56] <joal>	 ottomata: I think we're settled, with some more doc to come :)
[19:50:05] <ottomata>	 coo
[19:50:16] <ottomata>	 joal:  hm, shoudl you sort the tags you get for each subset?
[19:50:33] <joal>	 ottomata: for the subset function?
[19:50:52] <joal>	 or to faciliate usage?
[19:50:54] <nuria>	 ottomata: i think  we just need to make docs a bit clearer about the use case this solve, the:  select from webrequset_subset where subset='wdqs' is solved by a  function that  just supports simple "and"  on the set that tags are,  i take we can also support "or" for free  but that is distracting as our uses cases do not really have a need  for that 
[19:51:12] <ottomata>	 OH, right i gues sit doesn't matter sorry
[19:51:18] <ottomata>	 its the 'name' that is in the partiton value
[19:51:19] <ottomata>	 not the tags
[19:51:19] <ottomata>	 nm
[19:51:45] <ottomata>	 the name is unique, i was worried about something like contentview,pageview != pageview,contentview
[19:51:50] <joal>	 correct ottomata - And the isSubset function uses Set, no ordering isswue)
[19:51:54] <ottomata>	 k righto
[19:52:45] <ottomata>	 hoping that wmf.werequest_subset_tags will be an uncompressed hive text table?
[19:52:52] <ottomata>	 csv or whatever?
[19:53:15] <wikibugs>	 10Quarry, 10Documentation: Example queries for Quarry - https://phabricator.wikimedia.org/T207098 (10Aklapper)
[19:53:43] <joal>	 ottomata: That's the plan as of now
[19:53:47] <wikibugs>	 10Analytics: API endpoint for mediacounts - https://phabricator.wikimedia.org/T207208 (10Milimetric) p:05Triage>03Normal
[19:53:58] <joal>	 ottomata: refinery/static_data
[19:53:58] <ottomata>	 r8 :)
[19:53:59] <ottomata>	 gr8
[19:55:48] <joal>	 Ok I'll tr to have that finalized tomorrow (doc, job and oozie :)
[20:33:19] <wikibugs>	 10Analytics, 10Cloud-Services, 10Pageviews-API, 10wikitech.wikimedia.org: wikitech.wikimedia.org missing from pageviews API - https://phabricator.wikimedia.org/T153821 (10Milimetric) the two steps needed to add this to the pageview definition are:  1. What nuria mentioned, adding it to the whitelist (https...
[20:58:05] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Cleanup: Delete wikimedia/kraken repository from github - https://phabricator.wikimedia.org/T207204 (10hashar) Archiving a git repository turns out to require several step.  For MediaWiki skins / extensions there is a long checklist on https://phabricator.wikimedia.org/proj...
[21:24:15] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Cleanup: Delete wikimedia/kraken repository from github - https://phabricator.wikimedia.org/T207204 (10Nuria) If I understood well this is now on your capable hands, right @hashar ?
[22:02:41] <Krinkle>	 I'm having trouble with a Hive query. Not quite sure how to proceed, as I've run similar queries in the past that do seem to work: https://phabricator.wikimedia.org/P7688
[22:10:00] <wikibugs>	 10Analytics, 10EventBus, 10Growth-Team, 10MediaWiki-Watchlist, and 3 others: Clear watchlist on enwiki only removes 50 items at a time - https://phabricator.wikimedia.org/T207329 (10Pchelolo) I know what's happening. If there's more items in the list then the bach size, the code re-enqueues exactly the sam...
[22:21:12] <nuria>	 Krinkle: hola!
[22:27:00] <Krinkle>	 I used Turnilo instead, and got what I needed. Still curious why Hive didn't work for me though.
[22:27:08] <Krinkle>	 Do you see somethign wrong in the query?
[22:27:46] <Krinkle>	 nuria: I need to use source, it is significant for the problem I am investigating.
[22:27:54] <Krinkle>	 (saw your comment, thanks!)
[22:29:31] <nuria>	 Krinkle: ah, if you think you need source it should be "webrequest_source=text"
[22:29:39] <nuria>	 Krinkle: let me verify it runs
[22:29:50] <Krinkle>	 thx
[22:30:23] <Krinkle>	 "webrequest_source	string	Source Varnish cluster (partition field), values are the part after 'cache_' in the Varnish cluster names."
[22:30:29] <Krinkle>	 I misread this description. 
[22:30:29] <Krinkle>	 :)
[22:32:15] <nuria>	 Krinkle: other way to tell is executing 
[22:32:18] <nuria>	 hdfs dfs -ls /wmf/data//wmf/webrequest/
[22:32:31] <nuria>	 on command line, not on hive, and you will see how things are set up
[22:32:37] <nuria>	 that acesses hdfs directly
[22:33:20] <Krinkle>	 Interesting. 
[22:35:19] <Krinkle>	 I get a similar error when I use source="text"
[22:36:40] <Krinkle>	 Although it did get further.
[22:37:43] <nuria>	 Krinkle: are you running beeline
[22:37:46] <nuria>	 ?
[22:37:52] <Krinkle>	 re-pasted at https://phabricator.wikimedia.org/P7688
[22:38:07] <Krinkle>	 I switched back to hive for better error reporting and overall familiarity.
[22:38:34] <Krinkle>	 should I try beeline?
[22:39:04] <nuria>	 Krinkle: no, hive worked w/o issues for me, just pasted results of your query
[22:39:04] <Krinkle>	 Interesting, works for you.
[22:39:07] <Krinkle>	 I'm using wmf_raw
[22:39:12] <nuria>	 Krinkle: ah
[22:39:18] <nuria>	 Krinkle: see  https://phabricator.wikimedia.org/P7688
[22:39:21] <Krinkle>	 I see yours is (wmf)
[22:39:29] <Krinkle>	 Yeah
[22:39:34] <nuria>	 Krinkle:  right , which is processed data
[22:39:47] <nuria>	 Krinkle: wmf_raw requires a serde (a decoder)
[22:40:01] <Krinkle>	 Yeah, I added the decoder, because initially it failed with SerDe
[22:40:36] <nuria>	 Krinkle: also wmf has other processed staff you  might find of use like geolocation and such, just wmf data is 2 hours behind from wmf_raw (more or less)
[22:41:03] <Krinkle>	 Yeah, I assumed for things that don't need processed, it would be good enough to use wmf_raw.
[22:41:08] <Krinkle>	 It is supposed to work right?
[22:41:18] <Krinkle>	 but I'll use the processed data for now
[23:03:43] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Cleanup, 10GitHub-Mirrors: Delete wikimedia/kraken repository from github - https://phabricator.wikimedia.org/T207204 (10greg) (It should still be in this project)
[23:11:15] <nuria>	 Krinkle: it is supposed to work yes but it will be real slow as its data is not compressed as efficiently as wmf which is mean mostly for uqerying
[23:11:35] <nuria>	 *querying
[23:22:23] <Krinkle>	 nuria: okay, I wasn't sure how it worked internally, but that makes sense. I'll avoid the raw one from now on. Thanks!
[23:51:06] <wikibugs>	 10Analytics, 10Operations, 10netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (10bd808) Most of the cloud infrastructure hosts either are in the public vlan or are moving there as we update and replace hardware. The labsdb10...