[00:21:48] <wikibugs_>	 06Analytics-Kanban, 06Performance-Team, 13Patch-For-Review: Update webperf EventLogging consumers for userAgent schema change - https://phabricator.wikimedia.org/T156760#3082525 (10Krinkle) a:05Krinkle>03Nuria
[00:32:34] <Krinkle>	 ottomata: nuria: I'm trying to understand the current eventlogging consumer in context of https://phabricator.wikimedia.org/T131977
[00:32:50] <Krinkle>	 it seems both navtiming and ve use the same --endpoint, but one zmq and one eventlogging.connect(). 
[00:33:16] <Krinkle>	 I assumed the latter would be kafka since that is recommended, but I can't quite tell from looking through https://github.com/wikimedia/eventlogging what connect() actually does.
[00:33:37] <Krinkle>	 EventConsumer doesn't seem to actually connect to anything directly. Not sure..
[00:36:16] <wikibugs_>	 10Analytics, 10EventBus, 10Reading-Web-Trending-Service, 13Patch-For-Review, and 4 others: Compute the trending articles over a period of 24h rather than 1h - https://phabricator.wikimedia.org/T156411#3082571 (10mobrovac) 05Open>03Resolved Deployed, resolving
[00:43:04] * Krinkle replies on ticket :)
[00:43:52] <wikibugs_>	 10Analytics, 10EventBus, 10Reading-Web-Trending-Service, 13Patch-For-Review, and 4 others: Compute the trending articles over a period of 24h rather than 1h - https://phabricator.wikimedia.org/T156411#3082619 (10Jdlrobson) Note, until the period parameter is available this is likely to render only things e...
[00:44:31] <wikibugs_>	 10Analytics, 10EventBus, 10Reading-Web-Trending-Service, 13Patch-For-Review, and 4 others: Compute the trending articles over a period of 24h rather than 1h - https://phabricator.wikimedia.org/T156411#3082622 (10Pchelolo) Yup, making a PR as we speak.
[00:49:26] <wikibugs_>	 10Analytics, 10EventBus, 10Reading-Web-Trending-Service, 13Patch-For-Review, and 4 others: Compute the trending articles over a period of 24h rather than 1h - https://phabricator.wikimedia.org/T156411#3082652 (10Jdlrobson) If all works correctly I'd expect to see `2017 BNP Paribas Open – Men's Singles` in...
[00:52:29] <wikibugs_>	 10Analytics, 10Analytics-EventLogging, 06Performance-Team: Stop using global eventlogging install on hafnium (and any other eventlogging lib user) - https://phabricator.wikimedia.org/T131977#3082671 (10Krinkle) >>! In T131977, @Ottomata wrote: > Currently hafnium runs /srv/webperf/ve.py which includes eventl...
[00:58:31] <wikibugs_>	 10Analytics, 10Analytics-EventLogging, 06Performance-Team: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#3082684 (10Krinkle)
[00:59:30] <wikibugs_>	 10Analytics, 10Analytics-EventLogging, 06Performance-Team: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#1589603 (10Krinkle) a:05ori>03Krinkle
[01:45:43] <wikibugs_>	 06Analytics-Kanban, 10Android-app-Bugs, 06Wikipedia-Android-App-Backlog: Android development event logging broken - https://phabricator.wikimedia.org/T159845#3082890 (10Nuria)
[01:47:15] <wikibugs_>	 06Analytics-Kanban, 10Android-app-Bugs, 06Wikipedia-Android-App-Backlog: Android development event logging broken - https://phabricator.wikimedia.org/T159845#3080590 (10Nuria) 05Resolved>03Open
[04:38:12] <wikibugs_>	 10Analytics, 10Analytics-EventLogging, 06Performance-Team, 13Patch-For-Review: Stop using global eventlogging install on hafnium (and any other eventlogging lib user) - https://phabricator.wikimedia.org/T131977#2184882 (10Nuria) >The puppet config for ve.py (uses global eventlogging) and navtiming.py (uses...
[08:36:43] <joal>	 Hi a-team
[08:37:02] <joal>	 I realized I made a mistake while merging patches for aqs yesterday
[08:37:30] <joal>	 One patch got merged that was 1- not needed yet, 2- not ready yet
[08:38:11] <joal>	 As long as we don't restart cassandra oozie job, there'll be no impact, so I suggest not rollbacking, but I'd like to have team opinion
[08:39:56] <elukey>	 hello!
[08:40:07] <elukey>	 what happens if we need to restart the cassandra oozie job?
[08:40:32] <elukey>	 I can see myself in the morning checking the jobs, seeing an oozia alarm and hitting "Restart" without thinking about it :D
[08:56:22] <wikibugs_>	 (03PS10) 10Joal: [WIP] Port standard metrics to reconstructed history [analytics/refinery] - 10https://gerrit.wikimedia.org/r/322103 (owner: 10Milimetric)
[08:56:49] <joal>	 Hi elukey :)
[08:57:00] <joal>	 elukey: Re-running will not trigger any error
[08:58:23] <joal>	 elukey: code has not been deplpoyed (in refinery), and re-run (even if deployed) involves versionned hdfs refinery folder - So the only problem that could happen is: on a newly deployed refinery, kill cassandra bundle and restart it
[08:58:53] <joal>	 elukey: makes sense?
[09:00:39] <elukey>	 joal: ahhhh okok now I feel ok
[09:00:45] <elukey>	 +1 to keep it
[09:01:02] <elukey>	 (thanks for the explanation)
[09:03:37] <joal>	 np elukey :)
[09:15:25] <elukey>	 now I just realized that yesterday I forgot to check an important thing
[09:15:44] <elukey>	 namely if partman does not touch disks during partitioning that are not listed
[09:15:48] <elukey>	 namely using keep
[09:15:50] <elukey>	 I believe it does
[09:15:58] <elukey>	 so this would allow us to save data
[09:17:21] <elukey>	 now that I think about it, yesterday on disk had like ~2TB of disk used and I thought it was weird.. because I was following the install guide, namely re-creating partitions
[09:17:35] <elukey>	 but I am pretty sure those were ok
[09:17:36] <elukey>	 sigh
[09:17:49] <elukey>	 I am tempted to reimage another noe
[09:17:51] <elukey>	 *node
[09:20:35] <elukey>	 joal: anything against me re-imaging an1041/
[09:20:36] <elukey>	 ?
[09:22:10] <wikibugs_>	 10Analytics-Cluster, 06Analytics-Kanban, 06Operations, 15User-Elukey: Reimage a Trusty Hadoop worker to Debian jessie - https://phabricator.wikimedia.org/T159530#3083379 (10elukey)
[09:27:07] <wikibugs_>	 (03PS3) 10Joal: [WIP] Add oozie jobs for mw history denormalized [analytics/refinery] - 10https://gerrit.wikimedia.org/r/341030
[09:27:21] <joal>	 elukey: nothing against, please do :)
[09:27:37] <joal>	 elukey: I have not monitored an1040 - everything good I assume
[09:27:52] <elukey>	 joal: yep metrics are fine, and I haven't see issues in the logs for the moment
[09:29:56] <joal>	 elukey: just checked quickly, llok good :)
[09:30:25] <joal>	 elukey: just checked as well hadoop board - Have we done something to namenode yesterday?
[09:30:52] <joal>	 elukey: Ah - Actually, normal regular pattern, old GC
[09:33:59] <joal>	 elukey: When looking at 30 days in hadoop mtrics, the only one that really changes are nodemanager ones - Looks like bug is indeed corrected :D
[09:34:22] * joal claps for not having to do regular restarts except for java upgrade :)
[09:35:14] <joal>	 a-team - need to go for doctor appointment - Should be back beginning afternoon
[09:42:10] <elukey>	 o/
[09:58:50] <icinga-wm>	 PROBLEM - Hadoop NodeManager on analytics1041 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[10:01:00] <elukey>	 argh forgot to silence
[10:02:29] <erasmus>	 SO is this the closest channel to google analytics on freenode?
[10:07:08] <elukey>	 erasmus: Hi, what do you mean? 
[10:07:25] <erasmus>	 I mean there's no google analytics channel.
[10:07:43] <erasmus>	 this is all I came up with when searching with alis.
[10:08:43] <elukey>	 erasmus: This is the Wikimedia Analytics Team channel, so probably far from what you are looking for :)
[10:08:54] <erasmus>	 I know :)
[10:12:31] <wikibugs_>	 10Analytics-Cluster, 06Analytics-Kanban, 06Operations, 13Patch-For-Review, 15User-Elukey: Reimage a Trusty Hadoop worker to Debian jessie - https://phabricator.wikimedia.org/T159530#3083512 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['ana...
[10:15:30] <wikibugs_>	 10Analytics-Cluster, 06Analytics-Kanban, 15User-Elukey: Audit fstabs on Kafka and Hadoop nodes to use UUIDs instead of /dev paths - https://phabricator.wikimedia.org/T147879#3083519 (10elukey) Fixed some little mistakes and added documentation to https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/A...
[10:35:36] <wikibugs_>	 06Analytics-Kanban, 06Operations, 10netops, 15User-Elukey: Review ACLs for the Analytics VLAN - https://phabricator.wikimedia.org/T157435#3083678 (10elukey)
[10:38:18] <wikibugs_>	 10Analytics-Cluster, 06Analytics-Kanban, 06Operations, 13Patch-For-Review, 15User-Elukey: Reimage a Trusty Hadoop worker to Debian jessie - https://phabricator.wikimedia.org/T159530#3083687 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['analytics1041.eqiad.wmnet'] ```  and were **ALL** suc...
[11:01:26] <mforns>	 hi team :]
[11:03:18] <elukey>	 o/
[11:04:00] <mforns>	 o/
[11:06:21] <wikibugs_>	 (03PS3) 10Mforns: Add oozie workflow to load projectcounts to AQS [analytics/refinery] - 10https://gerrit.wikimedia.org/r/339421 (https://phabricator.wikimedia.org/T156388)
[11:25:10] <icinga-wm>	 RECOVERY - Hadoop NodeManager on analytics1041 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[11:46:47] <elukey>	 analytics1041 up and running with debian!
[11:46:54] <elukey>	 I think I have a procedure to preserve data
[11:46:59] <elukey>	 but it is a bit hacky
[11:47:09] <elukey>	 I just started the datanode on 1041
[11:47:25] <elukey>	 and it is working fine, atm deleting a lot of files (probably old/stale entries)
[11:47:28] <elukey>	 let's see how it goes
[11:49:03] <elukey>	 now I have entries like 2017-03-08 11:48:27,298 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1552854784-10.64.21.110-1405114489661:blk_1193849903_120181085 src: /10.64.53.25:49103 dest: /10.64.53.20:50010
[11:49:10] <elukey>	 :)
[12:03:46] * elukey lunch!
[13:11:29] <joal>	 finally back, with again electrical issue at home
[13:11:47] <joal>	 mforns: I have seen you have tried to rerun mediacount job
[13:11:53] <fdans>	 joal o/
[13:11:58] <joal>	 mforns: any idea what went wrong
[13:12:00] <joal>	 ?
[13:12:03] <joal>	 Hi joal
[13:12:05] <joal>	 hi fdans 
[13:12:42] <fdans>	 joal I'm running one last test query, are you ok with launching the beginning of time - march 1st job if that one goes well?
[13:13:28] <elukey>	 joal: I haven't investigated yet but it could have been me reimaging an1041
[13:14:14] <joal>	 elukey: I think mforns tried to rerun it, and it failed again (not cool)
[13:14:31] <mforns>	 joal, elukey, yes
[13:14:43] <joal>	 fdans: If you're confident with your query and data, please go :)
[13:14:53] <joal>	 fdans: have you tested loading some portins?
[13:15:02] <joal>	 hmmm
[13:16:40] <fdans>	 do you have a couple mins to go about doing that with an oozie job, when the query finishes?
[13:16:44] <fdans>	 joal ^
[13:17:02] <joal>	 fdans: have looked at existing loading jobs?
[13:17:09] <joal>	 have YOU looked sorry ?
[13:17:23] <joal>	 elukey, mforns: Tried to rerun it once
[13:17:51] <fdans>	 yes, I was with that yesterday night
[13:18:20] <fdans>	 the wikitech doc about oozie jobs, I don't know who wrote it but it's FANTASTIC
[13:18:48] <joal>	 fdans: Have you look at oozie loading jobs?
[13:19:21] <fdans>	 joal: you mean like this https://github.com/wikimedia/analytics-refinery/blob/master/oozie/cassandra/monthly/workflow.xml
[13:24:54] <mforns>	 joal, did you rerun it from hue button?
[13:26:25] <mforns>	 looks like it's fixed
[13:26:50] <joal>	 mforns: yes, that's what I did
[13:26:54] <joal>	 bizarre
[13:27:12] <joal>	 fdans: I meqnt that indeed
[13:27:33] <mforns>	 ok, I wasn't sure we could use that button, so I run it with the usual oozie -run command, and it failed... dunno
[13:28:27] <joal>	 mforns: hmm - do you have an history of that command you used?
[13:30:13] <mforns>	 joal, http://pastebin.com/ym0eWFEk
[13:30:44] <joal>	 multiple things mforns
[13:31:00] <mforns>	 yes
[13:31:04] <fdans>	 oh this is clearing some things for me _watching_
[13:31:14] <joal>	 mforns: you should change the 2016 part of the subcommand extracting refinery âth
[13:31:32] <joal>	 We now deploy in 2017 folders :)
[13:31:39] <mforns>	 oh yes, I knew that, but forgot
[13:32:10] <joal>	 mforns: Second and more importantly, the command you launched runs a new coordinator, it doesn't RErun the existing failing job
[13:32:32] <mforns>	 of course
[13:33:22] <joal>	 rerun button in hue does that, and in CLI, oozie job -rerun JOBID
[13:33:31] <joal>	 mforns: --^
[13:33:38] <mforns>	 sorry joal, internet hiccup
[13:33:48] <joal>	 np mforns, was saying:
[13:33:50] <joal>	 rerun button in hue does that, and in CLI, oozie job -rerun JOBID
[13:33:53] <mforns>	 last I read was " We now deploy in 2017 folders :)"
[13:34:06] <mforns>	 aha!
[13:34:11] <joal>	 ah, so:  mforns: Second and more importantly, the command you launched runs a new coordinator, it doesn't RErun the existing failing job
[13:34:23] <joal>	 then the line pasted first ;°
[13:34:34] <mforns>	 yes, I expected it run a new temporaruy coordinator
[13:34:48] <joal>	 fdans: so, having found existing loading code, what are your plans?
[13:35:20] <joal>	 mforns: For failures like that we prefer to rerun them - easier to keep track of what has been succesfull
[13:35:30] <mforns>	 ok, makes sense joal thanks :]
[13:35:37] <joal>	 np mforns :)
[13:36:36] <mforns>	 btw joal, if you have 5 mins now or later, can you have a look at the msck repair table error with me, see if it rings a bell, if not I'll continue looking into it
[13:37:56] <fdans>	 joal my take would be to copy the monthly job to a new directory and alter the properties so that it loads data from my location into the new keyspace
[13:38:54] <joal>	 fdans: not bad :) There is a bit more to modify than just output keyspace (I suggest you review fields for instance), but this is the overall idea :)
[13:39:01] <joal>	 mforns: sure, let's take a minute
[13:39:31] <mforns>	 ok
[13:39:35] <mforns>	 batcave?
[13:40:52] <joal>	 sure mforns 
[13:41:22] <fdans>	 (anyone has any idea of why my ssh session freezes every time a hive job finishes?)
[13:42:06] <elukey>	 fdans: are you using tmux/screen ?
[13:42:15] <fdans>	 nope
[13:42:37] <fdans>	 although this time it seems a connection hiccup, so never mind :)
[13:42:57] <elukey>	 yeah this is why I was suggesting tmux, it is great for all sort of ssh problems
[13:54:06] <mforns>	 elukey, heyyyy, can you help me with the metastore problem?
[13:54:35] <elukey>	 sure!
[13:55:58] <mforns>	 joal, one more thing, I saw there's this "Use v2 table in Cassandra, switch to padded day timestamp" change in refinery
[13:55:59] <mforns>	 should that be deployed this week?
[13:55:59] <mforns>	 if so, I can do it today
[13:55:59] <mforns>	 elukey, I can explain a bit:
[13:57:35] <wikibugs_>	 10Analytics-Cluster, 06Analytics-Kanban, 06Operations, 13Patch-For-Review, 15User-Elukey: Reimage a Trusty Hadoop worker to Debian jessie - https://phabricator.wikimedia.org/T159530#3084091 (10elukey) Summary of today. I reimaged analytics1041 with the analytics-flex.cfg partman recipe, that does not men...
[13:57:39] <mforns>	 we have this projectcounts-raw data in hdfs, that is divided in yearly folders, each folder is named like "year=2007", "year=2008", etc.
[13:57:39] <mforns>	 each folder has just 1 file (tsv, gzipped)
[13:57:39] <mforns>	 plus the _SUCCESS file
[13:57:39] <mforns>	 this structure is the same that hive uses for partitioned tables
[13:57:40] <mforns>	 now
[13:57:41] <mforns>	 I create an external table on top of that data and seems to work
[13:59:39] <mforns>	 but then I need to tell the metastore that the data is organized in yearly partitions, so I execute:
[13:59:39] <mforns>	 msck repair table projectcounts_raw
[13:59:39] <mforns>	 which is supposed to update the metastore recognizing the yearly partitions and enabling queries to the table
[13:59:39] <mforns>	 but it throws the following error:
[13:59:40] <mforns>	 FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
[13:59:41] <mforns>	 the thing is...
[13:59:41] <mforns>	 I executed the same exact commands before my short vacation, and they worked
[13:59:44] <mforns>	 it looks like a metastore problem
[14:00:07] <joal>	 mforns: deploy not needed, actually it's a mistake from me to have merged that patch that early
[14:00:18] <mforns>	 ok ok np
[14:00:41] <elukey>	 mforns: do you have the whole error message somewhere?
[14:00:56] <mforns>	 elukey, that's the whole error message :]
[14:00:59] <elukey>	 sigh
[14:01:35] <mforns>	 but maybe there's more logs I can look at, I looked for them but didn't find anything
[14:02:02] <mforns>	 maybe in metastore machine? analytics1003?
[14:04:15] <elukey>	 mforns: when was the last time that you executed it?
[14:06:01] <mforns>	 elukey, a couple minutes ago
[14:06:01] <mforns>	 like 15 minutes ago
[14:06:01] <mforns>	 I can execute it now
[14:08:18] <elukey>	 can't find anything relevant in the logs :/
[14:08:52] <mforns>	 sorry elukey lost last 2 minutes (switched to other internet)
[14:09:12] <elukey>	 didn't write anything
[14:09:16] <mforns>	 k
[14:09:53] <elukey>	 mforns: is it possible to re-create the table?
[14:10:06] <mforns>	 elukey, sure
[14:10:10] <mforns>	 will do now
[14:10:13] <elukey>	 super
[14:11:17] <mforns>	 elukey, dropping...
[14:11:23] <mforns>	 done
[14:11:46] <mforns>	 elukey, creating...
[14:11:52] <mforns>	 done
[14:12:24] <mforns>	 elukey, now trying to msck repair...
[14:12:34] <mforns>	 done, with same error
[14:14:44] <joal>	 fdans: are you doing ok?
[14:17:29] <fdans>	 sorry joal I continued working on EL while the query was running, so haven't done anything yet
[14:17:38] <joal>	 fdans: no prob :)
[14:18:15] <joal>	 fdans: While I'm pushing you to read the code base and try by yourself, I'm also trying to keep an eye on you in case ;)
[14:18:42] <fdans>	 ha, I appreciate that joal 
[14:19:24] <joal>	 fdans: If you feel the balance is not good in either way, please let me know :)
[14:21:51] <elukey>	 mforns: still nothing found..
[14:22:10] <mforns>	 elukey: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_mc_upgrade_to_cdh52_using_parcels.html#concept_r5x_wfx_jq__section_bk3_1d2_lq
[14:22:29] <mforns>	 section Upgrade the hive metastore database
[14:22:49] <elukey>	 it doesn't load
[14:22:57] <elukey>	 now it is
[14:23:12] <elukey>	 but we have 5.10 now
[14:23:22] <mforns>	 ?
[14:23:52] <mforns>	 elukey, didn't we upgrade to 5.5 last week?
[14:24:36] <elukey>	 nope 5.10
[14:24:44] <elukey>	 and from https://etherpad.wikimedia.org/p/analytics-cdh5.10 Hive haven't changed
[14:24:48] <mforns>	 oh, but it says: CDH 5.3 to 5.4 or higher
[14:24:55] <elukey>	 *hasn't changed
[14:25:09] <mforns>	 I see
[14:25:47] <elukey>	 plus I think that the guide is for cloudera enterprise
[14:25:53] <mforns>	 oh..
[14:25:53] <elukey>	 where they offer all the magic buttons
[14:25:57] <mforns>	 yes
[14:26:00] <elukey>	 :D
[14:26:09] <mforns>	 o.o
[14:26:32] <elukey>	 really weird though
[14:26:38] <elukey>	 we shouldn't have changed anything
[14:27:19] <mforns>	 did you see anything in the logs?
[14:27:47] <elukey>	 nope
[14:27:48] <mforns>	 I think I don't have permits to read them
[14:27:51] <mforns>	 k
[14:28:30] <elukey>	 mforns: I think that you can see them
[14:28:41] <mforns>	 shoud I sudo -u?
[14:28:41] <elukey>	 the other flag has read perms
[14:28:44] <elukey>	 no no
[14:29:09] <elukey>	 try less /var/log/hive/hive-server2.log
[14:29:20] <elukey>	 and /var/log/hive/hive-metastore.log
[14:29:31] <elukey>	 mforns: can you try to run again the command?
[14:29:35] <elukey>	 I am live tailing the logs
[14:29:38] <mforns>	 elukey, yes I can read
[14:29:42] <mforns>	 sure, one sec
[14:29:53] <mforns>	 done
[14:31:56] <elukey>	 grrr no logs at all
[14:32:27] <mforns>	 yep
[14:32:29] <ottomata>	 goodmorninggg
[14:32:41] <mforns>	 hello!
[14:32:50] <mforns>	 maybe nothing to do with metastore!
[14:33:08] <elukey>	 I thought it was something related to the table
[14:33:21] <elukey>	 but I expected some logs :)
[14:33:41] <mforns>	 is there a place for hive query logs?
[14:34:00] <elukey>	 an1003 should be the place
[14:35:10] <elukey>	 mforns: can I run "msck repair table projectcounts_raw" ? 
[14:35:18] <elukey>	 (I guess I need to be hdfs right?)
[14:36:00] <mforns>	 elukey, yes I'm running it as hdfs
[14:36:15] <mforns>	 sure, go ahead
[14:36:18] <mforns>	 on wmf database
[14:39:23] <elukey>	 AH! Found moar logs!
[14:39:41] <mforns>	 o//////
[14:39:55] <elukey>	 2017-03-08 14:37:24,168 INFO  ql.Driver (Driver.java:execute(1600)) - Executing command(queryId=hive_20170308143737_f0aeb879-3248-45cc-bd3a-7c0d4e9924ec): msck repair table projectcounts_raw
[14:39:59] <elukey>	 2017-03-08 14:37:24,169 INFO  ql.Driver (Driver.java:launchTask(1974)) - Starting task [Stage-0:DDL] in serial mode
[14:40:02] <elukey>	 2017-03-08 14:37:24,181 WARN  exec.DDLTask (DDLTask.java:msck(1779)) - Failed to run metacheck:
[14:40:05] <elukey>	 org.apache.hadoop.hive.ql.parse.SemanticException: No partition predicate found for Alias "org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker@203f4429" Table "projectcounts_raw"
[14:40:14] <mforns>	 cool
[14:40:18] <elukey>	 mforns: I looked for the hive query id
[14:40:32] <elukey>	 in hive-server.log
[14:40:38] <mforns>	 ok
[14:40:50] <elukey>	 does the above error mean anything to you?
[14:41:30] <mforns>	 yes, sort of, but it's weird
[14:41:36] <elukey>	 ahahahah
[14:41:47] <mforns>	 because the table HAS partition predicate
[14:42:01] <mforns>	 you can check it by executing a show create table on it
[14:42:59] <mforns>	 but at least I have something to continue digging
[14:44:06] <ottomata>	 No partition predicate found for Alias
[14:44:08] <mforns>	 it worked!
[14:44:12] <ottomata>	 is it saying that it found a directory named alias?
[14:44:17] <ottomata>	 in the partition hierarchy path?
[14:44:22] <mforns>	 it was because hive.mapred.mode=strict
[14:44:54] <mforns>	 ah... sigh.. sorry for bothering you ops with sth so non-ops
[14:45:00] <mforns>	 but the log message helped a lot
[14:45:28] <elukey>	 mforns: glad that you are unblocked :)
[14:45:31] <elukey>	 ottomata: hiiiiiiiiiiiiiiiiiiiiii
[14:45:40] <mforns>	 :]
[14:45:43] <elukey>	 I found a super hacky way to keep the partions when reimaging a node
[14:46:29] <ottomata>	 oh ya?
[14:46:34] <ottomata>	 elukey:  i just thought of something too
[14:46:41] <ottomata>	 we don't want to delete existing journalnode partiotions ! :o
[14:46:53] <ottomata>	 i guess we could copy the content over to a datanode disk before we reinstall
[14:47:05] <ottomata>	 or, move the journalnodes to jessie boxes once they are up, before we reinstall the trusty ones
[14:48:07] <elukey>	 good point, I didn't think about it!
[14:48:18] <elukey>	 ottomata: or we could grab a LVM snapshot and restore?
[14:48:41] <elukey>	 but I like the move journalnodes part
[14:48:58] <elukey>	 we could move the journal nodes as last step
[14:49:03] <elukey>	 before the hdfs master
[14:49:12] <elukey>	 we move them to jessie hosts
[14:49:17] <elukey>	 ensure that nothing explodes
[14:49:21] <elukey>	 reimage the old ones
[14:49:56] <wikibugs_>	 10Analytics, 10Analytics-EventLogging, 06Performance-Team, 13Patch-For-Review: Stop using global eventlogging install on hafnium (and any other eventlogging lib user) - https://phabricator.wikimedia.org/T131977#3084283 (10Ottomata) We maintain a Kafka -> ZMQ endpoint just for these processes :)  If we can...
[14:50:16] <elukey>	 mforns: (whenever you have time) what is the use case for hive.mapred.mode=strict ?
[14:52:52] <mforns>	 elukey, strict mode doesn't let you do some things like cartesian joins or in this case... let the table access the respective partition folder in hdfs, second: http://stackoverflow.com/questions/39049620/no-partition-predicate-found-for-alias-even-when-the-partition-predicate-in-pres
[14:53:02] <wikibugs_>	 10Analytics, 10Analytics-EventLogging, 06Performance-Team, 13Patch-For-Review: Stop using global eventlogging install on hafnium (and any other eventlogging lib user) - https://phabricator.wikimedia.org/T131977#3084290 (10Ottomata) > Although PYTHONPATH must be /srv/deployment/eventlogging  T118772 should...
[14:53:13] <mforns>	 so I set it to unstrict
[14:53:20] <mforns>	 just for this command
[14:55:42] <ottomata>	 elukey:  yeah, +1
[14:55:50] <ottomata>	 elukey:  what's your hacky way to keep partitions?
[14:57:02] <fdans>	 joal: hmm, datasets.xml seems to expect a frequency, but my exported data has no such thing, it's just numbered
[14:57:27] <joal>	 fdans: You probably don't need dataset.xml in your use case
[14:57:56] <fdans>	 riiight
[14:58:26] <joal>	 fdans: you want to batcave for a minute?
[14:58:38] <fdans>	 that would be great joal 
[15:00:07] <elukey>	 ottomata: https://phabricator.wikimedia.org/T159530#3084091
[15:02:54] <wikibugs_>	 10Analytics, 10Analytics-EventLogging, 13Patch-For-Review, 10Scap (Scap3-Adoption-Phase1): Use scap3 to deploy eventlogging/eventlogging - https://phabricator.wikimedia.org/T118772#3084302 (10Ottomata)   First create an eventlogging/scap/webperf repository in gerrit, and fill it with scap config informatio...
[15:04:03] <ottomata>	 elukey:  that sounds annoying but sane
[15:04:04] <ottomata>	 we can do that
[15:04:05] <ottomata>	 :)
[15:37:14] <ottomata>	 joal:  hi	ops sync ( you can skip if you like)
[15:40:00] <joal>	 ottomata: joining !
[15:52:07] <mforns>	 fdans, if you are working on the new endpoint for legacy pageviews, can I pair? :]
[15:52:35] <fdans>	 mforns: of course, but I'm not currently with it
[15:53:30] <mforns>	 fdans, ok! let me know then when you're going to work on it
[15:55:56] <milimetric>	 a-team: exterminator's here, sorry, will miss standup on short notice
[15:56:19] <joal>	 np milimetric -- good luck with the exterminators !
[15:56:21] <mforns>	 ok milimetric, good luck
[15:56:48] <mforns>	 milimetric'll be backh
[15:56:56] <milimetric>	 my update: continuing to work on prototype, learning interesting things about Vue.js (check out their search function on their guide, it's awesome).  Also met with language team for a while this morning, they have tons of data they're collecting to inform their work and I'll be working with them to port it to reportupdater.
[15:57:09] <milimetric>	 lol
[15:57:40] <mforns>	 cool :]
[16:01:40] <nuria>	 a-team: standduppp
[16:02:01] <nuria>	 ottomata, milimetric , fdans: hola!
[16:03:22] <wikibugs_>	 06Analytics-Kanban, 10EventBus, 10Wikimedia-Stream: Create /schema/:schema endpoint in eventbus service to serve schemas by schema_uri - https://phabricator.wikimedia.org/T159179#3084477 (10Ottomata)
[16:24:07] <wikibugs_>	 10Analytics: Document and publicize AQS legacy pageviews - https://phabricator.wikimedia.org/T159959#3084517 (10mforns)
[16:27:02] <wikibugs_>	 06Analytics-Kanban, 10ChangeProp, 06Operations, 10Reading-Web-Trending-Service, 06Services (watching): Build and Install librdkafka 0.9.4 on SCB - https://phabricator.wikimedia.org/T159379#3084544 (10elukey) Yes I'd like to have the same version everywhere, we can coordinate with Traffic to roll out 0.9....
[16:31:07] <nuria>	 meeting? cc elukey fdans
[16:31:36] <mforns>	 having problems with hangouts..
[16:31:39] <nuria>	 Sorry, maybe i should not have included link,we are here: https://hangouts.google.com/hangouts/_/wikimedia.org/goals-checkup
[16:31:52] <nuria>	 mforns: batcave?
[17:10:49] <milimetric>	 nuria: I just got back from working with pest control guy
[17:10:57] <milimetric>	 apologies for missing the meetings
[17:11:06] <nuria>	 milimetric: np.
[17:11:07] <milimetric>	 it was last minute so I sent an IRC ping
[17:11:17] <nuria>	 we are here: https://hangouts.google.com/hangouts/_/wikimedia.org/goals-checkup when ever you can
[17:35:42] <milimetric>	 mforns: I'm going to SoS
[17:35:50] <milimetric>	 (just re-confirming)
[17:45:39] <wikibugs_>	 10Analytics: Unlock Spark with Oozie - https://phabricator.wikimedia.org/T159961#3084689 (10JAllemandou)
[17:47:26] <joal>	 nuria: do you have a minute?
[17:47:32] <nuria>	 joal: yesss
[17:47:34] <nuria>	  FINALLY
[17:47:36] <wikibugs_>	 10Analytics: Spike: Spark 2.x as cluster default (working with oozie) - https://phabricator.wikimedia.org/T159962#3084705 (10Ottomata)
[17:47:45] <wikibugs_>	 10Analytics: Spike: Spark 2.x as cluster default (working with oozie) - https://phabricator.wikimedia.org/T159962#3084705 (10Ottomata) Can we do it?  How hard is it?
[17:47:54] <nuria>	 joal:baticueva?
[17:48:02] <joal>	 nuria: sure ! La grotte !
[18:01:56] <nuria>	 ottomata: I ma on hangout!!!
[18:02:00] <mforns>	 milimetric, will you be able to attend SoS with all the exterminating? I can do it if necessary (sorry if repeated message)
[18:02:44] <milimetric>	 mforns: no prob, pest control is done for the day
[18:02:53] <mforns>	 k, thanks!
[18:03:13] <milimetric>	 np
[18:07:47] <elukey>	 poor bohrium, I might have found something useful
[18:08:02] <elukey>	 my current theory is that piwik is bombarded by health checks
[18:08:10] <elukey>	  /o\
[18:09:02] <joal>	 a-team, I'm leaving for tonight, see you all tomorrow !
[18:09:09] <mforns>	 bye joal !
[18:09:15] <milimetric>	 nite!
[18:09:38] <elukey>	 going afk too, tomorrow I'll tackle piwik! 
[18:09:39] <elukey>	 byeee
[18:09:40] <elukey>	 o/
[18:17:33] <mforns>	 me too! byeeeeee
[19:29:24] <wikibugs_>	 06Analytics-Kanban: Spike: Split unique devices data for Asiacell and non-Asiacell traffic in Iraq - https://phabricator.wikimedia.org/T158237#3085151 (10Nuria)
[19:32:08] <wikibugs_>	 (03PS1) 10Nuria: Parqued Code - Asiacell modfications on unqiue devices [analytics/refinery] - 10https://gerrit.wikimedia.org/r/341835 (https://phabricator.wikimedia.org/T158237)
[19:32:45] <wikibugs_>	 06Analytics-Kanban, 13Patch-For-Review: Spike: Split unique devices data for Asiacell and non-Asiacell traffic in Iraq - https://phabricator.wikimedia.org/T158237#3085167 (10Nuria) {F6341114}  Asiacell unqiue devices since 2017-01-07
[19:33:33] <wikibugs_>	 06Analytics-Kanban, 10Android-app-Bugs, 06Wikipedia-Android-App-Backlog: Android development event logging broken - https://phabricator.wikimedia.org/T159845#3085189 (10Nuria) 05Open>03Resolved
[19:58:15] <wikibugs_>	 10Analytics, 10Analytics-Cluster, 06Operations: Reinstall  Analytics Hadoop Cluster with Debian Jessie - https://phabricator.wikimedia.org/T157807#3085345 (10Nuria)
[19:58:17] <wikibugs_>	 10Analytics-Cluster, 06Analytics-Kanban, 06Operations, 13Patch-For-Review: Move cloudera packages to a separate archive section - https://phabricator.wikimedia.org/T155726#3085344 (10Nuria) 05Open>03Resolved
[19:58:55] <wikibugs_>	 06Analytics-Kanban: Clean up datasets.wikimedia.org - https://phabricator.wikimedia.org/T125854#3085357 (10Nuria) 05Open>03Resolved
[19:58:57] <wikibugs_>	 06Analytics-Kanban, 13Patch-For-Review: Move datasets.wikimedia.org to analytics.wikimedia.org/datasets - https://phabricator.wikimedia.org/T132594#3085358 (10Nuria)
[19:59:15] <wikibugs_>	 10Analytics-Cluster, 06Analytics-Kanban, 15User-Elukey: Audit fstabs on Kafka and Hadoop nodes to use UUIDs instead of /dev paths - https://phabricator.wikimedia.org/T147879#3085371 (10Nuria) 05Open>03Resolved
[19:59:28] <wikibugs_>	 06Analytics-Kanban, 13Patch-For-Review: Bump default oozie launcher memory usage - https://phabricator.wikimedia.org/T159324#3085372 (10Nuria) 05Open>03Resolved
[19:59:43] <wikibugs_>	 10Analytics-Dashiki, 06Analytics-Kanban, 13Patch-For-Review: Add extension and category  (ala Eventlogging) for DashikiConfigs - https://phabricator.wikimedia.org/T125403#3085373 (10Nuria) 05Open>03Resolved
[20:00:22] <wikibugs_>	 06Analytics-Kanban, 13Patch-For-Review: Fix description of webrequest table - https://phabricator.wikimedia.org/T157951#3085374 (10Nuria) 05Open>03Resolved
[20:00:40] <wikibugs_>	 10Analytics, 10Analytics-Cluster, 06Operations: Reinstall  Analytics Hadoop Cluster with Debian Jessie - https://phabricator.wikimedia.org/T157807#3085380 (10Nuria)
[20:00:44] <wikibugs_>	 06Analytics-Kanban, 13Patch-For-Review: CDH 5.10 upgrade - https://phabricator.wikimedia.org/T152714#3085379 (10Nuria) 05Open>03Resolved
[20:00:51] <wikibugs_>	 06Analytics-Kanban: Create AQS endpoint to serve legacy pageviews - https://phabricator.wikimedia.org/T156391#3085389 (10MusikAnimal)
[20:01:41] <wikibugs_>	 10Analytics-Cluster, 06Analytics-Kanban, 06Operations, 13Patch-For-Review, 15User-Elukey: Update Zookeeper heap usage configuration and set alarms - https://phabricator.wikimedia.org/T157968#3085393 (10Nuria)
[20:03:23] <wikibugs_>	 10Analytics-Cluster, 06Analytics-Kanban, 06Operations, 13Patch-For-Review, 15User-Elukey: Update Zookeeper heap usage configuration and set alarms - https://phabricator.wikimedia.org/T157968#3021877 (10Nuria) 05Open>03Resolved
[20:03:41] <wikibugs_>	 06Analytics-Kanban, 13Patch-For-Review: Run a 1-off sqoop over the new labsdb servers - https://phabricator.wikimedia.org/T155658#3085401 (10Nuria) 05Open>03Resolved
[20:03:54] <wikibugs_>	 06Analytics-Kanban, 13Patch-For-Review: Create EventStreams swagger spec docs endpoint - https://phabricator.wikimedia.org/T158066#3085402 (10Nuria) 05Open>03Resolved
[20:03:57] <wikibugs_>	 06Analytics-Kanban, 10EventBus, 10Wikimedia-Stream, 06Services (watching), 15User-mobrovac: EventStreams - https://phabricator.wikimedia.org/T130651#3085403 (10Nuria)
[20:22:14] <wikibugs_>	 10Analytics, 06Research-and-Data: geowiki data for Global Innovation Index - https://phabricator.wikimedia.org/T131889#3085445 (10Rafaesrey) Dear Leila,  Hope you are doing fine. I write you to follow up on the data collection process. I also want to let you know that we now have an official launch date for th...
[20:48:15] <ottomata>	 milimetric:  how's your regex foo these days? :D
[20:48:51] <milimetric>	 it's ok ottomata, what needs regexing
[20:49:19] <ottomata>	 batcave?
[20:49:33] <milimetric>	 omw
[21:16:16] <nuria>	 ottomata: yt?
[21:16:19] <nuria>	 msg ottomata 
[21:26:43] <ottomata>	 yo sorry was talking with dan
[21:28:22] <wikibugs_>	 10Analytics, 10Analytics-Cluster, 10EventBus, 10MediaWiki-Vagrant, 06Services (done): Kafka logs are not pruned on vagrant - https://phabricator.wikimedia.org/T158451#3085618 (10Pchelolo) 05Open>03Resolved
[22:51:43] <wikibugs_>	 (03CR) 10Krinkle: Service Worker to cache locally AQS data (032 comments) [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/302755 (https://phabricator.wikimedia.org/T138647) (owner: 10Nuria)
[22:52:38] <wikibugs_>	 06Analytics-Kanban, 10ChangeProp, 06Operations, 10Reading-Web-Trending-Service, 06Services (watching): Upgrade librdkafka 0.9.4 on SCB and Varnishes - https://phabricator.wikimedia.org/T159379#3085869 (10Pchelolo)