[03:53:20] 10Analytics, 06Performance-Team: Eventlogging client needs to support offline events - https://phabricator.wikimedia.org/T162308#3159515 (10Krinkle) [06:35:04] (03PS3) 10Fdans: Removes subjective fields from mediawikihistory query [analytics/refinery] - 10https://gerrit.wikimedia.org/r/346522 (https://phabricator.wikimedia.org/T157362) [08:30:57] hi fdans [08:31:10] Have you seen my comments on your refinery-source CR? [10:14:51] joal: hello! I have, on them right now :) [10:14:57] thank you so much for the CR [10:15:40] fdans: no prob :) [10:15:56] fdans: I was just making sure they make sense and look ok to you :) [10:39:26] (03PS1) 10Joal: [WIP] Add wikidata graph analysis code [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/346726 [11:08:19] (03PS2) 10Joal: [WIP] Add wikidata graph analysis code [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/346726 [11:48:20] taking a break a-team [12:01:33] * elukey brb! [12:36:45] (03PS2) 10Fdans: Remove is_productive and update time to revert [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/346519 (https://phabricator.wikimedia.org/T157362) [12:36:55] (03CR) 10Fdans: Remove is_productive and update time to revert (0314 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/346519 (https://phabricator.wikimedia.org/T157362) (owner: 10Fdans) [12:40:14] (03CR) 10jerkins-bot: [V: 04-1] Remove is_productive and update time to revert [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/346519 (https://phabricator.wikimedia.org/T157362) (owner: 10Fdans) [12:51:25] (03PS3) 10Fdans: Remove is_productive and update time to revert [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/346519 (https://phabricator.wikimedia.org/T157362) [12:58:01] (03Abandoned) 10Fdans: Removes subjective fields from mediawikihistory query [analytics/refinery] - 10https://gerrit.wikimedia.org/r/346522 (https://phabricator.wikimedia.org/T157362) (owner: 10Fdans) 
[13:11:41] (03PS1) 10Fdans: Remove is_productive and update time to revert [analytics/refinery] - 10https://gerrit.wikimedia.org/r/346755 (https://phabricator.wikimedia.org/T157362) [13:13:16] (03PS2) 10Fdans: Update jobs to remove is_productive and update time to revert [analytics/refinery] - 10https://gerrit.wikimedia.org/r/346755 (https://phabricator.wikimedia.org/T157362) [13:24:44] mooorninnnng elukey! [13:24:49] should we do an02? [13:26:37] morniiiinnnggggg [13:26:49] I just pinged you in phab :) [13:27:05] lemme grab a coffee and then I'll be ready [13:27:05] oh! ok will read [13:35:35] oh elukey, thought of something about an03 downtime annoucement [13:35:42] an03 is used for druid metadata storage [13:35:53] i think we will need to do druid downtime too [13:37:07] yeah :( [13:37:27] hm, ok so an02. does not have a partman [13:37:30] i am sorry :/ [13:38:54] it is RAID 1 across 4 drives [13:39:03] with lvm on top [13:40:30] yep I didn't have time this morning to come up with a recipe, but I can try now if you have patience [13:40:34] should be easy [13:40:37] haha, if you WANT to [13:40:48] elukey: do you think it is possible to reinstall a node wthout wiping partitions at all? [13:40:53] just to use the existing / for reinstall? [13:41:14] if we do it via console probably yes [13:41:16] that one coudl be 'wiped', but i wonder if we could do it so md0 and lvm didn't go away [13:41:17] yeah [13:41:29] all right let's do it via console :) [13:41:31] I got the drill [13:41:36] haha ok [13:41:43] batcave? lemme get settled into monitor [13:41:51] we should backup the stuff we have there remotely first i guess [13:42:02] /srv/backup is mysql backups, 24G [13:42:15] name dir is 1.7G [13:42:30] hm, i'm going to pause LVM backups of MySQL and rsync /srv/backup to stat1004 or 2 or something [13:42:31] ok? [13:42:39] ok! 
[13:42:45] going to the batcave [13:43:02] be there in a min, let me start this rsync [14:04:51] People we are reimaging analyics1002 to Debian [14:05:01] let us know if you see any issue [14:19:37] (03CR) 10Joal: [C: 031] "A lst comment, but really not preventing to merge :)" (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/346519 (https://phabricator.wikimedia.org/T157362) (owner: 10Fdans) [14:20:08] Thank you joal ! [14:22:54] (03PS4) 10Fdans: Remove is_productive and update time to revert [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/346519 (https://phabricator.wikimedia.org/T157362) [14:33:47] (03CR) 10Fdans: [C: 032] Remove is_productive and update time to revert [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/346519 (https://phabricator.wikimedia.org/T157362) (owner: 10Fdans) [14:46:10] 06Analytics-Kanban, 10MediaWiki-extensions-WikimediaEvents, 10The-Wikipedia-Library: Logging instrumentation is completely broken - https://phabricator.wikimedia.org/T162365#3160497 (10Milimetric) [14:46:25] 06Analytics-Kanban, 10MediaWiki-extensions-WikimediaEvents, 10The-Wikipedia-Library: ExternalLinksChange Logging instrumentation is completely broken - https://phabricator.wikimedia.org/T162365#3160512 (10Milimetric) [14:52:34] Question concerning apps: French Wikipedia's home page visits have dropped since the community has applied the new design. https://tools.wmflabs.org/pageviews/?project=fr.wikipedia.org&platform=mobile-app&agent=user&start=2016-03-02&end=2017-04-05&pages=Wikip%C3%A9dia:Accueil_principal Anyone knows why?
[14:55:34] 10Analytics-Cluster, 06Analytics-Kanban: Enable hyperthreading on analytics100[12] - https://phabricator.wikimedia.org/T159742#3160532 (10Ottomata) [14:55:38] Trizek: I would not assume that is due to nay changes deployed, requests to the main page in large amounts normally indicate connection problems (browser trying to connect to http://fr.wikipedia.org via https , will get redirected to main page and if it cannot stablish a connection the cycle will repeat) [14:56:29] So the big amount of visits is something that is now fixed? [14:57:13] That explains why the results have skyrocket before christmas! :) [14:57:51] Trizek: we do not normally need to "fix" anything, if it is connection issue normally can be traced to a browser, if it is an undetected bot those requests stop all of a sudden [14:58:44] Trizek: w/o looking at requests in detail would be hard to say but main page requests huge numbers have indicated connection problems before, as it is our default page [14:59:28] thanks nuria [15:09:12] 10Analytics-EventLogging, 06Analytics-Kanban, 10DBA, 06Operations, 13Patch-For-Review: Improve eventlogging replication procedure - https://phabricator.wikimedia.org/T124307#3160560 (10Nuria) [15:09:13] 10Analytics-EventLogging, 06Analytics-Kanban, 10DBA, 05WMF-NDA: Drop tables with no events in last 90 days. 
- https://phabricator.wikimedia.org/T161855#3145199 (10Nuria) 05Open>03Resolved [15:09:42] 10Analytics-Dashiki, 06Analytics-Kanban, 13Patch-For-Review: Show pageviews prior to 2015 in dashiki - https://phabricator.wikimedia.org/T143906#3160566 (10Nuria) 05Open>03Resolved [15:09:44] 06Analytics-Kanban, 13Patch-For-Review: Move reportcard to dashiki and new datasources - https://phabricator.wikimedia.org/T130117#3160568 (10Nuria) [15:11:29] 10Analytics: Polish script that checks eventlogging lag to use it for alarming - https://phabricator.wikimedia.org/T124306#3160587 (10Nuria) [15:11:33] 10Analytics-EventLogging, 06Analytics-Kanban, 10DBA, 06Operations, 13Patch-For-Review: Improve eventlogging replication procedure - https://phabricator.wikimedia.org/T124307#1952524 (10Nuria) 05Open>03Resolved [15:15:04] 06Analytics-Kanban, 06Operations, 15User-Elukey: Reimage the Hadoop Cluster to Debian Jessie - https://phabricator.wikimedia.org/T160333#3160595 (10Ottomata) [15:24:46] ottomata: [15:24:46] elukey@analytics1001:~$ sudo -u hdfs hdfs haadmin -getServiceState analytics1002-eqiad-wmnet [15:24:50] standby [15:24:52] ya! [15:24:52] rmadmin is ok too [15:24:55] \o/ [15:45:27] joal: wanna look at this jsonschema scala model thing with me real quick [15:45:29] tell me waht you thikn? [15:45:48] sure ottomata give me a minute and I'll join batcave [15:46:03] k [15:46:11] elukey: i just re-enabled the mysql backup task and puppet on an03 [15:46:24] i think an02 is good to go [15:46:33] did we want to switch it to master today? [15:46:37] or maybe on monday? 
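The `hdfs haadmin -getServiceState` check elukey pasted above verifies which master is active before a failover. A minimal Python wrapper for that check could look like the sketch below; the helper names are invented, and the subprocess call assumes a configured Hadoop client on the host, just as in the log:

```python
import subprocess


def parse_ha_state(raw: str) -> str:
    """Normalize the output of `hdfs haadmin -getServiceState <serviceId>`."""
    state = raw.strip().lower()
    if state not in ("active", "standby"):
        raise ValueError("unexpected HA state: %r" % raw)
    return state


def get_service_state(service_id: str) -> str:
    """Shell out to the same command run in the log (requires Hadoop client)."""
    out = subprocess.check_output(
        ["sudo", "-u", "hdfs", "hdfs", "haadmin", "-getServiceState", service_id]
    )
    return parse_ha_state(out.decode())
```

The analogous check for YARN ("rmadmin is ok too" above) would shell out to `yarn rmadmin -getServiceState` instead.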
[15:47:34] Trizek: look at agreggated pageviews , they have nor chnaged that much: https://tools.wmflabs.org/siteviews/?platform=all-access&source=pageviews&agent=user&start=2016-11-01&end=2017-03-31&sites=fr.wikipedia.org [15:48:24] There is a big change when we only consider mobile app nuria https://tools.wmflabs.org/siteviews/?platform=mobile-app&source=pageviews&agent=user&start=2016-11-01&end=2017-03-31&sites=fr.wikipedia.org [15:48:27] ottomata: I'd prefer on Monday JUST IN CASE [15:48:47] so we'll have time to verify that an1002 is working properly [15:48:49] wdyt? [15:48:53] (I am removing downtime) [15:49:24] Trizek: pageviews for app are affected the most when apps get released [15:49:59] Trizek: any bug in the app that -for example- does a silent request will "artifically" incresae pageviews [15:50:12] Trizek: you need to plot app releases to gether with those to see it [15:50:38] Trizek: and again, for apps is the same thing, major Mian_page requests are likely artificial [15:50:49] *artificially increase [15:51:42] Trizek: and looking a bit further to browser data (this data is not public) that spike is neither android nor ios [15:51:49] Trizek: for the app [15:52:12] elukey: sounds good [15:52:23] we'll promote it monday, do an01 reimage tuesday [15:54:18] https://usercontent.irccloud-cdn.com/file/rbGSXm7q/Screen%20Shot%202017-04-06%20at%208.52.19%20AM.png [15:54:23] cc Trizek [15:55:13] thanks nuria [15:55:46] Trizek: so, on further inspection, it really looks like "bogus" app pageviews [15:57:46] I'm not sure I'm getting everything nuria, but, at least, I known now it is not related to a design change that may have broken any parser. 
[15:57:47] 10Analytics-Cluster, 06Analytics-Kanban: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#3160686 (10Nuria) [15:59:50] Trizek: that graph is app pageviews by user agent [16:00:08] Trizek: the blue line is NOT android nor IOS which points to bot data [16:00:24] Trizek: will need to look at it a bit further [16:03:49] No problem nuria. Thank you for taking care of this. Plus I already have the answer I need. [16:04:27] 10Analytics: Add unique devices dataset to pivot - https://phabricator.wikimedia.org/T159471#3160706 (10Nuria) We should be able to have data per country since the beginning. Let's do daily and monthly unique devices. - create ingestion spec for druid (json) - hql for hive - create 2 oozie jobs with coordin... [16:04:42] 06Analytics-Kanban: Add unique devices dataset to pivot - https://phabricator.wikimedia.org/T159471#3160707 (10Nuria) [16:07:40] 10Analytics, 07Easy: Pre-generate mysql ORM code for sqoop - https://phabricator.wikimedia.org/T143119#3160710 (10Nuria) - changing scoop job to have a parameter for jars (with schema version changes there will be jar changes , those jars are already build) - passing jar to scoop job that will generate ORM code [16:10:19] 10Analytics, 10EventBus, 05MW-1.29-release (WMF-deploy-2017-04-04_(1.29.0-wmf.19)), 06Services (done): Create mediawiki.page-restrictions-change event - https://phabricator.wikimedia.org/T160942#3160713 (10Pchelolo) 05Open>03Resolved And the event is life! Resolving. [16:20:40] elukey: re: moving that meeting time... [16:21:04] elukey: when is that standup that you wanted to move it before? [16:21:26] 06Analytics-Kanban: Pre-generate mysql ORM code for sqoop - https://phabricator.wikimedia.org/T143119#3160760 (10Nuria) [16:22:01] urandom: I think that it was only conflicting this week, we can keep going and change if needed [16:22:04] wdyt? 
[16:22:34] elukey: up to you guys [16:22:35] 06Analytics-Kanban: Measuring non content pageviews - https://phabricator.wikimedia.org/T162310#3160775 (10Nuria) [16:22:43] 10Analytics, 06Performance-Team: Eventlogging client needs to support offline events - https://phabricator.wikimedia.org/T162308#3160776 (10Milimetric) p:05Triage>03Normal [16:23:02] elukey: happy to do it earlier if that makes it easier on you guys [16:23:39] elukey: everyone on that call but me is europe :) [16:24:09] urandom: maybe we can ask to Filippo next week so we'll have a good EU quorum :) [16:24:26] elukey: sure [16:28:12] 10Analytics, 06Discovery, 10EventBus, 10Wikidata, and 3 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3160810 (10Nuria) a:03Ottomata [16:34:09] 10Analytics, 06Discovery, 10EventBus, 10Wikidata, and 3 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3160832 (10Ottomata) @Smalyshev let's jump in a hangout sometime to discuss this more. Just a few quick points: > Does not have data back more than 7... [16:55:30] joal: sent thoughs about doc proposal, i think is a bit ambitious [16:55:39] joal: to analytics-internal [16:57:03] k [17:09:19] elukey: did you remove /etc/cron.daily/blogreport on eventlog1001? [17:09:22] its gone [17:11:10] ottomata: yep (do you remember yesterday's standup? :) [17:11:22] also noted in the sal, backup in /home/elukey [17:11:26] ah didn't realize you were gonna do it! thought i was ahhh ok [17:11:29] we need to email to tilman [17:11:30] i'll do that [17:11:35] have you already? [17:11:39] nope! [17:11:42] k will do [17:11:43] danke [17:16:42] logging off!! byeee [17:17:19] Bye elukey [17:22:48] AH joal! [17:22:53] i am wrong about spark json schema infer [17:22:58] default sampling ratio is 1.0 [17:23:03] so, it does actually pass through all records [17:23:11] hm [17:23:17] ottomata: WTH then ? 
[17:23:30] joal: i think it just happens that i don't have any records for my older schema that have the field [17:23:33] because the field is optional [17:23:34] which, is ok. [17:23:42] as long as we don't drop any fields on conversion [17:23:54] it is ok if the hive schema we make doesn't match the full merged history of jsonschema revisions [17:23:55] ottomata: nope [17:24:10] as long as the hive schema we make contains fields for all possible seen fields in hdfs [17:24:32] ottomata: nope, we won't drop fields - And, if we build schemas out of newer than never include a field, well, it's not present :) [17:24:38] right [17:24:39] which is fine [17:24:43] indeed [17:24:47] because it would always be NULL if we had it [17:25:00] ottomata: We might not have to go for jsonschema at then end :) [17:25:02] yea [17:25:05] Cool ! [17:25:06] i'm going to keep trying this route then [17:25:13] nice find ! [17:25:24] the .read.json method docs even say [17:25:24] * Unless the schema is specified using [[schema]] function, this function goes through the [17:25:24] * input once to determine the input schema. [17:25:31] nice learning as well, thanks for the pointer (sampling ratio for json schema inference) [17:25:40] https://github.com/apache/spark/blob/v1.6.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JSONOptions.scala#L28 [17:25:57] awesome [17:34:15] ya joal, that means, that i don't need to group DFs by revision in order to find the uber schema [17:34:25] ottomata: Yup ! [17:34:29] the schema from the current load shoudl have everything [17:34:32] ottomata: Spark should do it for ya [17:34:34] and i can pass that + the hive schema to your code [17:34:39] to get any needed alters for past data [17:36:26] ottomata: sounds correct :) [17:36:33] ottomata: any issue with the ordering stuff? 
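The behavior ottomata describes can be illustrated outside Spark: with the default `samplingRatio` of 1.0, `spark.read.json` scans every record, so the inferred schema is the union of all fields actually seen, and an optional field that no record carries simply never appears. A toy stand-in in plain Python (field names invented for illustration):

```python
import json


def infer_schema(records):
    """Infer a flat field -> type-name mapping by scanning *every* record,
    mimicking spark.read.json with its default samplingRatio of 1.0."""
    schema = {}
    for raw in records:
        for field, value in json.loads(raw).items():
            # First type seen wins; the point is the union of field names.
            schema.setdefault(field, type(value).__name__)
    return schema


events = [
    '{"schema_rev": 1, "user": "a"}',
    '{"schema_rev": 2, "user": "b", "optional_note": "hi"}',
]
```

Here `infer_schema(events)` includes `optional_note` because one record carries it, while `infer_schema(events[:1])` does not, which is exactly why ottomata's older-schema field vanished: every record happened to omit it.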
[17:38:13] joal: after we fixed ordering, but i'm not totally sure why [17:38:23] will see what happens [17:40:14] ottomata: I'm having a real lot of fun with graphframes on wikidata :) [17:41:11] 06Analytics-Kanban: All Dashiki Dashboards down - https://phabricator.wikimedia.org/T162320#3161046 (10Nuria) [17:41:52] graphframes joal? [17:41:57] yessir [17:42:02] graph on spark [17:42:38] I'm going to prepare some things to show off next week ;) [17:42:48] ok awesome! [17:43:29] Off for tonight - Later a-team [17:45:48] laters! [17:58:03] ottomata: Just recall I should ask you that [17:58:19] Since we are jessie everywhere on workers, would you mind installing a few packages for me ? [17:59:03] ottomata: And lastly, would it be feasible to also make a package out of my python data? [17:59:39] ottomata: https://gist.github.com/jobar/23226cb8e62d966b8a9f0621c4d86771 [17:59:51] joal will puppetize that ya [17:59:59] package out of data? sounds familiar, what data? [18:00:17] awesome - missing only one thing, the nltk data :( [18:05:33] ottomata: the data is nltk stopwords, downloaded through python by users - You said maybe you could package it ? [18:06:02] * joal waves gently to ottomata, expecting a yes :) [18:08:51] 06Analytics-Kanban: Pagecounts all sites data issues - https://phabricator.wikimedia.org/T162157#3161226 (10Nuria) At the time of loading: www.wikidata.org needs to be changed to wikidata www.mediawiki.org needs to be changed to mediawiki [18:14:18] joal: in order to reload data in cassandra should i drop existing data? 
[18:14:45] joal: there is data for ww.wikidata that will not get overwriiten as it should be wikidata, same for 'mediawiki' [18:14:52] nuria: depends - If there are /wrong/ data in cassandra, as in, incorrect key (www.mediawiki.org for instance), would be better [18:15:01] nuria: TRUNCATE [18:15:23] joal: ok, only for those 2 projects + meta as far as i can tell [18:15:34] joal: i was not going to load the meta spike at all [18:15:42] joal: if it makes sense [18:15:42] nuria: TRUNCATE will drop everything, and you need to reload everything [18:16:06] joal: that seems fine , data is small [18:16:21] nuria: good idea - Not loading meta at load actually makes sense (represent a lot of the time anyway) [18:16:25] coll nuria [18:16:54] joal: will super document and exclude met acompletely [18:16:56] *meta [18:17:00] joal: thank you [18:17:05] nuria: np, thank you ! [18:19:37] (03PS1) 10Nuria: Corrected triming of hostname [analytics/refinery] - 10https://gerrit.wikimedia.org/r/346802 (https://phabricator.wikimedia.org/T162157) [18:25:25] (03PS2) 10Nuria: Correcting loading of pagecounts into cassandra [analytics/refinery] - 10https://gerrit.wikimedia.org/r/346802 (https://phabricator.wikimedia.org/T162157) [18:49:37] joal: don't see why n ot! [18:49:43] sorry, was afk for min [19:06:31] joal: can i truncate table on command line? or do i need to use something else? [19:08:01] anybody left online that can help me figure out how to use intellij with refinery-source? [19:08:03] nuria: maybe? [19:12:53] ottomata: jaja [19:13:27] ottomata: sure, i have not used it for scala for while [19:13:39] ottomata: is it scala that is not working? [19:13:51] i can't even make it just compile in the IDE [19:14:27] nuria: bc real quick? 
[19:14:31] ottomata: ah, ok, 1st thing to check is teh default java version [19:14:35] ottomata: k [19:14:54] ottomata: let me get headset cause I am at the library [19:39:09] urandom: question for you if you are arround [19:40:07] ottomata: no hear no see [19:44:01] nuria: sure! [19:44:18] urandom: to truncate data from a table can i do it from cq command line? [19:44:26] urandom: or do i need to use nodetool? [19:44:48] nuria: yeah, cqlsh [19:45:00] TRUNCATE <table>; [19:45:03] urandom: and that truncate will propagate to all nodes? [19:45:07] yes [19:45:21] pretty sure that will create a snapshot, too [19:45:42] snapshots are hard links, so over time they'll start accumulating space [19:45:52] you can clear them with the nodetool clearsnapshot command [19:46:11] (that would need to be done on all instances) [19:46:13] urandom: ok sir, than you [19:46:20] sure thing [20:17:00] <dependency> [20:17:00] <groupId>org.wikimedia.analytics.refinery.core</groupId> [20:17:01] <artifactId>refinery-core</artifactId> [20:17:01] <scope>system</scope> [20:17:01] <systemPath>/Users/otto/Projects/wm/analytics/refinery-source/refinery-core/target/refinery-core-0.0.45-SNAPSHOT.jar</systemPath> [20:17:01] </dependency> [20:17:03] nuria: ^ [20:17:19] mvn test -Dmaven.surefire.debug.test -Dsuites=org.wikimedia.analytics.refinery.job.HiveFromJsonSuite -pl refinery-job [20:36:43] hello people [20:36:44] 22:28 PROBLEM - aqs endpoints health on aqs1009 is CRITICAL: /legacy/pagecounts/aggregate/{project}/{access-site}/{granularity}/{start}/{end} (Get pagecounts) is CRITICAL: Test Get pagecounts returned the unexpected status 404 (expecting: 200) [20:36:48] etc.. [20:37:10] ottomata: --^ [20:37:38] nuria: --^ [20:37:49] oo [20:37:51] I saw your messages about truncate/reload [20:38:19] elukey: oh man [20:38:24] elukey: i just did it [20:38:36] elukey: through commandline [20:38:39] don't know context here, what's up?
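urandom's recipe above (TRUNCATE via cqlsh, then `nodetool clearsnapshot` on every instance, because TRUNCATE snapshots the old data and those hard links accumulate space over time) can be captured as a small helper. This is a sketch only, with invented names, that assembles the commands rather than executing them against a real cluster:

```python
def truncate_plan(keyspace, table, instances):
    """Return the CQL to truncate a table plus the per-instance cleanup
    commands for the snapshot that TRUNCATE leaves behind."""
    cql = 'TRUNCATE "{}"."{}";'.format(keyspace, table)
    # clearsnapshot must run on all instances: snapshots are local hard links
    cleanup = ["nodetool -h {} clearsnapshot".format(host) for host in instances]
    return cql, cleanup
```

The CQL statement only needs issuing once (it propagates cluster-wide, as urandom confirms), while the cleanup list has one entry per instance.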
[20:39:17] elukey, ottomata : i just dropped data (after talking to urandom ) [20:39:35] nuria: catching up with the backlog, if you are reloading it is fine :) [20:39:48] elukey: so 404 here sounds right: /legacy/pagecounts/aggregate/{project}/{access-site}/{granularity}/{start}/{end} [20:39:54] elukey: i am reloading now [20:40:01] elukey: all but meta [20:40:26] elukey: there were 3 wikis with bad data and joseph though it would be best to truncate and reload [20:40:38] elukey: so we can ignore alarm? [20:40:56] sure sure, I was checking IRC and saw the alerts so thought to ask :) [20:41:01] ok you are loading now? ok [20:41:32] elukey: so SORRY! [20:41:46] elukey: i did not though of alerts on this new data [20:41:56] ottomata: right, reloading now [20:42:31] nuria: it is super fine, I just wanted to make sure that it wasn't cassandra exploding :) [20:42:41] * elukey blames urandom [20:42:43] :P [20:42:56] elukey: ya, so SORRY again cause i should have though about data alarms [20:47:16] AQS alarms are not logged in here, I need to fix it [20:47:23] so there will be more visibility [20:47:39] AQS alarms should be logged in here though [20:47:40] grrr [20:48:00] elukey: ok, will write ticket, no worries [20:48:34] 06Analytics-Kanban: AQS alarms need to log to analytics channel - https://phabricator.wikimedia.org/T162407#3161778 (10Nuria) [20:48:52] :) [20:48:56] thanks! [20:49:28] nuria: let me know when data is reloaded so I'll ack the alarms etc.. [20:50:07] elukey: it will be couple hrs i think you might have gone to bed, but no worries, i will ping someone in ops to acknowledge if i cannot see them [20:50:42] it should recover by itself, I'll alert the other ops people [20:50:49] elukey: k [20:52:33] acked the alarms [21:23:02] 10Analytics, 06Discovery, 10EventBus, 10Wikidata, and 3 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3161867 (10Smalyshev) > let's jump in a hangout sometime to discuss this more. 
Would be glad to. I'll try to set up something next week. > If there... [21:27:02] 10Analytics, 06Discovery, 10EventBus, 10Wikidata, and 3 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3161868 (10Ottomata) >>If so, you may want to consider consuming from Kafka rather than EventStreams. >I am considering this too, but I assume it's mor... [21:40:36] 10Analytics, 06Discovery, 10EventBus, 10Wikidata, and 3 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3161909 (10Smalyshev) > It will be more, a lot more. What language are you working in? The end consumer will be Java, but I don't want to consume the... [21:44:38] 10Analytics, 06Discovery, 10EventBus, 10Wikidata, and 3 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3161911 (10Ottomata) > I'd rather have some intermediary that cleans up, deduplicates, etc. the changes. FYI, neither base Kafka Consumer clients nor... [21:47:29] 10Analytics, 06Discovery, 10EventBus, 10Wikidata, and 3 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3161917 (10Smalyshev) > FYI, neither base Kafka Consumer clients nor EventStreams does this. Yes, I know :) It's one of the decisions I still haven't... [21:55:49] 10Analytics-EventLogging, 06Analytics-Kanban: Research Spike: Better support for Eventlogging data on hive - https://phabricator.wikimedia.org/T153328#3161925 (10Ottomata) Ok, after much spiking, here's the current route we'd like to go down. We are pretty sure we can infer a good enough schema for each impo... 
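The consumer-side deduplication discussed in T161731 above, which neither the base Kafka consumer clients nor EventStreams does for you, is typically a bounded cache of recently seen event ids. A minimal sketch, assuming each event carries a unique id (the class name and default capacity are invented):

```python
from collections import OrderedDict


class Deduplicator:
    """Drop events whose id was already seen, remembering only the most
    recent `capacity` ids so memory stays bounded on an unbounded stream."""

    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.seen = OrderedDict()

    def is_new(self, event_id):
        if event_id in self.seen:
            # Refresh recency so hot ids are not evicted early.
            self.seen.move_to_end(event_id)
            return False
        self.seen[event_id] = True
        if len(self.seen) > self.capacity:
            self.seen.popitem(last=False)  # evict the oldest id
        return True
```

Note the trade-off this implies: a duplicate arriving after its id has been evicted will pass through, so the window size has to be tuned to the expected redelivery lag.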
[22:30:48] 06Analytics-Kanban, 13Patch-For-Review: Pagecounts all sites data issues - https://phabricator.wikimedia.org/T162157#3162003 (10Nuria) Realoaded data (took about 2 hours) and now wikidata project column looks correct on db: cassandra@cqlsh> select * from "local_group_default_T_lgc_pagecounts_per_project".dat... [23:02:50] anyone previously setup pyspark locally for doing development? Running into an issue where i can essentially do collect_list(named_struct(...)) in our prod pyspark running 1.6.0, but i downloaded pyspark 1.6.0 from http://spark.apache.org/downloads.html with 'pre-built for hadoop 2.6' and it doesn't seem to include that functionality yet. [23:03:22] comparing prod, we seem to run hadoop 2.6.0-cdh5.10.0, so would seem it should work (unless perhaps cdh backported some functionality or something?) [23:17:23] ahh, it turns out cdh backports things, so cdh hive is not the same as the hive spark is linked to. will figure out how to get a local cdh instead of just pyspark
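For reference, the `collect_list(named_struct(...))` expression in question groups rows by key and gathers one struct per row into a list; the plain-Python equivalent below (with invented column names) shows the shape of the result. The chat's conclusion explains the discrepancy: CDH backports Hive features, so the Hive that CDH 5.10's Spark 1.6 links against is not the same as the one bundled with the stock download.

```python
from collections import defaultdict

# (group_key, struct) pairs standing in for SQL rows; names are invented
rows = [
    ("enwiki", {"page": "Foo", "views": 3}),
    ("enwiki", {"page": "Bar", "views": 1}),
    ("frwiki", {"page": "Baz", "views": 2}),
]


def collect_structs(rows):
    """Rough equivalent of GROUP BY key + collect_list(named_struct(...))."""
    grouped = defaultdict(list)
    for key, struct in rows:
        grouped[key].append(struct)
    return dict(grouped)
```

Each group key maps to the list of structs from its rows, in input order, which is the structure the prod pyspark query produces.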