[01:01:05] <wikibugs>	 Analytics-Kanban, Spike: Research spike: load enwiki data into Druid to study lookup table performance - https://phabricator.wikimedia.org/T141472#2504168 (Danny_B)
[01:55:39] <milimetric>	 !log limn1 disk full, no idea how to clean it because /public refuses to list its files or listen to me when I try to delete it
[01:55:41] <analytics-logbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[02:52:34] <wikibugs>	 Analytics-Dashiki: Automate or Simplify calculating Per-Diems - https://phabricator.wikimedia.org/T140819#2504266 (Smalyshev) Note that @Yurik's template, while being cool, only addresses a part of it - currency conversion. Finding the rates and calculating the actual number is not part of it.
[04:58:49] <wikibugs>	 Analytics-Dashiki: Automate or Simplify calculating Per-Diems - https://phabricator.wikimedia.org/T140819#2504330 (Yurik) @smalyshev, I just finished per-diem calculation page too - only works for FY2016 and USA only, but can be easily expanded (2nd tab of the shared doc)
[07:22:39] <wikibugs>	 Analytics-Dashiki: Automate or Simplify calculating Per-Diems - https://phabricator.wikimedia.org/T140819#2504443 (Smalyshev) Looks cool!   We may want to add Alaska/Hawaii (provided anybody gets to travel there?) from http://www.defensetravel.dod.mil/site/perdiemCalc.cfm  and overseas countries from  https:...
[07:25:38] <wikibugs>	 Analytics-Dashiki: Automate or Simplify calculating Per-Diems - https://phabricator.wikimedia.org/T140819#2504449 (Smalyshev) This page seems to have the source files for the outside-of-the-US rates: http://www.defensetravel.dod.mil/site/perdiemFiles.cfm
[07:25:55] <wikibugs>	 Analytics-Dashiki: Automate or Simplify calculating Per-Diems - https://phabricator.wikimedia.org/T140819#2504450 (Smalyshev) p:Triage>Normal
[08:00:30] <elukey>	 joal: gooood morning
[08:00:51] <elukey>	 I can see the fourth month finishing nicely, an avg of 400GB per instance seems really good
[08:01:05] <elukey>	 compaction took longer this time though afaics
[08:01:38] <elukey>	 I am really struggling with the raid10  vs raid0 choice
[08:01:54] <elukey>	 but raid10 feels the right on in the longer term
[08:02:24] <elukey>	 even if 10 days of loading would be "Wasted" (not completely because we got good data, but it is frustrating anyway)
[08:03:04] <elukey>	 I could rebuild the whole cluster in a couple of days maximum probably
[08:03:37] <elukey>	 if you have time we could chat about it later on during the day
[08:24:51] <joal>	 Hi elukey :)
[08:25:11] <joal>	 indeed 4th month is done :)
[08:25:35] <joal>	 (actually not completely, but almost)
[08:26:21] <joal>	 About raid 10/0, I think we know both pros and cons of each, we just need to make decision :)
[08:27:12] <wikibugs>	 Analytics, Analytics-EventLogging, Patch-For-Review, WMF-deploy-2016-08-02_(1.28.0-wmf.13): Convert EventLogging to use extension registration - https://phabricator.wikimedia.org/T87912#2504575 (Legoktm)
[08:34:17] <elukey>	 joal: I believe that the final choice would be if we want to keep "only" two years of data or more, and maybe what the analytics community thinks about this choice (as milimetric was saying during standup)
[08:35:17] <joal>	 elukey: yeah, ultimate decision is the trade between functionality (how much data available), and money (how many servers needed)
[08:36:16] <joal>	 elukey: evalution of those is always difficult, and I think I'm worse than many in those type of areas, so while I try to keep in mind the core problem, I don't think I should the one making decision :)
[08:40:55] <elukey>	 joal: nono I don't want to put you in the position to be the one to choose, I just wanted to discuss with you pros/cons :)
[08:41:30] <elukey>	 I think that we had a very different understanding of cassandra when we first started to work on aqs100[456], that changed and improved a lot during these months
[08:41:37] <joal>	 elukey: yes, let's do that and also discuss them with team :) (I was not willing to choose anyway :-P)
[08:42:18] <joal>	 elukey: I'm just mentioning that I can elaborate thoughts around pros and cons, but evaluation of the value is really not my strength :)
[08:43:59] <elukey>	 sure :)
[09:37:23] <grrrit-wm>	 (CR) Addshore: "This shaved 1 hour of the overall cron scheduler scrip run." [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/301587 (owner: Addshore)
[09:47:45] <addshore>	 joal: ! :D
[09:48:45] <joal>	 Hi addshore :)
[09:48:50] <addshore>	 https://gerrit.wikimedia.org/r/#/c/301657/ & https://gerrit.wikimedia.org/r/#/c/301661/ ;)
[09:49:15] <addshore>	 I made another oozie job (not tested yet) but I have a feeling we may have far fewer problems with this one :)
[09:50:54] <joal>	 addshore: I'll have a look later on today :)
[09:56:33] <elukey>	 joal: https://grafana.wikimedia.org/dashboard/db/aqs-elukey?panelId=15&fullscreen
[09:57:00] <joal>	 elukey: this is a SUCCESS !!!!
[09:57:07] <elukey>	 yessssssss \o/
[09:57:15] <elukey>	 I checked timings and they match with my restarts
[09:57:23] <elukey>	 I can probably use something that is not hte rate
[09:57:31] * joal pops up a beer and chers elukey :)
[09:57:32] <elukey>	 buuuut It seems that we found a proof
[09:58:55] <joal>	 elukey: And the rate of 500s is also way better :)
[10:02:22] <elukey>	 joal: I am wondering if we should cache auth for more
[10:02:25] <elukey>	 like 10 mins
[10:04:20] <joal>	 elukey: could be good
[10:04:48] <joal>	 elukey: Since we have strong passwords for cassandra, it wouldn't be of any help if somebody sneaks in anyhow, hey?
[10:06:29] <elukey>	 I don't think that it makes tons of difference if we cache for 1 mins or 10, since an attacker that knows how to exploit these passwords will do the same damage with both time windows
[10:06:46] <joal>	 elukey: mostly agreed
[10:06:58] * elukey raises the limit
[10:07:01] <joal>	 elukey: and in case of issue, cluster needs to be taken down in any case I think
[10:21:36] <elukey>	 joal: all right merged! I am going to restart the cluster
[10:21:43] <joal>	 elukey: ok
[10:22:05] <joal>	 elukey: Also, I wonder if this will really change more than the 2sec -> 60sec
[10:22:54] <elukey>	 probably not much, but worth to test imho..
[10:27:11] <elukey>	 aqs1001 done
[10:27:20] <elukey>	 I'll wait a bit before proceeding with the others
[10:29:01] <elukey>	 joal: also from https://grafana.wikimedia.org/dashboard/db/pageviews it seems that all latencies for per article calls dropped
[10:29:14] <elukey>	 mean went below 1s
[10:30:08] <joal>	 elukey: Very correct !
[10:30:41] <joal>	 p75 is still around 1s, but median is more about 0.5s
[10:32:43] <elukey>	 let's see with 10 mins if we'll get even more good performances
[10:33:00] <elukey>	 even if I don't expect much
[10:33:05] <joal>	 elukey: my bet is that it won't change ;)
[10:33:41] <elukey>	 I am optmist and I bet that it will change but something ridicolous
[10:33:47] <elukey>	 :P
[10:34:05] <joal>	 huhu, not measurable means no change, right?
[10:34:33] <elukey>	 yeah :)
[10:34:57] <elukey>	 I am interested to know what mr urandom thinks about this change..
[10:35:08] <elukey>	 it might be something to test also in restbase
[10:35:33] <joal>	 elukey: indeed !!!
[10:53:25] <elukey>	 I also changed a bit https://grafana-admin.wikimedia.org/dashboard/db/aqs-elukey
[10:53:28] <elukey>	 less chaotic
[10:56:51] <milimetric>	 elukey: when you have a bit, I tried to fix a full disk on limn1 last night and I couldn't
[10:57:13] <milimetric>	 maybe we could troubleshoot together?
[10:57:58] <elukey>	 milimetric: o/ sure! I saw the message and I wanted to ask you access to that machine
[10:58:09] <elukey>	 i am not sure what is the procedure
[10:58:58] <milimetric>	 ok, let's do it in the batcave together, I don't think I can give you access, I'll explain
[11:00:04] <elukey>	 sure.. can we do it in ~30 mins?
[11:00:14] <elukey>	 wow it is 7 am in NYC now :D
[11:00:16] <elukey>	 early morning
[11:00:20] <elukey>	 I didn't realize
[11:00:20] <wikibugs>	 Analytics-Dashiki: Automate or Simplify calculating Per-Diems - https://phabricator.wikimedia.org/T140819#2504807 (Yurik) @smalyshev, gave you edit access. If anyone else wants it, let me know
[11:01:38] <milimetric>	 30 min. is just fine
[11:01:55] <elukey>	 super thanks!
[11:07:37] <elukey>	 all right aqs100[123] restarted with the new auth caching settings (10 mins)
[11:07:53] <wikibugs>	 Analytics-Kanban, EventBus, Patch-For-Review: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2504815 (Milimetric)
[11:10:33] <wikibugs>	 Analytics-Kanban: Investigate why cassandra per-article-daily oozie jobs fail regularly - https://phabricator.wikimedia.org/T140869#2504816 (elukey) So far the results are really good, both latencies (especially per article mean, p75 and p99) and READ timeouts decreased a lot since we applied the patch (~27...
[11:12:00] <elukey>	 urandom: https://phabricator.wikimedia.org/T140869#2504816 - this is after the permissions_validity_in_ms increase (or at least it seems to be related)
[11:16:21] <mforns>	 hi team
[11:16:23] <mforns>	 :]
[11:26:37] <milimetric>	 hi mforns
[11:27:41] <joal>	 Hi mforns :)
[11:27:49] <mforns>	 hello!
[11:29:24] <elukey>	 o/
[11:35:50] <milimetric>	 elukey: ping me when you wanna check out limn1
[11:35:52] <grrrit-wm>	 (CR) Joal: [C: -1] "One issue and some style (you are unlucky, I've been doing scala last week, so I'm kinda into it now :)" (4 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/301657 (https://phabricator.wikimedia.org/T141525) (owner: Addshore)
[11:36:22] <addshore>	 joal: haha, all scala comments welcome, the first scala I did was for the last oozie job!
[11:37:00] <joal>	 addshore: functional / iretative is a non-ending discussion, so I suggest, but you decide :)
[11:37:10] <joal>	 addshore: However exceptioncoding is not allowed ;)
[11:38:07] <addshore>	 hehe yep! Well, previous those 6 lines were just " @$metrics['agent_types.' . $type] += $count " ;)
[11:38:41] <addshore>	 But after all of my googling I still couldnt find how best to translate that into scala!
[11:39:11] <joal>	 addshore: The thing you were after I think is folding
[11:39:23] <addshore>	 I'll have a read up on folding then :D
[11:39:59] <joal>	 addshore: When going over a list and carying a context with you at each step (modifyable at each step, obviously), folding is what you want :)
[11:42:59] <elukey>	 milimetric: ready!
[11:43:22] <milimetric>	 k, I'm in the cave
[11:44:32] <grrrit-wm>	 (CR) Joal: [C: 1] "Needs to be tested, but I couldn't find anything wrong reading it :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/301661 (https://phabricator.wikimedia.org/T141525) (owner: Addshore)
[12:23:19] <elukey>	 oh milimetric I forgot to ask something about AQS
[12:23:24] <elukey>	 do you still have 5 mins?
[12:23:29] <milimetric>	 course, coming back
[12:23:59] <milimetric>	 (in the cave elukey)
[12:25:01] <joal>	 mforns: Hi !
[12:25:09] <mforns>	 joal, hi!
[12:25:22] <joal>	 mforns: Would you mind brainstorming around the problem we found the other day?
[12:25:29] <mforns>	 joal, sure
[12:25:43] <joal>	 mforns: batcave?
[12:25:50] <mforns>	 omw!
[12:25:55] <joal>	 mforns: Arf, busy :)
[12:26:06] <mforns>	 joal, batcave-2?
[12:26:08] <joal>	 mforns: https://hangouts.google.com/hangouts/_/wikimedia.org/a-batcave-2
[13:28:43] <mforns>	 joal, back
[13:29:50] <joal>	 mforns: found some things, but not the one we're after :)
[13:29:56] <joal>	 cave-2?
[13:30:17] <mforns>	 joal, sure
[13:34:13] <wikibugs>	 Analytics, Analytics-Cluster: Improve Hue user management - https://phabricator.wikimedia.org/T127850#2505068 (elukey) Some reference for CDH 5.5 are in http://www.cloudera.com/documentation/enterprise/5-5-x/topics/cdh_sg_hue_ldap_config.html
[13:35:50] <wikibugs>	 Analytics, Analytics-Cluster: Improve Hue user management - https://phabricator.wikimedia.org/T127850#2505072 (Ottomata) Ah, lemme restate my previous comment:  We should still get Hue LDAP group based login and permissions to work.  Right now accounts are synced fro LDAP, but they are done so one by one...
[13:37:55] <elukey>	 ottomata: o/ Is it written somewhere how to sync users from ldap to hue?
[13:38:37] <elukey>	 not sure if you have a script or it is you doing copy/paste username-passzorz in hue
[13:40:35] <wikibugs>	 Analytics, Analytics-Cluster: Improve Hue user management - https://phabricator.wikimedia.org/T127850#2505087 (elukey) Yeah that was the idea! I added some reference to remind me some good links :)
[13:40:36] <ottomata>	 heh, elukey sorry, no, but its in hue, hmmm.
[13:40:42] <ottomata>	 where should I put that doc?
[13:40:52] <ottomata>	 i could put it in Analytics/Cluster/Access real quick somewhere?
[13:41:05] <elukey>	 sure! It was just a curiosity, nothing super urgent!
[13:41:41] <elukey>	 I have always wonderered how to do it, sometimes I found the "ask Andrew" tasks in the wikitech doc and put a note to ask you :)
[13:48:34] <ottomata>	 elukey:  https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Access#Admin_Instructions_to_sync_a_Hue_LDAP_account
[13:48:50] <elukey>	 \o/
[13:48:51] <elukey>	 thanks!
[13:49:07] <elukey>	 I was reading http://blog.cloudera.com/blog/2014/02/how-to-make-hadoop-accessible-via-ldap/ and there was an "importing users" section
[13:51:39] <ottomata>	 ja but that is a manual action too
[13:51:51] <ottomata>	 which i guess would be ok
[13:51:53] <ottomata>	 to just sync a group
[13:52:01] <ottomata>	 there was something that wasn't working with it when I set it up initially
[13:52:06] <ottomata>	 but its been so long, and its a new CDH version now
[13:52:07] <ottomata>	 maybe it works
[13:53:27] <ottomata>	 hm elukey maybe there's no way to make ldap do the actual auth
[13:54:44] <elukey>	 from http://blog.cloudera.com/blog/2014/02/how-to-make-hadoop-accessible-via-ldap/ it seems that you can specify the search query no?
[13:54:54] <elukey>	 not sure if it works of course :P
[13:55:03] <elukey>	 Yarn seems a lot sneakier
[13:55:14] <elukey>	 probably apache/nginx in front of it will be easier
[13:55:27] <ottomata>	 ja i'm pretty sure yarn doesn't have ldap built in
[13:55:32] <ottomata>	 so ja would have to be something like that
[13:55:36] <ottomata>	 which, elukey might be nice to have
[13:55:47] <ottomata>	 i run a stupid nginx proxy locally to make navigating those redirects better
[13:56:22] <ottomata>	 you know how it redirects you to analytics1001.eqiad.wmnet:8808/.... all the time?
[13:56:48] <ottomata>	 would be nice if a proxy could keep it from doing that
[13:57:05] <ottomata>	 and redirect back to the incoming url, like yarn.wikimedia.org (or in my case with tunnel, localhost:8088
[13:57:47] <elukey>	 ah yes there is also this issue
[13:59:00] <wikibugs>	 Analytics, Analytics-Cluster: Improve Hue user management - https://phabricator.wikimedia.org/T127850#2505133 (elukey) Other references:   - http://blog.cloudera.com/blog/2014/02/how-to-make-hadoop-accessible-via-ldap/   - https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Access#Admin_Instructions_to...
[14:36:17] <wikibugs>	 Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad: analytics1032 disk failure - https://phabricator.wikimedia.org/T141550#2505209 (Ottomata) @Cmjohnson, can we look at this today?
[14:36:26] <elukey>	 going afk for a bit! ttl!
[14:47:49] <wikibugs>	 Analytics-Tech-community-metrics, Developer-Relations (Jul-Sep-2016): Mediawiki support to be added to GrimoireLab - https://phabricator.wikimedia.org/T138007#2505253 (Qgil)
[15:04:12] <joal>	 heya ottomata
[15:04:28] <joal>	 ottomata: D'you have a minute for helping a gitignorant?
[15:10:53] <ottomata>	 joal:  sho
[15:10:55] <ottomata>	 what'sup?
[15:11:05] <joal>	 ottomata: I don't mange to git review :(
[15:11:18] <joal>	 ottomata: Tells me duplicate request
[15:12:14] <ottomata>	 check Change-Id in the commit message
[15:12:22] <ottomata>	 does that already exist in gerrit?
[15:12:29] <joal>	 hm, will triple check
[15:12:49] <joal>	 nope
[15:14:59] <joal>	 ottomata: --^
[15:15:54] <ottomata>	 hm, not sure what duplicate request means then
[15:16:02] <ottomata>	 joal:  maybe just try generating a new change-id anyway?
[15:16:09] <joal>	 I did :(
[15:16:14] <ottomata>	 what's the full error?
[15:17:33] <joal>	 ottomata: https://gist.github.com/jobar/c61e84247c52e3f59641d5ff241b8224
[15:17:57] <mforns>	 joal, google kicked me out of the batcave
[15:18:02] <mforns>	 ???
[15:18:07] <joal>	 I think so
[15:18:12] <joal>	 You dropped
[15:18:34] <wikibugs>	 Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad: analytics1032 disk failure - https://phabricator.wikimedia.org/T141550#2505338 (Cmjohnson) Cleared the foreign config from db1032...new VD needs to be setup
[15:18:56] <ottomata>	 joal:  hm.  are you really trying to submit multiple patches?
[15:18:57] <ottomata>	 oh joal
[15:19:08] <ottomata>	 do all those commits have the same Change-Id in their commit messages?
[15:19:21] <joal>	 hm ottomata
[15:19:28] <joal>	 I'm not sure I understand
[15:19:55] <ottomata>	 joal: , i don't know why, but sometimes git review doesn't understand what patches have already been submitted or maybe merged by gerrit
[15:20:01] <ottomata>	 so i see that confirmation message sometimes too
[15:20:03] <ottomata>	 not sure when it happens
[15:20:13] <ottomata>	 usually, you only submit one change at a time via git review
[15:20:17] <joal>	 ok, I'll squash the things
[15:20:25] <joal>	 thanks ottomata
[15:20:27] <ottomata>	 but, if git review thinks you have patches in your history that have not yet been merged, it will try to submit all of them
[15:20:30] <ottomata>	 which, is fine
[15:20:35] <ottomata>	 if they actually have been merged, it will be a noop
[15:20:50] <joal>	 ottomata: I think they've not been merged
[15:20:51] <ottomata>	 but, every individual gerrit patch needs a unique gerrit change id
[15:21:11] <ottomata>	 if you have commits in your history that have not been merged, AND each of those commits happens to have the same Change-Id in the comit message of that commit
[15:21:16] <ottomata>	 then surely gerrit won't like it
[15:21:29] <ottomata>	 if you look at the commit messages of 44ff947 and 7784d8b
[15:21:34] <ottomata>	 do they both have the same Change-Id?
[15:22:00] <joal>	 ottomata: one of them has yes
[15:22:10] <joal>	 ottomata: I think I have a rebase issue
[15:22:16] <joal>	 I'll triple check
[15:22:37] <ottomata>	 aye k
[15:22:42] <joal>	 Thanks for pointing the changeIds ottomata,
[15:22:52] <ottomata>	 joal:  you can always create new Change-Ids and abandon borked patches in gerrit
[15:23:07] <joal>	 hm
[15:23:08] <ottomata>	 hmm, maybe you can't if they have commits on top of them in history
[15:23:11] <ottomata>	 not sure
[15:23:17] <ottomata>	 hm
[15:23:20] <ottomata>	 yeah you probably can
[15:23:44] <ottomata>	 hmm, you might have to cherry pick on new topic branch though
[15:23:57] <ottomata>	 to make a new git commit out of them
[15:24:08] <ottomata>	 then you can git commit --amend and delete the Change-Id out of the message
[15:24:26] <ottomata>	 and the git review hook will make a new one for you
[15:24:36] <joal>	 ottomata: I understood what the issue was I htink
[15:24:44] <joal>	 ottomata: I have rebased incorrectly
[15:24:51] <ottomata>	 aye cool
[15:26:02] <joal>	 ottomata: All those patches are WIP, I'm gonna squash :)
[15:27:18] <grrrit-wm>	 (PS1) Joal: [WIP] Refactor Mediawiki History scala code [analytics/refinery/source] - https://gerrit.wikimedia.org/r/301837 (https://phabricator.wikimedia.org/T141548)
[15:27:31] <ottomata>	 aye
[15:28:20] <joal>	 ottomata: Quick question: How does gerrit know about "Same Topic" reviews?
[15:28:30] <ottomata>	 its your local branch name
[15:28:38] <ottomata>	 when you do just 'git review'
[15:28:56] <ottomata>	 it pushes the review to gerrit against the default branch (usually master), but with the topic set to your local branch name
[15:29:04] <ottomata>	 if you want to push to a real remote branch
[15:29:06] <ottomata>	 you do
[15:29:11] <ottomata>	 git review <branch_name>
[15:29:21] <ottomata>	 either way though, the local branch name is the gerrit topic name
[15:29:22] <joal>	 hm, was thinking of the UI feature :)
[15:30:22] <joal>	 ottomata: Ok I htink I have it: because in a previous patch I referenced other commits ...
[15:30:25] <joal>	 right
[15:31:19] <icinga-wm>	 RECOVERY - Hadoop NodeManager on analytics1032 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[15:31:47] <elukey>	 hello 1032!
[15:31:47] <icinga-wm>	 RECOVERY - Disk space on Hadoop worker on analytics1032 is OK: DISK OK
[15:31:50] <icinga-wm>	 RECOVERY - YARN NodeManager Node-State on analytics1032 is OK: OK: YARN NodeManager analytics1032.eqiad.wmnet:8041 Node-State: RUNNING
[15:31:56] <elukey>	 ottomata: disk replaced?
[15:32:29] <ottomata>	 not replaced, i think just rebuilt.  somehow it went foreign (talking with cmjohnson in pm)
[15:32:36] <elukey>	 ahhh okok
[15:32:43] <ottomata>	 he rebuilt the vd
[15:32:47] <icinga-wm>	 RECOVERY - Hadoop DataNode on analytics1032 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode
[15:32:52] <elukey>	 I think it happened also for the last time
[15:32:57] <ottomata>	 oh ja?
[15:33:03] <elukey>	 iirc something similar
[15:33:07] <elukey>	 let me check
[15:33:11] <ottomata>	 hm, looks like the disk is back with all the data too?
[15:34:40] <elukey>	 yeah same thing that happened last time
[15:34:45] <elukey>	 can't find the task though
[15:34:47] <elukey>	 grr
[15:34:59] <elukey>	 it was the analytics host with 2 disk problems in a row
[15:37:50] <ottomata>	 elukey:  fyi https://github.com/apache/kafka/pull/1605/files
[15:37:55] <ottomata>	 xfs is all cool now :)
[15:38:38] <ottomata>	 ah yay, and in the new RC, our bug is fixed. https://github.com/apache/kafka/pull/1605/files
[15:39:11] <elukey>	 yeah I saw the last one! Xfs. mmmmm...
[15:40:56] <elukey>	 urandom: thanks for the response in the email thread :)
[15:45:54] <wikibugs>	 Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad: analytics1032 disk failure - https://phabricator.wikimedia.org/T141550#2505429 (Ottomata) Ok, the disk is back with all its data.  We don't know why it decided to go all foreign on us.  Let's keep an eye on it.
[15:52:41] <wikibugs>	 Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad: analytics1032 disk failure - https://phabricator.wikimedia.org/T141550#2505430 (Cmjohnson) The disk has been cleared and is back online  The server booted to Raid Configuration mode because it showed a foreign disk. I performed the follo...
[15:52:47] <wikibugs>	 Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad: analytics1032 disk failure - https://phabricator.wikimedia.org/T141550#2505431 (Cmjohnson) Open>Resolved a:Cmjohnson
[16:01:45] <urandom>	 elukey: :)
[16:02:06] <ottomata>	 a-team stanup!
[16:02:51] <elukey>	 urandom: did you see my note about https://grafana.wikimedia.org/dashboard/db/aqs-elukey ?
[16:02:58] <urandom>	 I did, yeah
[16:03:07] <urandom>	 haven't had a chance yet to look at it seriously
[16:03:15] <urandom>	 (yet)
[16:03:20] <elukey>	 I am not sure what happened in there but the latency dropped in avg/p75/p99
[16:03:20] * urandom has it queued
[16:03:24] <elukey>	 (restbase one)
[16:23:18] <mforns>	 elukey, nice weekend!
[16:23:23] <elukey>	 all right going afk, have a good weekend !!
[16:23:24] <elukey>	 byeee
[16:30:34] <joal>	 elukey: interesting reading - https://lostechies.com/ryansvihla/2014/09/22/cassandra-auth-never-use-the-cassandra-user-in-production/
[16:31:10] <joal>	 elukey: We might want to change our cassandra user (even with the improvement of caching, preventing quorum is really a gain)
[16:31:19] <joal>	 Have a good weekend a-teanm !
[17:32:01] <ottomata>	 milimetric:  if you want to join https://hangouts.google.com/hangouts/_/wikimedia.org/brief
[18:03:12] <milimetric>	 sweet, that was interesting
[18:03:35] <milimetric>	 it'll be funny if they get their use case built first, before ours (wikistats 2)
[18:03:48] <milimetric>	 that's fast adoption!
[18:06:12] <ottomata>	 haha yeah
[18:06:30] <ottomata>	 indeed!  well, I think they want to augment the rcfeed schema
[18:06:34] <ottomata>	 which we don't have in kafka (yet)
[18:06:39] <ottomata>	 i'm not sure if that's the right way to go
[18:06:49] <ottomata>	 but we'll work with them on that
[18:29:53] <wikibugs>	 Analytics-EventLogging, DBA, ImageMetrics: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#2505951 (Jdforrester-WMF) Thank you!
[19:14:25] <leila>	 hey milimetric. Re your question about pageview api data retention.
[19:14:36] <milimetric>	 yes
[19:15:07] <leila>	 You may want to tell people to tell you how often they need the information in the categories that you have mentioned, as well, along with some examples. Ideally, you want to have a table somewhere on a wiki that people can fill it out.
[19:15:57] <leila>	 milimetric, ^.
[19:17:09] <milimetric>	 right, I wanted to see if the conversation gets crazy and go that way later
[19:17:20] <milimetric>	 so far it seems manageable free-form
[19:17:24] <leila>	 ok, makes sense, milimetric.
[19:18:28] <wikibugs>	 Analytics-Kanban: Investigate why cassandra per-article-daily oozie jobs fail regularly - https://phabricator.wikimedia.org/T140869#2479077 (Eevans) @elukey Interesting.  We are seeing a few of these in the RESTBase cluster too, [[ https://logstash.wikimedia.org/goto/caf90b6486fb6b939c608ce9e19edbd8 | 32 ove...
[19:22:34] <ottomata>	 milimetric:  if you have a sec, brain bounce about el with me again?
[19:22:49] <ottomata>	 no worry if not, just thinking really
[19:28:00] <milimetric>	 ottomata: yes, was just answering emails
[19:28:05] <milimetric>	 omw
[20:14:11] <mforns>	 bye team! have a nice weekend!
[20:15:04] <wikibugs>	 Analytics, Operations, Performance-Team, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2506628 (ellery) @Nuria, @BBlack  I need to clarify that in the example that I gave above, the experiments were not run concurrently, but in sequence.
[20:16:13] <wikibugs>	 Analytics, Operations, Performance-Team, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2506631 (ellery) @Nuria  I'm confused about how your statement "a bucket will have control and treatment for 1 experiment". I though that a bucket represents a group of users...
[20:26:55] <wikibugs>	 Analytics-Kanban, Cassandra: Investigate why cassandra per-article-daily oozie jobs fail regularly - https://phabricator.wikimedia.org/T140869#2506731 (Eevans)
[20:39:01] <wikibugs>	 Analytics, Operations, Performance-Team, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2506770 (ellery) Another issue that is independent of proper randomization, is that for most use cases, the data produced by the system  cannot be used for statistical testing...
[20:53:17] <wikibugs>	 Analytics, Analytics-EventLogging, Patch-For-Review, WMF-deploy-2016-08-02_(1.28.0-wmf.13): Convert EventLogging to use extension registration - https://phabricator.wikimedia.org/T87912#2506840 (Florian)
[21:26:43] <grrrit-wm>	 (PS1) Milimetric: Update for August meeting [analytics/reportcard/data] - https://gerrit.wikimedia.org/r/301890
[21:26:56] <grrrit-wm>	 (CR) Milimetric: [C: 2 V: 2] Update for August meeting [analytics/reportcard/data] - https://gerrit.wikimedia.org/r/301890 (owner: Milimetric)
[21:38:33] <wikibugs>	 Analytics-Kanban: Productionize edit history extraction for all wikis using Sqoop - https://phabricator.wikimedia.org/T141476#2499954 (Milimetric) I'll try to do this using new hotness oozie generator: https://github.com/etsy/arbiter
[21:38:37] <wikibugs>	 Analytics-Kanban: Productionize loading of edit data into Druid (contingent on success of research spike) - https://phabricator.wikimedia.org/T141473#2499884 (Milimetric) I'll try to do this using new hotness oozie generator: https://github.com/etsy/arbiter