[09:42:23] <wikibugs>	 Analytics-Wikistats: Determine total number of external links in all Wikipedias - https://phabricator.wikimedia.org/T137984#2393122 (PleaseStand) >>! In T137984#2390918, @PleaseStand wrote: > Here is one possible way to count external links:  Now I ran a modified version of my queries, which break the (de-du...
[10:16:24] <wikibugs>	 Analytics, Research-and-Data, Research-consulting: Update official Wikimedia press kit with accurate numbers - https://phabricator.wikimedia.org/T117221#2393189 (leila)
[10:20:30] <wikibugs>	 Analytics, Research-and-Data, Research-consulting: Update official Wikimedia press kit with accurate numbers - https://phabricator.wikimedia.org/T117221#2393196 (leila)
[10:27:21] <wikibugs>	 Analytics, Research-and-Data, Research-consulting: Update official Wikimedia press kit with accurate numbers - https://phabricator.wikimedia.org/T117221#2393233 (leila)
[10:28:24] <wikibugs>	 Analytics-Kanban, Operations, Traffic, Patch-For-Review: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2393240 (elukey) We finally tracked down all the sources of null/missing end timestamps coming from Varnish:  1) Varnish Pipe logs,...
[10:35:41] <wikibugs>	 Analytics, Research-and-Data, Research-consulting: Update official Wikimedia press kit with accurate numbers - https://phabricator.wikimedia.org/T117221#2393275 (leila)
[10:36:36] <elukey>	 joal: o/
[10:36:46] <elukey>	 http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema
[10:36:49] <elukey>	 looks nice
[10:37:16] <lzia>	 hi joal. Question. :D
[10:38:02] <lzia>	 joal: Erik Zachte and I are discussing T117221, and we are wondering: can we see the druid environment you showed in the meeting with WMDE the other day? (you -> Nuria)
[10:38:16] <lzia>	 it can help us understand how much technology will be available.
[10:38:48] <wikibugs>	 Analytics-Kanban: Test cassandra compactions on new AQS nodes - https://phabricator.wikimedia.org/T135145#2393276 (elukey)
[10:44:43] <wikibugs>	 Analytics, Research-and-Data, Research-consulting: Update official Wikimedia press kit with accurate numbers - https://phabricator.wikimedia.org/T117221#2393278 (leila)
[11:02:29] <wikibugs>	 Analytics, Research-and-Data, Research-consulting: Update official Wikimedia press kit with accurate numbers - https://phabricator.wikimedia.org/T117221#2393294 (leila)
[11:04:38] <wikibugs>	 Analytics, Research-and-Data, Research-consulting: Update official Wikimedia press kit with accurate numbers - https://phabricator.wikimedia.org/T117221#2393297 (leila)
[11:42:57] <wikibugs>	 Analytics, Research-and-Data, Research-consulting: Update official Wikimedia press kit with accurate numbers - https://phabricator.wikimedia.org/T117221#2393363 (leila) @Heather:  Some background: Erik and I are working on a list of metrics and qualifiers for the Comms team. We need your help to make...
[11:53:14] <wikibugs>	 Analytics, Research-and-Data, Research-consulting: Update official Wikimedia press kit with accurate numbers - https://phabricator.wikimedia.org/T117221#2393385 (leila)
[13:05:24] <wikibugs>	 Analytics, Research-and-Data, Research-consulting: Update official Wikimedia press kit with accurate numbers - https://phabricator.wikimedia.org/T117221#2393525 (leila)
[13:14:40] <wikibugs>	 Analytics: Add global last-access cookie for top domain (*.wikipedia.org) - https://phabricator.wikimedia.org/T138027#2393562 (BBlack) I haven't been able to find the ticket ref, but I know in the past there was some longer-term question around the basic utility of the WMF-Last-Access data (as in, whether th...
[13:17:11] <elukey>	 urandom: hi! Whenever you have time I'd like to chat about cassandra-rackdc.properties :)
[14:37:19] <wikibugs>	 Analytics, Research-and-Data, Research-consulting: Update official Wikimedia press kit with accurate numbers - https://phabricator.wikimedia.org/T117221#2393734 (leila)
[14:37:44] <wikibugs>	 Analytics, Research-and-Data, Research-consulting: Update official Wikimedia press kit with accurate numbers - https://phabricator.wikimedia.org/T117221#1769033 (leila)
[14:59:37] <urandom>	 elukey: sup?
[15:00:32] <elukey>	 hello!
[15:01:06] <elukey>	 during the offsite we were wondering if it would be good to add rack awareness to aqs100[456] to avoid having two replicas on the same node
[15:01:12] <elukey>	 (for the same data)
[15:01:34] <urandom>	 oh, you definitely do need to, yes
[15:01:49] <elukey>	 now they are all rack1
[15:01:57] <elukey>	 that is the default for the cassandra class afaiu
[15:02:05] <urandom>	 right
[15:02:30] <urandom>	 you need to use the network topology strategy, and make each machine its own 'rack'
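A minimal sketch of why one rack per machine matters. It assumes a single datacenter and a hypothetical token ring in which each physical host runs two instances (`aqs1004-a`, `aqs1004-b`, and so on; the names and ring order are illustrative). The function approximates NetworkTopologyStrategy's rack-aware placement: it walks the ring clockwise from the key's position, prefers nodes in racks that don't hold a replica yet, and defers same-rack nodes until every rack is covered. It is a simplification, not Cassandra's actual code. With everything in `rack1`, two replicas can land on the same physical host. With one rack per host, they cannot.

```python
def place_replicas(ring, rack_of, start, rf):
    """Simplified rack-aware replica placement within one datacenter.

    ring:    instance names in token order (hypothetical)
    rack_of: instance name -> rack name
    start:   ring index of the first node at or after the key's token
    rf:      replication factor
    """
    total_racks = len(set(rack_of.values()))
    replicas, seen_racks, skipped = [], set(), []
    for i in range(len(ring)):
        if len(replicas) == rf:
            break
        node = ring[(start + i) % len(ring)]
        rack = rack_of[node]
        if rack not in seen_racks:
            # First replica in this rack: always take it.
            replicas.append(node)
            seen_racks.add(rack)
            if len(seen_racks) == total_racks:
                # Every rack is covered: deferred nodes come next, in ring order.
                while skipped and len(replicas) < rf:
                    replicas.append(skipped.pop(0))
        elif len(seen_racks) == total_racks:
            # All racks already hold a replica, so any node will do.
            replicas.append(node)
        else:
            # Same rack as an existing replica: defer it for now.
            skipped.append(node)
    return replicas


ring = ["aqs1004-a", "aqs1004-b", "aqs1005-a", "aqs1005-b", "aqs1006-a", "aqs1006-b"]
all_rack1 = {n: "rack1" for n in ring}           # current setup: one rack for everything
per_host = {n: n.split("-")[0] for n in ring}    # one rack per physical host

print(place_replicas(ring, all_rack1, 0, 3))  # two of the three replicas sit on aqs1004
print(place_replicas(ring, per_host, 0, 3))   # one replica per physical host
```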
[15:02:32] <elukey>	 so I was wondering what the correct procedure to enable rack awareness is.. just add the rack/dc keys to each instance one at a time in Hiera?
[15:02:56] <urandom>	 have you already initialized the cluster?
[15:03:03] <urandom>	 you can't change it after
[15:03:51] <urandom>	 by after i mean, after loading with data
[15:04:27] * elukey starts crying
[15:04:46] <urandom>	 uh oh
[15:04:58] <elukey>	 we have loaded some data but we are still in the load testing phase :)
[15:05:14] <elukey>	 so it is fine to tear everything down
[15:05:19] <elukey>	 even if we'll lose time
[15:05:52] <urandom>	 ok
[15:06:37] <elukey>	 so I'd need to follow https://wikitech.wikimedia.org/wiki/Cassandra#Bootstrap_a_brand_new_cluster to flush everything
[15:07:00] <elukey>	 but before that I'd need to add hiera config for rack awareness
[15:07:59] <urandom>	 yeah
[15:08:14] <elukey>	 joal will be soooo happy
[15:08:18] <elukey>	 :D
[15:09:24] <urandom>	 so, if you're resetting the whole cluster, you'd have the rack/dc info setup, and then you'd bring up the first node with `auto_bootstrap: false`, then add each new one with `auto_bootstrap: true`, and then load your data
[15:10:06] <urandom>	 and you'll have to wipe /srv/cassandra-[a-z] on each, yeah
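The reset procedure described above can be written down as an ordered plan. This is a hypothetical sketch: the instance names, the `dc`/`rack` values, and the wording of each step are illustrative. The only Cassandra setting it relies on is `auto_bootstrap` in `cassandra.yaml`, and it uses `UN` (Up/Normal) as shown by `nodetool status`. The point the plan encodes is the ordering. First, wipe every instance and give it its rack/dc before it ever starts. Next, start the first instance with `auto_bootstrap: false`. Then join the rest one by one with `auto_bootstrap: true`. Only after that, load the data.

```python
from typing import Dict, List


def fresh_cluster_plan(instances: List[str], rack_of: Dict[str, str], dc: str) -> List[str]:
    """Return an ordered, human-readable plan for bootstrapping a brand-new cluster.

    All instance names, racks and the dc name are hypothetical placeholders.
    """
    steps = []
    for inst in instances:
        # Old data would carry the old topology, so every instance is wiped first.
        steps.append(f"{inst}: stop cassandra and wipe its data directory")
        steps.append(f"{inst}: set dc={dc} rack={rack_of[inst]} in cassandra-rackdc.properties")
    for i, inst in enumerate(instances):
        # Only the very first instance starts without bootstrapping: there is
        # nobody to stream data from yet.
        flag = "false" if i == 0 else "true"
        steps.append(f"{inst}: start with auto_bootstrap: {flag}, wait until it is UN in nodetool status")
    steps.append("load data only after every instance has joined")
    return steps


for step in fresh_cluster_plan(
    ["aqs1004-a", "aqs1004-b", "aqs1005-a"],
    {"aqs1004-a": "aqs1004", "aqs1004-b": "aqs1004", "aqs1005-a": "aqs1005"},
    "eqiad",
):
    print(step)
```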
[15:10:30] <elukey>	 maybe I'll let joal finish his tests
[15:10:37] <elukey>	 and then we'll wipe everything
[15:11:37] <urandom>	 or, you can decommission each node in turn, and then rebootstrap it with the right rack info
[15:11:41] <urandom>	 or you can do repair
[15:12:03] <urandom>	 the latter being very undesirable
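The decommission-and-rebootstrap alternative can be sketched the same way, one instance at a time. This is a hypothetical plan generator: names and racks are illustrative, and `nodetool decommission` is the standard command for streaming a node's ranges away before it leaves. The guard that refuses to proceed when removing one instance would leave fewer instances than the replication factor is this sketch's own safety check, not something quoted from Cassandra.

```python
from typing import Dict, List


def rolling_rebootstrap_plan(instances: List[str], new_rack_of: Dict[str, str], rf: int) -> List[str]:
    """Ordered plan for moving instances to new racks one at a time (hypothetical names).

    Raises ValueError if taking one instance out would leave fewer than `rf`
    instances, since the remaining ring could then not hold every replica.
    """
    if len(instances) - 1 < rf:
        raise ValueError(f"{len(instances)} instances cannot lose one and still hold RF={rf}")
    steps = []
    for inst in instances:
        steps += [
            f"{inst}: nodetool decommission (streams its ranges to the others)",
            f"{inst}: stop cassandra and wipe its data directory",
            f"{inst}: set rack={new_rack_of[inst]} in cassandra-rackdc.properties",
            f"{inst}: start with auto_bootstrap: true, wait until it is UN in nodetool status",
        ]
    return steps
```

Doing one instance per round keeps at most one instance out of the ring at any time, at the cost of streaming each instance's data twice (out, then back in).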
[15:12:37] <elukey>	 might try the one-node-at-a-time re-bootstrap
[15:13:18] <urandom>	 yeah, i dunno, i guess it depends on how long it takes you to reload
[15:13:27] <urandom>	 might be faster than a decomm/bootstrap
[15:13:46] <elukey>	 yeah
[15:13:57] <elukey>	 can I send you a code review for the rack awareness?
[15:14:03] <urandom>	 you guys really need to move to bulk loading
[15:14:31] <urandom>	 s/move to/explore/
[15:14:39] <elukey>	 we load in bulk daily from what I know.. what do you mean?
[15:14:45] <elukey>	 (surely missing the point)
[15:15:04] <urandom>	 there is a mechanism for bulk-loading SSTable files
[15:15:15] <urandom>	 that uses streaming
[15:15:28] <urandom>	 and there are tools to create the SSTable files to stream
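For the bulk-loading route, the streaming tool that ships with Cassandra is `sstableloader`. The SSTables themselves can be produced offline, for example with the Java `CQLSSTableWriter` class. The Python sketch below only builds the loader invocation. It assumes the SSTables were already written to a directory whose last two path components are `<keyspace>/<table>`, since that is how `sstableloader` infers the target. The host names and the path are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List


def sstableloader_cmd(hosts: List[str], sstable_dir: str) -> List[str]:
    """Build an argv list for sstableloader (host names and paths are hypothetical)."""
    if not hosts:
        raise ValueError("need at least one initial host for -d")
    # sstableloader takes the keyspace and table names from the last two
    # components of the directory, so insist that both are present.
    parts = [p for p in PurePosixPath(sstable_dir).parts if p != "/"]
    if len(parts) < 2:
        raise ValueError("sstable_dir must end in <keyspace>/<table>")
    # -d: comma-separated initial hosts used to discover the ring topology
    return ["sstableloader", "-d", ",".join(hosts), sstable_dir]


print(sstableloader_cmd(["aqs1004-a", "aqs1005-a"], "/tmp/bulk/my_keyspace/my_table"))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`. Keeping it as an argv list avoids shell-quoting issues.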
[15:15:51] <elukey>	 woa okok :)
[15:17:35] <urandom>	 elukey: https://phabricator.wikimedia.org/T126243
[15:18:18] <elukey>	 subscribed :)
[15:21:50] <urandom>	 oh, also, yes (of course) to the code review
[15:23:06] <elukey>	 urandom: something like https://gerrit.wikimedia.org/r/#/c/295233/1 ?
[15:24:38] <urandom>	 yup!
[15:25:01] <elukey>	 good! So this one can be safely merged now since I
[15:25:11] <elukey>	 I'd need to do a decommission first to apply it
[15:25:15] <elukey>	 or should I wait?
[15:26:44] <urandom>	 yeah, you'd want to decomm, apply to an affected node, and rebootstrap
[15:26:51] <urandom>	 also
[15:27:07] <urandom>	 are you using network topology strategy?
[15:27:19] <elukey>	 I'd need to double check but I think so
[15:30:11] <urandom>	 checked: it's good
[15:30:24] <urandom>	 also, your superuser password is still the default
[15:30:27] <urandom>	 :)
[15:30:54] * elukey knows it with a lot of shame
[15:33:06] * urandom shrugs
[15:33:52] <urandom>	 it's one of those defense-in-depth things i guess, it makes it slightly harder to do something nasty if you have local access and are already in a position to do very nasty things. :)
[15:34:01] <urandom>	 but only slightly harder
[15:34:14] <urandom>	 mildly inconvenient :)
[15:34:41] <urandom>	 elukey: are you going to wikimania?
[15:34:52] * urandom probably already asked this
[15:37:37] <elukey>	 urandom: nope I am not! Need to take care of some family doctor appointments (my grandma mainly :) so I decided not to go after the offsite.. Even if it is very close to home!
[15:37:58] <elukey>	 are you going?
[15:38:02] <urandom>	 yup
[15:38:20] <urandom>	 i'm traveling atm, as a matter of fact
[15:38:38] <urandom>	 first leg complete (in newark waiting for the flight to milan)
[15:38:55] <urandom>	 sorry you won't be there, but i understand
[15:39:05] <urandom>	 best of luck with your grandma!
[15:39:26] <urandom>	 are others from analytics going to be there?
[15:41:31] <elukey>	 thanks! Dan and Leila will be there!
[16:00:26] <elukey>	 a-team: going to the ops meeting, will send the E-Scrum in a bit!
[16:00:34] <nuria_>	 hola elukey
[16:00:54] <elukey>	 o/
[16:01:03] <nuria_>	 joal: yt?
[16:01:11] <nuria_>	 joal: my hangouts no work
[16:01:14] <joal>	 I am !
[16:01:21] <joal>	 Just arrived, backfilling irc
[16:01:26] <nuria_>	 joal: does hangout work?
[16:01:32] <joal>	 s/filling/logging
[16:01:35] <joal>	 It seems to
[16:01:44] <joal>	 I am connected
[16:01:49] <joal>	 elukey: you here ?
[16:02:02] <elukey>	 yess
[16:18:35] <joal>	 Hi :)
[16:18:56] <joal>	 Just read the discussion with urandom
[16:19:16] <joal>	 It's a shame we'll have to reload, but there seems to be no other way ;0
[16:25:35] <elukey>	 :(
[16:25:50] <elukey>	 we could do each node one at a time
[16:25:57] <elukey>	 but I think it might be long and messy
[16:26:06] <elukey>	 (messy because I'll do it :)
[16:26:10] <joal>	 elukey: I agree
[16:26:36] <joal>	 elukey: I might try to work on urandom suggestion about bulk
[16:26:52] <elukey>	 that one seems awesome
[16:27:36] <joal>	 elukey: Will change a bit, but mostly on cassandra cpu-load at write time
[16:57:50] <elukey>	 joal: I've also discovered some nice things about vk today :(
[16:57:58] <elukey>	 and varnish api
[16:58:13] <joal>	 elukey: Arf, so what's up?
[16:58:54] <wikibugs>	 Analytics: Add global last-access cookie for top domain (*.wikipedia.org) - https://phabricator.wikimedia.org/T138027#2394211 (Nuria) @BBlack: short answer: yes. The Last-access cookie has helped us quantify, for example, the shift to mobile in all our projects. More so for data split by country. Some info o...
[17:00:21] <elukey>	 so the timeout seems to be in the varnish log api.. by default varnish flushes a log (even incomplete) if a Begin tag doesn't see its buddy End after 2 mins.
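To illustrate the behavior just described, here is a toy model of grouping log records by transaction ID. It is not Varnish's actual code. A transaction that has seen `Begin` but no `End` within the timeout is flushed anyway, marked as incomplete and carrying no end timestamp. The 120-second default mirrors the "2 mins" above. The event tuple format and the output field names are assumptions made for the sketch.

```python
from typing import Dict, List, Tuple

Event = Tuple[float, int, str]  # (timestamp in seconds, transaction id, tag)


def group_transactions(events: List[Event], timeout: float = 120.0) -> List[dict]:
    """Toy model of grouping log records by transaction, with an end timeout."""
    open_begin: Dict[int, float] = {}  # transaction id -> Begin timestamp
    out: List[dict] = []
    for ts, txn, tag in events:
        # Flush transactions that have waited longer than `timeout` for End.
        # (In this toy model the check only runs when a new event arrives.)
        expired = [t for t, begin in open_begin.items() if ts - begin > timeout]
        for t in expired:
            out.append({"txn": t, "begin": open_begin.pop(t), "end": None, "error": "timeout"})
        if tag == "Begin":
            open_begin[txn] = ts
        elif tag == "End" and txn in open_begin:
            out.append({"txn": txn, "begin": open_begin.pop(txn), "end": ts, "error": None})
    return out
```

Making `timeout` a parameter corresponds to the idea of exposing it as a tunable. The `error` field is the kind of marker a formatter such as the proposed `%{VSL:timeout}x` could surface.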
[17:01:12] <joal>	 elukey: told it was a timeout :-P
[17:01:31] <joal>	 so it's very difficult for us to get to know what is actually happening
[17:02:04] <elukey>	 yeah we thought it was a timeout but the problem was figuring out where :)
[17:02:15] <joal>	 indeed :)
[17:02:24] <elukey>	 I think it is a safety net for Varnish
[17:02:30] <joal>	 And by the way, thanks for the amount of effort you put into that !
[17:02:51] <joal>	 elukey: It's really great to have one person in the team who knows that part of the process well !
[17:02:59] <elukey>	 still a bit frustrated by no real progress, but we'll get to the end eventually :)
[17:03:12] <joal>	 elukey: knowledge is the first step to action ;0
[17:03:29] * elukey knows that joal is a wise man
[17:03:52] <joal>	 hmm, not so sure of that depending on the day ;)
[17:04:02] <joal>	 But I'll take the compliment ;)
[17:04:11] <elukey>	 joal: one thing that I could add to vk would be a formatter like %{VSL:timeout}x
[17:04:37] <elukey>	 that would output something if vk sees the api timeout somewhere
[17:05:07] <joal>	 elukey: That could actually help us a lot I think
[17:05:45] <elukey>	 and maybe expose a tunable parameter for the timeout?
[17:05:52] <elukey>	 it shouldn't be super difficult
[17:07:13] <joal>	 elukey: Yeah, would involve a small change in camus, but nothing dramatic, and would help quantify the real errors from the timeout
[17:07:45] <joal>	 Also, if we modify the raw log format (adding timeout timestamp), we could take the opportunity to add an error field?
[17:08:19] <elukey>	 yeah the formatter should be an error message
[17:08:31] <elukey>	 something like "timeout"
[17:08:44] <elukey>	 (that is what Varnish returns)
[17:10:23] <joal>	 elukey: Arf, I was expecting a timestamp
[17:10:33] <joal>	 elukey: both values would be great
[17:11:08] <elukey>	 the timestamp might be a bit painful to implement, since I'd need to pick one, format it, etc.
[17:11:27] <elukey>	 not really a huge deal but an error message would be much better :P
[17:12:53] <joal>	 elukey: I really think we need both :)
[17:13:17] <joal>	 elukey: The idea is that without timestamp, the rows need to be removed (which is a shame)
[17:13:33] <joal>	 elukey: But if not feasible, we'll do without ;)
[17:16:37] <elukey>	 the problem would be what timestamp to pick :)
[17:24:23] <joal>	 elukey: I don't feel like starting a cassandra talk now, tomorrow morning would be ok for you?
[17:24:37] <elukey>	 sure sure! I have to go in a bit :)
[17:24:46] <joal>	 ok awesome
[17:24:49] <elukey>	 and I need to think about vk
[17:25:31] <joal>	 elukey: Don't break your head over that, we can decide on the best fit
[17:25:39] <elukey>	 joal: just one question - would the start timestamp be ok instead of the end one in case of a timeout?
[17:26:15] <joal>	 elukey: hmm ... Need to think about that, but seems reasonable
[17:26:32] <elukey>	 it is the only way to make something that is always there
[17:26:56] <elukey>	 and without adding any more time tags
[17:27:36] <joal>	 elukey: makes sense
[17:27:52] <elukey>	 the only drawback would be getting some misalignment again
[17:30:00] <joal>	 elukey: I'd take the 'error' logs out of the sanity check ;)
[17:30:45] <joal>	 elukey: Like that we have coherent logs from a time point of view, no need to remove lines (and we can help Brandon if data is needed cause we have some)
[17:31:01] <joal>	 And we still have a good enough sanity metric
[17:32:38] <elukey>	 all right so if the VSL marker is "timeout" then it will not be used
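The approach agreed above could look roughly like this downstream: fall back to the start timestamp when the end one is missing, carry an error marker, and keep marked rows out of the sanity check. This is a hypothetical sketch. The field names (`dt_start`, `dt_end`, `error`, `dt`) are placeholders, not the real webrequest schema, and the sanity check itself is left abstract: the sketch only decides which rows it sees.

```python
from typing import Dict, List, Tuple


def normalize(row: Dict) -> Dict:
    """Choose the row's timestamp: the end time if present, else the start time for timeouts."""
    out = dict(row)  # don't mutate the caller's row
    if out.get("dt_end") is None and out.get("error") == "timeout":
        # Timed-out transactions have no end timestamp; using the start one keeps
        # the row instead of dropping it, at the price of some time misalignment.
        out["dt"] = out["dt_start"]
    else:
        out["dt"] = out["dt_end"]
    return out


def split_for_sanity_check(rows: List[Dict]) -> Tuple[List[Dict], int]:
    """Rows carrying an error marker stay in the data but are left out of the check."""
    checked = [r for r in rows if r.get("error") is None]
    return checked, len(rows) - len(checked)
```

Reporting the number of excluded rows alongside the check keeps the timeout volume visible instead of silently discarding it.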
[17:33:17] <elukey>	 ok I'll try to work on a solution tomorrow
[17:33:26] <elukey>	 :)
[17:33:29] <elukey>	 o/
[17:33:31] <elukey>	 byeeeee
[17:33:35] <joal>	 Bye elukey !
[17:42:52] <nuria_>	 ciaooo
[17:42:56] <nuria_>	 so lonelyyyyy
[17:43:11] <joal>	 nuria_: hehe
[20:43:31] <wikibugs>	 Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad: Smartctl disk defects on kafka1012 - https://phabricator.wikimedia.org/T136933#2395046 (Nuria) Open>Resolved
[20:44:00] <wikibugs>	 Analytics-Kanban, Patch-For-Review: Announce analytics.wikimedia.org - https://phabricator.wikimedia.org/T136426#2395047 (Nuria) Open>Resolved
[20:44:59] <wikibugs>	 Analytics, Analytics-Kanban, Patch-For-Review: analytics specific icinga alerts should ping in our IRC channel. - https://phabricator.wikimedia.org/T125128#2395060 (Nuria) Open>Resolved
[20:46:18] <wikibugs>	 Analytics-Kanban, DBA, Editing-Analysis, Patch-For-Review: Reportupdater does not commit changes after each query - https://phabricator.wikimedia.org/T134950#2395067 (Nuria) Open>Resolved
[20:48:20] <wikibugs>	 Analytics-Kanban: Prototype Data Pipeline on Druid - https://phabricator.wikimedia.org/T130258#2395068 (Nuria) Open>Resolved
[20:49:32] <wikibugs>	 Analytics: Productionize Druid Pageview Pipeline - https://phabricator.wikimedia.org/T138261#2395082 (Nuria)
[20:49:52] <wikibugs>	 Analytics: Puppetize pivot UI - https://phabricator.wikimedia.org/T138262#2395083 (Nuria)
[20:50:37] <wikibugs>	 Analytics: Puppetize Zookeeper - https://phabricator.wikimedia.org/T138263#2395096 (Nuria)
[20:52:04] <wikibugs>	 Analytics: Productionize Druid loader - https://phabricator.wikimedia.org/T138264#2395115 (Nuria)
[20:53:20] <wikibugs>	 Analytics: Puppetize pivot UI - https://phabricator.wikimedia.org/T138262#2395134 (Nuria) Access should be restricted by LDAP
[20:55:03] <halfak>	 o/ Hey folks
[20:55:06] <halfak>	 I'm looking at https://en.wikipedia.org/wiki/User_talk:EpochFail#Improving_POPULARLOWQUALITY_efficiency
[20:55:29] <halfak>	 And trying to figure out why we can't get a larger top N list of articles by pageviews without running into privacy issues?
[20:55:35] <wikibugs>	 Analytics: Upgrade Kafka (non-analytics cluster) - https://phabricator.wikimedia.org/T138265#2395136 (Nuria)
[20:56:42] <wikibugs>	 Analytics: Puppetize MirrorMaker - https://phabricator.wikimedia.org/T138267#2395161 (Nuria)
[21:01:26] <wikibugs>	 Analytics-Kanban: Mediawiki changes to publish data for analyrtics schemas - https://phabricator.wikimedia.org/T138268#2395185 (Nuria)
[21:05:20] <wikibugs>	 Analytics: Host edit data on Druid for all wikis. - https://phabricator.wikimedia.org/T138269#2395206 (Nuria)
[22:40:27] <nuria_>	 joal: can you double check my comment here: https://en.wikipedia.org/wiki/User_talk:EpochFail#Improving_POPULARLOWQUALITY_efficiency