[05:17:01] (CR) KartikMistry: [C: 2] Add a simple script to run all the sorted errors in one go [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/284128 (owner: Amire80)
[08:21:31] (Merged) jenkins-bot: Add a simple script to run all the sorted errors in one go [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/284128 (owner: Amire80)
[09:17:20] nuria_: Hi, just added a response to the discussion
[09:17:25] elukey: hello !
[09:17:46] o/
[09:17:51] What's up ?
[09:17:54] joal: hola.
[09:18:14] Woooow, not used to get nuria_'s answers at that time of day ;)
[09:18:23] joal: I will take a look but I will probably move the discussion to a ticket
[09:18:31] nuria_: no problem for me
[09:18:33] joal: jaja
[09:18:44] joal: as a ticket is easier to find than a talk page
[09:18:45] nuria_: I just added a bit of precision
[09:18:48] sure
[09:20:49] hola nuria_ !
[09:21:01] hola amigo elukey
[09:22:28] joal: this morning I am installing new appservers and coding stuff in vk, my soul is already compromised. Would you mind if we discuss rack awareness for our dear friend Cassandra this afternoon?
[09:22:44] I don't mind elukey :)
[09:22:50] Good luck with vk !
[09:23:14] I have something that compiles now but doesn't work -.-
[09:24:23] * joal is starting a voodoo dance for elukey
[09:25:58] :D
[09:26:04] * elukey stepping afk for a bit!
[09:26:17] elukey: what is "vk"?
[09:27:14] nuria_: VarnishKafka
[09:27:24] joal: ahahahah
[09:39:05] Hi leila !
[09:39:15] I didn't manage to get in touch yesterday ;)
[09:39:48] Let me know if you want to discuss public metrics
[09:42:38] Hi joal. no worries. first question: is everything you have demo for in Druid at the moment public data?
[09:42:51] leila: everything is actually private data
[09:43:02] nuria_: :P
[09:43:03] second question: is there an easy way for Erik and I to check the demo, or play with it, joal?
[09:43:08] leila: current pageview info is not sanitized yet
[09:43:14] joal: got you.
[09:43:16] leila: what demo?
[09:43:23] leila: ah druid?
[09:43:28] leila: you can access druid using ssh tunnelling, yes
[09:43:28] nuria_: yes.
[09:43:29] hi nuria_.
[09:43:49] leila: you can access no problem but it has teh same restrictions thna hive
[09:43:59] *the same restrictions than hive
[09:44:10] so both you and erik can access
[09:44:10] no problem, nuria_. Erik and I wanted to get a better sense of the UI and the filters available, etc.
[09:44:19] got you, nuria_.
[09:44:22] sure , super easy
[09:44:28] 1 liner:
[09:44:32] :D
[09:44:35] I'm all ears.
[09:44:43] ssh -N stat1002.eqiad.wmnet -L 9090:stat1002.eqiad.wmnet:9090
[09:44:50] * leila checks the command
[09:44:53] leila: and after go to http://localhost:9090
[09:45:04] THAT'S IT!
[09:45:15] nuria_: do you have the link to the phab task that summarize it?
[09:45:25] nuria_: could be useful as well ;)
[09:45:36] any feedback like "you guys are awesome" you can send our way
[09:45:52] nuria_: :D we will.
[09:46:04] leila: we also take complimernts via phab: https://phabricator.wikimedia.org/T136836
[09:46:09] I'm in the airport now, so I guess I'll leave the ssh-ing for later. I'll email Erik now to check it out.
[09:46:10] *compliments
[09:46:23] ya, it's super easy
[09:46:34] perfect. thank you, joal and nuria_.
[09:46:46] Ah nuria_, I thought the phab task contained the ssh command
[09:47:07] for nothing leila, nuria_ did all the job here ;)
[09:47:10] joal: no, it is confusing cause you need prior access
[09:47:16] to 1002
[09:47:26] joal: but i can add it
[09:47:34] yes nuria_, the 1 liner was provided by Dan in the email, but on phab
[09:47:44] NOT on phab sorry
[09:48:11] Analytics-Kanban: Gather user feedback from druid prototype for pageview data - https://phabricator.wikimedia.org/T136836#2395952 (Nuria)
[09:48:11] joal: fixed
[09:48:22] Thanks a lot nuria_
[10:45:48] * elukey lunch!
[13:39:18] (Abandoned) Joal: Add pageview aggregation and parquet merger.
[analytics/refinery/source] - https://gerrit.wikimedia.org/r/212541 (owner: Joal)
[14:21:47] joal: sorry still buried in things to do, currently https://github.com/varnishcache/varnish-cache/blob/a50c99f6b3883d1a58cedfe26511bfc0d30d50bb/lib/libvarnishapi/vsl_dispatch.c#L1308 /o\
[14:22:44] it's ok elukey, I do other things, however the later we make decisions, the later reloading with correct data starts
[14:23:04] sure
[14:23:14] what about in 30 mins?
[14:23:30] elukey: Can even be tomorrow, there is not that much rush :)
[14:23:42] I also understand context switching issues :)
[14:23:53] also brain limit issues
[14:24:00] don't forget those ones :P
[14:24:00] huhu :)
[14:24:18] two questions in the meantime
[14:24:19] So, tomorrow morning, before your brain gets messed up with vk :-P
[14:24:28] sure
[14:24:41] 1) is it acceptable to stop piwik to reboot the host in which it runs? (Kernel upgrade)
[14:25:04] 2) I'd need to schedule a reboot of aqs100[123] for kernel upgrades :)
[14:25:56] elukey: Arf, I can't really help with piwik, I don't know much about it
[14:26:06] elukey: about aqs, no bother
[15:04:16] Thanks for heads-up elukey :)
[15:14:27] halfak: comments added to https://en.wikipedia.org/wiki/User_talk:EpochFail#Improving_POPULARLOWQUALITY_efficiency
[15:14:45] regarding storage of top 100.000 items
[15:14:47] nuria_, I saw that. Thanks for quickly responding there all Wikipedian-like :D
[15:16:17] I wonder if we can compromise with a less crazy number than 200k
[15:16:28] joal, ^ re. top viewed articles
[15:16:33] halfak: I am tempering to answer to discuss that
[15:16:59] halfak: Bottom line is that serving the data would be easy due to caching. Probably computing it too (joal can correct me here) but storage adds two orders of magnitude.
not a lot but some significant amount
[15:17:42] halfak: another way to put it is to change the hourly pageview dumps to store them in descending order by pageview by project
[15:17:58] nuria_: exactly to my point
[15:18:54] halfak: from what I understand, the objective is not even the top 200k, but the top 10 of stub articles from an ORES point of view
[15:18:55] joal, oh! So we'd have N top viewed articles where N is the number of articles!
[15:19:05] That's right.
[15:19:17] It seems that this importance-focus is a common and high utility use-case.
[15:19:49] halfak: currently, ordering in pageview dumps is done by "project and postfix" then "article"
[15:20:00] Re. stubs, we're hoping to happen to find many of them in the top N, but eventually, we'll remove stubs from the top N by directing attention to them.
[15:20:00] halfak: that does make sense
[15:20:06] What is postfix?
[15:20:27] halfak: close to mobile/zero/desktop split
[15:21:30] Oh no, actually I'm very wrong halfak, excuse me
[15:23:07] it is: lang-code[2 chars][.zero/.m for mobile].projectclass[1 to 3 chars, empty for wikipedia]
[15:23:10] halfak: --^
[15:23:13] Soooooo easy
[15:24:59] elukey: We moved my 1-1 with nuria_ , no anops meeting today - Ok?
[15:25:27] halfak: The only problem I view with the approach is having to parse hourly dumps to make a daily one
[15:25:30] ah ok I was about to join!
[15:25:32] sure :)
[15:32:07] joal, not sure I grok the hourly/daily parsing issue
[15:33:04] halfak: Currently we release hourly dumps - If you want to use them for daily data, you need to download 24 of them, aggregate
[15:33:10] and then recompute top
[15:33:46] Gotcha. That is currently happening, right?
[15:33:50] For the top 1k?
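Editor's note: the "download 24 of them, aggregate and then recompute top" step described at 15:33 can be sketched roughly as below. This is only an illustration, not the team's actual Hadoop job. It assumes each hourly dump line is space-separated as `project page_title view_count ...`; the function names `parse_hourly_line` and `daily_top` are hypothetical.

```python
import heapq
from collections import Counter
from typing import Iterable, List, Tuple

Key = Tuple[str, str]  # (project code, page title)


def parse_hourly_line(line: str) -> Tuple[Key, int]:
    """Parse one dump line, ASSUMING 'project page_title view_count ...'."""
    project, title, views = line.split(" ")[:3]
    return (project, title), int(views)


def daily_top(hourly_dumps: Iterable[Iterable[str]], n: int) -> List[Tuple[Key, int]]:
    """Sum views over all hourly dumps, then return the n most viewed pages."""
    totals: Counter = Counter()
    for dump in hourly_dumps:
        for line in dump:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            key, views = parse_hourly_line(line)
            totals[key] += views
    # Most viewed first; ties are broken by (project, title) for determinism.
    return heapq.nsmallest(n, totals.items(), key=lambda kv: (-kv[1], kv[0]))
```

In practice `hourly_dumps` would be the 24 opened files for one day; here any iterable of line iterables works, so the sketch can be tried on small in-memory lists.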
[15:58:04] Analytics: Capacity projections of pageview API document on wikitech - https://phabricator.wikimedia.org/T138318#2396838 (Nuria)
[16:00:21] halfak: sorry, was in 1-1
[16:00:50] halfak: The top 1k we currently compute using hadoop over a day (and a month) of pageviews
[16:01:16] joal, so generating more would be a similar hadoop job, right?
[16:01:23] halfak: correct
[16:01:28] halfak: we could even possib
[16:02:06] possibly take advantage of a single job (top means ordering, which is long because of single end-reducer
[16:05:08] joal, ordering should be merge-sort-ish, right?
[16:05:13] That would mean many reducers
[16:05:26] many reducers, then single final one
[16:13:45] joal, +1 that makes sense.
[16:51:43] halfak: Would Ellen be the kind of person to have an NDA?
[16:53:15] halfak: Cause I can't think of a solution that would make her happy with the current situation
[20:29:30] Analytics, Commons, Multimedia, Tabular-Data, and 3 others: Allow structured datasets on a central repository (CSV, TSV, JSON, GeoJSON, XML, ...) - https://phabricator.wikimedia.org/T120452#2397626 (Yurik)
[20:31:57] Analytics, Commons, Multimedia, Tabular-Data, and 3 others: Allow structured datasets on a central repository (CSV, TSV, JSON, GeoJSON, XML, ...) - https://phabricator.wikimedia.org/T120452#2397636 (Yurik)
[20:32:03] Analytics, Commons, Multimedia, Tabular-Data, and 4 others: Review shared data namespace (tabular data) implementation - https://phabricator.wikimedia.org/T134426#2397639 (Yurik)
[21:53:20] overheard:
[21:53:22] 14:21 https://analytics.wikimedia.org/dashboards/browsers/ is so cool
[21:53:22] 14:41 bookmarked!
[21:53:23] :)
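Editor's note: the "many reducers, then single final one" idea from the 16:05 exchange can be sketched as a two-stage top-N. This is a plain-Python illustration, not the Hadoop job discussed in the channel. It assumes view counts are already fully aggregated per article and that each article appears in exactly one partition; `local_top` and `global_top` are hypothetical names.

```python
import heapq
from typing import Dict, List, Tuple

Item = Tuple[str, int]  # (article, total views)


def local_top(partition: Dict[str, int], n: int) -> List[Item]:
    """Stage 1 (one call per 'reducer'): keep only this partition's top n."""
    return heapq.nlargest(n, partition.items(), key=lambda kv: kv[1])


def global_top(partitions: List[Dict[str, int]], n: int) -> List[Item]:
    """Stage 2 (single final step): merge the small per-partition lists."""
    candidates = [item for part in partitions for item in local_top(part, n)]
    return heapq.nlargest(n, candidates, key=lambda kv: kv[1])
```

Under those assumptions, any article in the global top n is also in its own partition's top n, so the final step only sees at most n items per partition instead of the full data set.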