[00:19:50] !log Testing logging to mw.o SAL via stashbot
[00:19:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[05:23:45] (CR) Dzahn: "Elukey: this looks like one https://stats.wikimedia.org/wikimedia/squids/SquidReportPageEditsPerCountryBreakdown.htm" [analytics/wikistats] - https://gerrit.wikimedia.org/r/315417 (owner: Aklapper)
[07:43:03] (CR) Elukey: "Erik: Would you mind to give to us some suggestions about how to proceed to re-render the html pages with bugzilla links? I think I'd need" [analytics/wikistats] - https://gerrit.wikimedia.org/r/315417 (owner: Aklapper)
[09:49:41] elukey: Pivot does not let us search for a specific article, does it?
[09:58:28] hashar: you should be able to see pageviews for an article
[09:58:30] let me check
[09:58:57] (bear in mind that you see data delayed by a day at least)
[09:59:57] no you are probably super right
[10:00:02] and I am wrong
[10:00:09] was trying to get the page views for certain pages on the French Wikipedia
[10:00:28] eventually I went to restbase which has all I need
[10:00:33] eg https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/fr.wikipedia.org/all-access/all-agents/Typologie%20des%20routes%20pour%20leur%20conception%20en%20France/daily/20161001/20161027
[10:01:06] I'll ask my team!
[10:01:21] I think that it is already aggregated to reduce the size
[10:01:26] i guess
[10:01:30] because if you need specific data you already have AQS
[10:01:36] like you did
[10:01:48] well it is all fine to me. I was trying to figure out why the Wikipedia mobile apps show a certain page as being trendy
[10:01:53] when the topic it covers is really a corner case
[10:37:22] Hi hashar and elukey
[10:37:40] Some info on pivot / aqs content
[10:38:20] page_title is the dimension with the highest cardinality in our data-space
[10:39:13] And when I say highest, it's not even comparable to the others: page_title contains millions of values, the next biggest (city or ua_device_family, can't recall) is in the tens of thousands
[10:39:30] joal: cardinality meaning there are too many different entries ? :)
[10:39:37] Correct hashar
[10:39:48] my stats 101 is a couple decades old :(
[10:40:17] eventually I found the daily page views via restbase which is all fine
[10:40:20] hashar: We are collecting per hour or day or month - I'm not even considering this here, only the number of different page_title values we deal with
[10:40:49] then I am not sure how the Wikipedia app manages to list the top X viewed pages for a given project
[10:41:07] I guess there is some query to get the most viewed pages
[10:41:14] hashar: So, for fast computation over dimensions, pivot (druid in fact, the tool behind the pivot UI) works well as long as dimension cardinality is not too big
[10:41:55] hashar: but for per-page_title (or per-article, depending on names), aqs is better (but provides no computation, only data serving)
[10:42:08] hashar: You can find top per project in aqs (pre-computed)
[10:42:16] is that because if I ask for the top 10 pages visited on fr.wikipedia.org, Druid will have to analyze the millions of page_title values to get their page views then sort?
[10:42:31] hashar: sort of
[10:43:04] maybe Pivot / Druid could get access to the AQS pre-computed datasets. No idea whether it makes sense
[10:43:15] hashar: In order to serve both aggregated and non-aggregated results, druid stores data at the most detailed level possible - and with page title this is too big for real-time computation
[10:43:18] I guess most people interested in such visualization would just get them directly from aqs
[10:44:03] hashar: For viz over pageviews, do you use this: https://tools.wmflabs.org/pageviews/ ?
[10:44:11] It's a viz tool over AQS
[10:44:28] we have soo many different datasets and end user interfaces :]
[10:44:41] hashar: you also have: https://tools.wmflabs.org/topviews
[10:44:57] hashar: True, but different use cases
[10:45:06] oh man
[10:45:08] https://tools.wmflabs.org/pageviews/?project=fr.wikipedia.org&platform=all-access&agent=user&range=latest-20&pages=France|Typologie_des_routes_pour_leur_conception_en_France|Accueil
[10:45:14] that is exactly what I was looking for!
[10:45:21] hashar: Yay :D
[10:45:32] * joal is happy when people find what they need :)
[10:46:15] hashar: And actually, a new tool is advertised which I didn't know about :)
[10:48:22] and on that viz for fr.wikipedia.org
[10:48:26] Accueil is the main page
[10:48:38] "Typologie des routes pour leur conception en France" is some obscure article which somehow has wayyy more page views
[10:48:46] and shows up as being trendy in the Wikipedia mobile apps :D
[10:48:52] So hashar, simple view of UIs: https://tools.wmflabs.org/pageviews for per-article info (top, detail) -- Pivot for project level data over other dimensions (country, user agent)
[10:49:33] all set!
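For readers who want to reproduce the per-article lookup discussed above, here is a minimal Python sketch that rebuilds the AQS URL pasted at 10:00:33. The path layout is copied from that link. The function name `per_article_url`, its parameter names, and the default values are illustrative assumptions, not part of any official client, and the sketch only builds the URL without sending a request.

```python
from urllib.parse import quote

# Base path copied from the example link pasted in the conversation.
AQS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews"


def per_article_url(project, article, start, end,
                    access="all-access", agent="all-agents",
                    granularity="daily"):
    """Build a per-article pageviews URL shaped like the one in the log.

    start and end are YYYYMMDD strings. The name and the defaults of
    this helper are hypothetical, chosen only for this illustration.
    """
    # Percent-encode the title, including any "/" it may contain.
    title = quote(article, safe="")
    return "/".join([AQS_BASE, "per-article", project, access, agent,
                     title, granularity, start, end])


url = per_article_url(
    "fr.wikipedia.org",
    "Typologie des routes pour leur conception en France",
    "20161001", "20161027",
)
print(url)
```

The resulting string can be passed to any HTTP client. As noted in the conversation, AQS only serves pre-computed data, so any further aggregation has to happen on the caller's side.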
[10:49:37] thank you very much :]
[10:49:56] hashar: No prob :)
[10:50:03] having lunch and will file a task so someone can figure out why the page "Typologie des routes pour leur conception en France" has so much traffic
[10:53:26] hashar: 9 chances out of 10 that it's a bot artifact
[10:53:42] hashar: There are many of those artifacts unfortunately
[10:54:17] hashar: more precisely: some bots don't declare themselves as bots, and therefore we don't tag them correctly
[10:55:09] hashar: We have a task to improve our pageview bot resiliency -- https://phabricator.wikimedia.org/T138207
[10:55:27] hashar: I hope that makes sense
[11:18:14] joal: more or less :]
[11:18:26] not sure why a bot would hammer that page so much though
[11:18:35] hashar: hopefully a bit more than less ;)
[11:18:53] hashar: why are you looking for reasons, WE HAZ DATAZ ;)
[11:22:57] I guess I was being curious :D
[11:23:17] and I found https://tools.wmflabs.org/pageviews-test/ as a result, which I am quite happy about
[11:28:00] Yes indeed hashar , those frontend guys do an awesome(TM) job :D
[11:40:51] * elukey afk!
[11:40:55] (lunch!)
[12:44:10] Analytics-Wikimetrics, Community-Wikimetrics: Inconsistent metrics results for 90 day rolling active editors calculation - https://phabricator.wikimedia.org/T147176#2751723 (Abit) a:Milimetric Dan, any ideas about this one? The pages Gabrielle linked include the cohort and start and end dates for ea...
[12:47:47] Analytics-Wikimetrics, Community-Wikimetrics: Inconsistent metrics results for 90 day rolling active editors calculation - https://phabricator.wikimedia.org/T147176#2751727 (Abit) @gabrielle_marie_wmch could you please attach the reports to the phab task as well?
[13:04:39] !log oozie firewall rules changed - now only the analytics network is allowed
[13:04:40] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:07:26] hello team :]
[13:13:36] o/
[13:20:27] o/
[13:49:14] which hosts need to have inbound access to the hive metastore port 9083?
[13:50:43] I'd say stat100[24]
[13:50:51] * elukey is thinking
[13:51:08] analytics1027 (running hue)
[13:51:23] and possibly also the hadoop worker nodes
[13:51:45] mmm but you said the metastore
[13:51:54] so it might be only the hive daemon?
[13:52:12] (hive-server2)
[13:54:09] so hue (running on an1027) seems to be accessing it to display info
[13:55:21] moritzm: need to do a bit of research, but we could enable logging to quickly check
[13:56:24] or I could also run tcpdump for 10 mins of traffic?
[13:57:11] similar background to what we merged earlier for hue and oozie, need to get rid of $INTERNAL
[14:02:25] sure
[14:02:30] I can help if you are busy
[14:03:49] (PS24) Joal: [WIP] Join and denormalize all histories into one [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903 (owner: Milimetric)
[14:05:14] Leaving for the airport a-team, will connect from there with my phone
[14:05:25] ok joal good flight!
[14:09:52] elukey: thanks, I'll look into it later on or on Monday
[14:24:17] elukey, hi!
[14:27:05] mforns: o/
[14:28:31] elukey, do you have 5 minutes to chat on alerts?
[14:28:35] oozie alerts?
[14:28:59] sure!
[14:29:01] batcave?
[14:29:58] yes elukey omw
[14:59:44] harshar; you also have the pageviews tool for this precise case: https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&start=2016-01-16&end=2016-02-14&pages=Zika_virus
[15:00:43] sorry cc hashar
[15:01:28] nuria: yeah joal pointed me to pageviews which was exactly what I was hoping for :D
[15:26:29] elukey: wanna chat on metastore while I wait for my flight?
[15:32:21] Analytics-Kanban, EventBus, Operations, Patch-For-Review: setup/install/deploy kafka1003 (WMF4723) - https://phabricator.wikimedia.org/T148849#2752246 (RobH) I'd suggest that we shift the kafka hosts to using it, not shift away from using it. Most of the servers with multiple disks tend to use LVM.
[15:32:26] joal: don't worry let's do it on Monday :)
[15:33:45] ok elukey :)
[15:44:22] disconnecting for flight - later a-team
[15:44:31] good flight joal !
[15:44:32] laters
[16:05:50] mforns: yeah, my connection is poor here. On the flight it's supposed to be better but it's delayed again
[16:06:57] milimetric, ok, let me know what you want to do. I'm fine if you start and if there are problems I can take the interview from there onwards
[16:06:57] mforns: who should merge the bower -> npm change? Should I do that?
[16:07:12] mforns: you can run it, and I'll shadow if I can
[16:07:32] milimetric, ok, I'll run it
[16:07:34] milimetric, about the patch
[16:09:28] yes, patch
[16:09:44] milimetric, regarding the semantic npm install issue, 2 things: 1) I think that if you remove the semantic.json file it will repeat the semantic installation and readd it again
[16:10:12] and 2) ... nothing :D (forgot it)
[16:11:28] mforns: ok, I'll see if I can get it to work with a minimal semantic file or something
[16:11:42] milimetric, ok
[16:31:16] mforns: I looked into it a bit and this semantic thing is pretty stubborn. I'm thinking of letting it build into src/lib/semantic-ui/ and setting autoInstall: true so that it automatically refreshes on upgrades / new installs without prompting the user
[16:31:26] that'll make it work with CI and yarn as well, I think
[16:31:58] the downside is I have to change the symlinks and any references to the node_modules sources. Ok with you?
[16:33:26] (bonus is we don't have to change everything if they change their file structure again)
[16:52:16] mforns: perfectly inappropriately I have to board right now so I'll probably miss the interview. If you could add my phone to the hangout that'd be great, but if I drop don't worry about re-adding me
[16:52:30] milimetric, sure!
[16:52:31] I'll submit my patch now
[16:52:41] (PS6) Milimetric: Migrate from bower to npm instead of yarn [analytics/dashiki] - https://gerrit.wikimedia.org/r/316904 (https://phabricator.wikimedia.org/T147884)
[16:52:51] I haven't tested fully but I think it's cool, just did what I said above
[16:53:02] milimetric, aha
[16:53:28] nuria: hiayaa
[16:54:04] Analytics-Kanban, EventBus, Operations, Patch-For-Review: setup/install/deploy kafka1003 (WMF4723) - https://phabricator.wikimedia.org/T148849#2752425 (Ottomata)
[16:54:42] Analytics-Kanban, EventBus, Operations, Patch-For-Review: setup/install/deploy kafka1003 (WMF4723) - https://phabricator.wikimedia.org/T148849#2734746 (Ottomata) I fixed the partman recipe to not use LVM, and I reinstalled the box with the non LVM recipe successfully. I'll wait til next week to p...
[17:34:10] ottomata: holaaa
[17:34:17] ottomata: yt?
[17:35:16] (eating lunch) had a quick comment about streaming hw
[17:35:27] it is possible that we might want to use kubernetes for it
[17:35:29] not certain
[17:35:30] but maybe
[17:35:35] which would make hw discussion more complicated i think
[17:39:36] ottomata: aham
[17:40:13] ottomata: what about the replacement of stats boxes?
[17:45:14] nuria: not excited about it :)
[17:45:22] maybe just stat1001 this fy? dunno
[17:45:30] but, hm
[17:45:37] yeah if we replace we can do a big standardization project
[17:45:45] hm
[17:45:49] ottomata: ya, seems that we need to do that sooner or later
[17:45:57] nuria: batcave real quick?
[17:46:32] ottomata: give me 5 mins
[17:46:37] k
[17:47:05] making a coffee..
[17:50:28] ottomata: batcave?
[17:54:12] Analytics: Replace stat1001 - https://phabricator.wikimedia.org/T149438#2752595 (Nuria)
[17:56:09] Analytics: Replace stat1001 - https://phabricator.wikimedia.org/T149438#2752618 (Nuria) From our notes: Replaces OOW R510. stat1001 is just a webhost for large datasets, similar to dataset1001, but more dedicated for analytics cluster generated data. We may be able to consolidate data here onto dataset1* b...
[19:53:43] Analytics, you are great! I complained casually to Dan (and in private) that I can't log in to pivot, and now I see you've fixed it. Thank you! :)
[20:02:11] leila: thanks, should be fixed (by elukey ) for all wmf employees
[20:02:33] :)
[20:06:45] leila: you were not in the wmf ldap group :/
[20:07:00] not sure why
[20:07:06] yeah, elukey, and I wasn't sure how I should become part of it. I sort of gave up. ;)
[20:08:07] Pivot will probably fix most of the corner cases like yours :)
[20:09:57] * elukey afk again :)
[20:32:30] ottomata1: yt?
[20:35:25] nuria: hiya ya
[20:35:34] ottomata:
[20:35:55] so from the description of the ticket, from the given object (statsd/kafka)
[20:36:00] we expect something like:
[20:36:04] https://www.irccloud.com/pastebin/LnwFeU3c/
[20:36:17] sorry, ticket: https://phabricator.wikimedia.org/T145099
[20:37:04] nuria: yes
[20:37:30] ottomata: ok, no arrays of objects or should i account for that too?
[20:37:47] ottomata: I guess I can, and add the "index" to the prop name
[20:38:10] yeah, might as well treat the array as an integer indexed object
[20:38:29] but ja afaik that is the object, but it could be added to in the future
[20:41:46] ottomata: k, will add that
[20:42:04] ottomata: let me know if you were looking for more sophistication than this: https://gist.github.com/nuria/064b8d54589d71cb9eb7c3cdc29352e4
[20:42:22] ottomata: will create the module once the repo is there, and add tests and the rest of the code
[20:50:07] ottomata: how do we know if the repo has been created, do you get notified at all?
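The flattening idea discussed between 20:37 and 20:42 (nested object in, flat metric names out, with arrays treated as integer-indexed objects) can be sketched roughly as below. This is not the code from the linked gist. The function name `flatten`, the "." separator, and the sample input are assumptions made only for illustration, and the real object is the one in the pastebin and ticket referenced above.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into {dotted.name: leaf_value}.

    Lists are treated as integer-indexed objects, so the element index
    becomes part of the key. The "." separator is an assumption.
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        # Leaf value: emit it under the name built so far.
        return {prefix: obj}

    flat = {}
    for key, value in items:
        name = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, name))
    return flat


# Hypothetical input, invented for this example only.
sample = {"kafka": {"tx": 10, "brokers": [{"rtt": 3}, {"rtt": 5}]}}
print(flatten(sample))
```

Each resulting key could then be handed to a statsd client as a metric name. Choosing counter versus gauge per key is a separate step and is not covered by this sketch.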
[21:03:19] reading sorry
[21:04:45] nuria: i am assuming the logster names are just there for development :)
[21:05:11] nuria: ya it needs a bit more, it needs to send the metrics to the statsd client
[21:05:19] as proper types
[21:05:23] counter or gauge
[21:06:03] so i guess something like
[21:08:54] hmm actually maybe that's not necessary
[21:09:18] the JsonLogster thing does have infer_metric_type
[21:10:09] nuria: are you creating the repo in diffusion?
[22:28:35] Analytics-Kanban, Mobile-Content-Service, Wikipedia-Android-App-Backlog, Patch-For-Review, Spike: [Feed] Establish criteria for blacklisting likely bot-inflated most-read articles - https://phabricator.wikimedia.org/T143990#2753267 (JMinor) Took a quick look at the results and this seems like...