[00:54:20] nuria: milimetric: madhuvishy: hi again :)
[00:56:36] AndyRussG: hi :)
[01:04:58] nuria: nice tool, that network throttling on Chrome! So far, banners are appearing as expected with low bandwidth, though on the lowest setting, they do take a while :/
[01:06:03] Could be that users navigate away before the banner is shown
[01:06:24] madhuvishy: ^
[01:06:25] AndyRussG: that sounds plausible
[01:06:51] i wish we could accurately discount for non main namespace pages
[01:07:46] I wonder how accurate the chrome throttling option is. I recall not much enjoying the old command-line network throttling tool
[01:08:43] madhuvishy: we could potentially feed it a list of all namespaces. It would make for a long query tho
[01:08:58] hmmmm
[01:10:28] Do we have regexes?
[01:10:39] in hive?
[01:10:40] yes
[01:10:42] in Hive sql?
[01:10:44] ah cool
[01:10:57] RLIKE is a regex function too
[01:11:09] Ah hmmm
[01:11:25] https://en.wikipedia.org/wiki/Wikipedia:Namespace
[01:11:39] i guess we'd have to make a list of all of them except the main one
[01:12:50] AndyRussG: it might be easy to make a udf temporarily
[01:13:02] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/QueryUsingUDF
[01:13:10] if you don't want the query to bloat
[01:15:12] actually, nah that's more work since this udf doesn't already exist - we can add it if it's useful though
[01:17:10] Interesting!!
[01:17:11] K
[01:17:49] There's some config that says what the namespaces are also, that really determines it (while the wiki page could potentially be out of sync)
[01:18:38] ah
[01:27:06] I think for this evening I'll focus on the possible network issue
[01:28:14] AndyRussG: Okay - i'll be at the holiday party - ping if you need me :)
[01:33:04] madhuvishy: have fun! thanks!
[02:48:19] AndyRussG: wait how come you're filtering down to project = 'en.wikipedia' in the wmf.pageview_hourly subquery but not in the wmf.webrequest one? Is it at this point impossible to get '/beacon/impression' hits anywhere except english wikipedia?
[02:48:33] like, I know you're not running the campaign, but is it impossible to get those, even if there's some bug?
[02:48:42] I guess that's easy to test
[02:49:14] milimetric: no, it's not impossible, I think. It should be added to the queries in fact
[02:50:02] However we do know that in those countries, CentralNotice is targeting 100% of anonymous pageviews
[02:50:57] There might be banners for logged-in users that overlap. So yeah, we should filter, but anyway the rumor was that not more than 1% of pageviews are logged in
[02:51:11] I know that's not a great methodology tho 8p
[02:51:38] milimetric: btw I think the issue is network latency. Just downloaded the latest chromium, playing with the lowest setting, the banners are indeed taking a long time
[02:51:55] Definitely long enough for people to navigate away
[02:51:59] AndyRussG: makes sense, they were pretty slow even on 4G for me, on my blackberry
[02:52:14] yeah, it would also explain why mobile is so much worse
[02:52:19] yep
[02:52:32] And also variation by country
[02:52:34] but it does leave desktop, english wikipedia as the unexplained one
[02:52:42] 79% is still pretty low if you're expecting more like 95%
[02:52:44] Well, some people have shitty internet in any case
[02:53:05] Mmm add in the other factors, maybe it's less of a difference
[02:53:16] Really? they were slow on 4G?
[02:54:48] What I have to dig into is why it's that slow. It shouldn't be
[03:05:49] hm, AndyRussG I only found 11 thousand hits to /beacon/impression that were *not* to en.wikipedia.org on "desktop" by "user"s in the US.
[03:06:04] So that wouldn't be a significant percent
[03:06:23] milimetric: cool thanks!!
[03:06:47] I wish there were a better way to quantify bots. It sounds like we have a huge swing vote there
[03:06:54] true
[03:07:19] At least 2%, could really be as high as 10% or 20% even?
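
The namespace idea from earlier in the log (a regex over a list of every namespace except the main one) might look roughly like the sketch below. It is written in Python rather than Hive SQL, and the prefix list is a hypothetical, deliberately incomplete sample; as noted in the discussion, the authoritative list lives in the wiki configuration. The function name `is_main_namespace` is also made up for illustration.

```python
import re

# Hypothetical, incomplete sample of non-main namespace prefixes.
# The real list would have to come from the wiki's configuration.
NON_MAIN_PREFIXES = [
    "Talk", "User", "User_talk", "Wikipedia", "Wikipedia_talk",
    "File", "Template", "Help", "Category", "Special",
]

# Anchored at the start so that a colon later in a main-namespace title
# (e.g. "Star_Wars:_The_Force_Awakens") does not count as a namespace.
NON_MAIN_RE = re.compile(
    r"^(?:" + "|".join(re.escape(p) for p in NON_MAIN_PREFIXES) + r"):",
    re.IGNORECASE,
)


def is_main_namespace(title: str) -> bool:
    """Return True when the title does not start with a known namespace prefix."""
    return NON_MAIN_RE.match(title) is None
```

Matching on an explicit prefix list, rather than on "title contains a colon", avoids misclassifying main-namespace articles whose titles happen to contain colons. A pattern of the same shape could presumably be handed to RLIKE in a Hive query, at the cost of the long query mentioned above.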
[03:07:29] AndyRussG: is there a way to know from the raw hit to /beacon/impression which page title it was looking at?
[03:07:37] like a parse_url way?
[03:08:10] Not in a URL param
[03:08:16] Maybe referer, in the weblog?
[03:08:28] ooh! yeah, good call, I'll check that
[03:08:54] it might not be set for a lot of them, but where it is we can join webrequest back to itself and try to figure out if there are specific pages not getting the banner
[03:09:06] milimetric: yeah referer has it
[03:09:10] that could help you figure out if there's a pattern (like the bigger the page the less likely the banner)
[03:09:37] milimetric: wow! That'd be pretty amazing :)
[03:09:54] argh, I have to go...
[03:10:09] milimetric: np!! thanks so much for your help, eh?
[03:10:14] AndyRussG: if you aren't able to run that query tonight, lemme know and I'll try it either really late tonight or early tomorrow
[03:10:35] just leave me a msg. on here before you leave with any thoughts
[03:10:40] sorry, ttyl
[03:10:47] milimetric: K cool thx again!! cya!
[03:11:14] (I'll also shout out if I find out more w/ low latency testing)
[03:12:17] I wonder if the logs themselves would have any info on the quality of the connection they came from
[03:12:32] Some obscure http data point?
[03:36:19] AndyRussG: you can exclude bot traffic (and some user traffic) if you remove the nocookie as i suggested earlier
[03:36:51] nuria: right! but what proportion of each does that remove?
[03:37:21] nuria: how many users disable cookies or go incognito?
[03:37:56] AndyRussG: It is explained on the page i linked earlier, let me send it again
[03:38:42] https://wikitech.wikimedia.org/wiki/Analytics/Unique_clients/Last_access_solution/BotResearch ?
[03:38:54] AndyRussG: if you remove all nocookie traffic you are removing about 7% of real user traffic.
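
The referer idea above (recover the page title for a /beacon/impression hit from its referer, then join back to pageviews) could be prototyped along these lines. This is only a sketch in Python, not the Hive query that was actually run: the helper name `title_from_referer` is hypothetical, and it assumes referers look like standard `/wiki/<title>` or `/w/index.php?title=<title>` URLs and that a missing referer is logged as an empty string or `-`.

```python
from typing import Optional
from urllib.parse import parse_qs, unquote, urlsplit


def title_from_referer(referer: str) -> Optional[str]:
    """Best-effort page title from a referer URL, or None if it cannot be found."""
    # Assumption: an unset referer shows up as "" or "-" in the log.
    if not referer or referer == "-":
        return None
    parts = urlsplit(referer)
    # Pretty URLs: https://en.wikipedia.org/wiki/Some_Title
    if parts.path.startswith("/wiki/"):
        return unquote(parts.path[len("/wiki/"):]) or None
    # Script URLs: https://en.wikipedia.org/w/index.php?title=Some_Title&...
    if parts.path == "/w/index.php":
        titles = parse_qs(parts.query).get("title")
        return titles[0] if titles else None
    return None
```

Hits for which this returns None (no referer, or an external one) would simply drop out of any join on title, which matches the caveat above that the referer might not be set for a lot of them.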
[03:40:40] There are several things that i think you need to do to quantify your estimates, the 1st one figure out how many of the requests you are counting are coming from browsers w/o js support (ie8 and below on desktop for example)
[03:42:02] AndyRussG: you can look at general browser stats but i would actually do those queries for the countries you are interested in, they will be more precise. For example: if ie8 and below receive no javascript and we have 2% of pageviews by those browsers your numbers need to differ at least 2%.
[03:42:43] nuria: yeah that was one of the first things we did. Concatenated browser family and major version strings, looked at all those that appear in page views but never appear in /beacon/impression
[03:42:48] They add up to about 4% of pageviews
[03:42:57] At least for the hour sample
[03:43:40] nuria: BTW I'm looking at the network latency issue w/ chrome as you suggested, that may indeed be the key
[03:44:02] AndyRussG: Ok, let me give you some browser numbers see if they match up
[03:44:20] cool thx!
[03:44:33] nuria: Do you know of any column in the hive tables that can indicate the quality of the network connection?
[03:45:30] AndyRussG: that info doesn't exist
[03:45:56] Mmmm yea I imagine
[03:46:06] I wonder if it could be added somehow?
[03:46:42] I mean, varnish is doing a http connection to each client, there must be enough back-and-forth to get some idea
[03:47:03] (not that that would be something to try to get right away...)
[03:48:22] AndyRussG: no, it cannot be added easily that information was going to be in a js api now deprecated
[03:49:26] hmm
[03:51:08] AndyRussG: From: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/BrowserReports
[03:51:17] nuria: anything else that might be a proxy for network connectivity? region, I guess... anything else?
[03:51:43] AndyRussG: device in mobile, i wouldn't use region, you would be surprised
[03:51:47] Maybe type of device? More expensive devices more likely to be used on good mobile connection?
[03:51:57] Hmmm K
[03:52:00] AndyRussG:
[03:52:03] https://www.irccloud.com/pastebin/bd7MF16A/
[03:52:46] These are ie8/ie7 numbers so at least that many overall would not get your banners, let's look at mobile numbers
[03:53:33] K cool
[03:54:15] AndyRussG: and in mobile only opera mini is this many:
[03:54:19] https://www.irccloud.com/pastebin/zN252Jhu/
[03:54:43] Yeah lemme e-mail u the spreadsheet I did...
[03:54:48] >2.5% , banners might work in opera mini but if so i bet they do not work super well
[03:57:01] Look at the last tab on that spreadsheet. There are flaws in the queries but they're not very big
[03:58:02] On mobile total 4% of pageviews are browsers (family + major version string) never seen in that sample of /beacon/impression
[03:58:31] What I didn't look at is differences in the distribution for browsers that do show up in both
[03:59:51] AndyRussG: That makes sense, please also take a look at your selects with nocookie=0 so you know for sure you only have user traffic and you will get an underestimation of the data you are after.
[04:01:33] nuria: right! Ahhh I understand what to do there!!! Heh I was thinking of nocookie=0 only on pageviews, to figure out the % of bots, but the problem was it also includes users... however...
[04:01:59] if I put nocookie on _both_ sides, I know that I'm probably only looking at real users...
[04:02:02] Hmmm though...
[04:03:13] I mean, there may be bots that emulate browsers and call /beacon/impression too, and set cookies for all the calls related to a pageview
[04:04:01] But I guess those are just impossible to detect and we can't care about those, and they're probably in the minority
[04:07:40] So I'll add that onto all the queries
[04:10:38] If I break down both datasets by device family, os and os major version and compare the % of the pie in each dataset (like I did for browser family + major version), and also remove browsers that never run JS
[04:11:04] (or that are incompatible w/ our JS)
[04:11:31] and if I find that older OSs show a greater discrepancy
[04:11:33] AndyRussG: ok, yes you got it. doing nocookie=0 you will be calculating your difference in a dataset that -we know-
[04:11:49] that could be a proxy for network quality
[04:12:06] K right
[04:12:07] is smaller than the real user pageviews dataset but in your case you care about ratios so it doesn't matter
[04:13:05] nuria: right! it's all about removing unknowns
[04:13:33] AndyRussG: right, once you do that work (plus assessing whether your code actually works well with poor connectivity) if differences are bigger than expected let us know
[04:14:03] nuria: yeah that makes a lot of sense! :D
[04:14:53] AndyRussG: Ok, you can let us know how things work tomorrow.
[04:15:22] nuria: yeah! hey thanks so much ¡mil gracias!
[04:15:59] AndyRussG: np
[07:09:18] Analytics-Tech-community-metrics, DevRel-December-2015: Many profiles on profile.html do not display identity's name though data is available - https://phabricator.wikimedia.org/T117871#1889828 (Lcanasdiaz) This issue is highly related to #T118169 The name displayed on the person panel is using the profi...
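
The approach agreed on in the nocookie discussion above (apply nocookie=0 to _both_ datasets, drop browsers that never run JS, then compare ratios) can be sketched as follows. This is a toy Python illustration over in-memory rows, not the actual Hive queries: the row layout, the field names `nocookie` and `browser`, and the function name `impression_ratio` are all assumptions made for the example.

```python
from typing import Dict, Iterable, Set

Row = Dict[str, object]


def impression_ratio(
    pageviews: Iterable[Row],
    impressions: Iterable[Row],
    no_js_browsers: Set[str],
) -> float:
    """Banner impressions per pageview, with the same filters on both sides."""

    def is_comparable(row: Row) -> bool:
        # Keep only requests that carried cookies (nocookie == 0, assumed to
        # mean mostly real users) from browsers that are able to run the JS.
        return row["nocookie"] == 0 and row["browser"] not in no_js_browsers

    pageview_count = sum(1 for row in pageviews if is_comparable(row))
    impression_count = sum(1 for row in impressions if is_comparable(row))
    if pageview_count == 0:
        raise ValueError("no comparable pageviews after filtering")
    return impression_count / pageview_count
```

Because the same filter is applied to numerator and denominator, the filtered dataset being smaller than real user traffic should not matter much for the ratio, which is the point made at 04:12:07.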
[07:48:33] Analytics-Tech-community-metrics, DevRel-December-2015: Legend for "review time for reviewers" and other strings on repository.html - https://phabricator.wikimedia.org/T103469#1889855 (Lcanasdiaz) As far as I know the "desc" elements are deprecated. The only functionality being used by the dash right now...
[09:29:52] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1889965 (Nemo_bis) > Honestly, if you're looking for a "broader audience", the developer summit is exactly the wrong place. Good point. I *believe* that the pr...
[09:57:05] Analytics-Tech-community-metrics, DevRel-December-2015: Make GrimoireLib display *one* consistent name for one user, plus the *current* affiliation of a user - https://phabricator.wikimedia.org/T118169#1889997 (Aklapper) Thanks, great to hear! Just to clarify, will that use the `uuid`'s `profile > name` (...
[09:59:53] Analytics-Tech-community-metrics, DevRel-December-2015, Easy, Google-Code-In-2015, Patch-For-Review: Clarify Demographics definitions on korma (Attracted vs. time served; retained) - https://phabricator.wikimedia.org/T97117#1890006 (Aklapper) Upstream patch is https://github.com/Bitergia/grimoire...
[10:17:58] Analytics-Tech-community-metrics, DevRel-January-2016: "Unavailable section name" displayed on repository.html - https://phabricator.wikimedia.org/T121102#1890035 (Aklapper) a:Lcanasdiaz
[10:20:28] Analytics-Tech-community-metrics, DevRel-January-2016: demographics.html: "Tickets participants" has no "retained" user data at all and looks suspicious lately - https://phabricator.wikimedia.org/T120569#1890038 (Aklapper) a:Dicortazar
[10:28:34] Analytics-Tech-community-metrics, DevRel-January-2016: Backlogs of open changesets by affiliation - https://phabricator.wikimedia.org/T113719#1890072 (Aklapper)
[10:57:56] Analytics-Tech-community-metrics: Data in korma project pages has confusing labels, is difficult to understand - https://phabricator.wikimedia.org/T110524#1890128 (Aklapper) Waiting for Luis' approach to take for {T103469} before trying to tackle this one.
[11:20:08] Analytics-Tech-community-metrics, DevRel-January-2016: demographics.html: "Tickets participants" has no "retained" user data at all and looks suspicious lately - https://phabricator.wikimedia.org/T120569#1890167 (Aklapper) * TODO: The "184 attracted" data seems to be wrong as Bugzilla was made read-only i...
[11:23:01] Analytics-Tech-community-metrics, DevRel-December-2015: "Tickets" (defunct Bugzilla) vs "Maniphest" sections on korma are confusing - https://phabricator.wikimedia.org/T106037#1890183 (Aklapper) @Lcanasdiaz to take a look if we can have aliases for ITS in the navigation side bar (as "Tickets" might be har...
[11:23:53] Analytics-Tech-community-metrics, DevRel-December-2015: Make GrimoireLib display *one* consistent name for one user, plus the *current* affiliation of a user - https://phabricator.wikimedia.org/T118169#1890185 (Aklapper)
[11:23:55] Analytics-Tech-community-metrics, DevRel-December-2015: Many profiles on profile.html do not display identity's name though data is available - https://phabricator.wikimedia.org/T117871#1890184 (Aklapper)
[11:33:28] Analytics-Tech-community-metrics, Easy, Patch-For-Review: Entered text in Typeahead search field nearly not visible in Firefox 42: Fix the CSS - https://phabricator.wikimedia.org/T121101#1890215 (Aklapper) https://github.com/Bitergia/grimoire-dashboard/pull/4
[11:33:34] Analytics-Tech-community-metrics, Easy, Patch-For-Review: Entered text in Typeahead search field nearly not visible in Firefox 42: Fix the CSS - https://phabricator.wikimedia.org/T121101#1890217 (Aklapper) a:Aklapper
[11:47:45] Analytics-Tech-community-metrics, Developer-Relations, DevRel-December-2015: Check whether it is true that we have lost 40% of (Git) code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1890232 (Aklapper) a:Dicortazar>Aklapper
[11:50:06] Analytics-Tech-community-metrics, DevRel-January-2016: demographics.html: "Tickets participants" has "184 attracted" data for 1year
Hello mforns :)
[13:02:50] joal, hi! I'm leaving now for lunch, but this morning I added a section to your anonymization docs regarding quantifying how much data we lose with anonymization. I already did some tests with that. -> The code is in cloud9
[13:03:10] mforns: You have managed to get the thing working ?
[13:03:57] joal, yes!#not
[13:04:07] Mwarf ?
[13:04:19] it works for one level of anonymization, but without writing the results
[13:04:28] I could not overcome the permission problem
[13:04:39] have you changed the writing folder?
[13:04:41] but I got stats for anonymization! and entropy too!
[13:04:43] yes
[13:04:48] Weeeeird
[13:05:01] I wrote to hdfs:///mnt/hdfs/user/mforns/anonymization
[13:05:08] hm, not sure what you mean for entropy, let's discuss that when you get back
[13:05:10] ahahahahahhahhhhhhh!!!!
[13:05:15] maaan
[13:05:17] so dumb
[13:05:26] huhuhu :)
[13:05:27] ai ai
[13:05:29] Sorry :)
[13:05:40] no no, I've earned it
[13:06:04] ok, I corrected the code in c9
[13:06:05] I'll test some more code, read the things you have written
[13:06:10] ok
[13:06:10] Let's talk after your lunch :)
[13:06:23] sure, see you in 2 hours
[13:06:26] :]
[13:06:34] Enjoy your lunch :)
[13:06:55] bye
[13:41:10] Analytics-Tech-community-metrics, DevRel-December-2015: What is contributors.html for, in contrast to who_contributes_code.html and sc[m,r]-contributors.html and top-contributors.html? - https://phabricator.wikimedia.org/T118522#1890313 (Aklapper)
[13:41:13] Analytics-Tech-community-metrics, Developer-Relations, DevRel-December-2015: Check whether it is true that we have lost 40% of (Git) code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1890314 (Aklapper)
[13:41:14] Analytics-Tech-community-metrics, DevRel-December-2015: OwlBot seems to merge random user accounts in korma user data - https://phabricator.wikimedia.org/T119755#1890311 (Aklapper) Open>Resolved If I got it right this was a potential bug created with the first execution of the searching tool for mer...
[14:02:04] Analytics-Tech-community-metrics, DevRel-December-2015: Legend for "review time for reviewers" and other strings on repository.html - https://phabricator.wikimedia.org/T103469#1890338 (Aklapper) >>! In T103469#1889855, @Lcanasdiaz wrote: > One question, from your point of view, what would happen when more...
[16:07:35] (PS1) EBernhardson: Update avro schemas to use event-schema repo as submodule [analytics/refinery/source] - https://gerrit.wikimedia.org/r/260030
[17:00:43] joal, mforns , ottomata , madhuvishy : standddupppp
[17:00:50] nuria, going!
[17:09:11] AHHHH
[17:09:12] AHHHHHH
[17:20:35] oh ja joal, i invited you to a meeting i'm about to have with discovery about hadoop <-> elasticsearch data
[17:20:38] starts in 10
[17:20:45] yup, will be there
[17:32:00] joal come on in!
[17:32:04] https://plus.google.com/hangouts/_/wikimedia.org/discuss-getting
[17:32:42] joining
[17:32:46] sorry ottomata
[18:13:56] Hey mforns, you are here ?
[18:14:05] joal, yep, still in batcave :]
[18:14:09] join us
[18:14:13] k
[18:14:41] ottomata: interesting meeting, thx for having invited
[18:17:38] thanks for coming!
[18:23:20] i need more sleep. thanks for the explanation mforns! I'll read up more this weekend too and understand this better! sorry i couldn't be very useful
[18:23:50] madhuvishy, np! see you next day!
[18:40:46] Analytics-Backlog: Add instruction text next to the input fields in the Program Global Metrics Report - https://phabricator.wikimedia.org/T121899#1891169 (Abit) NEW a:Milimetric
[18:44:01] milimetric, btw!
[18:44:11] I was looking at the analytics list
[18:44:31] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1891198 (Milimetric) > Honestly, if you're looking for a "broader audience", the developer summit is exactly the wrong place. I would bet that most of the peopl...
[18:44:40] and an email of yours got filtered to be moderated... O.o
[18:46:31] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1891211 (Milimetric) >>! In T112956#1889965, @Nemo_bis wrote: >> Honestly, if you're looking for a "broader audience", the developer summit is exactly the wrong...
[18:46:32] I mean it was held for approval...
[18:46:47] dunno why.. just approved it
[18:48:26] Analytics-Backlog: {kudu} Add instruction text next to the input fields in the Program Global Metrics Report - https://phabricator.wikimedia.org/T121899#1891225 (Abit)
[18:55:57] Analytics-Backlog: Add instruction text next to the input fields in the Program Global Metrics Report {kudu} - https://phabricator.wikimedia.org/T121899#1891244 (Abit)
[19:03:33] Analytics-Backlog, Analytics-Wikimetrics: Display global metrics report results on same page as report inputs {kudu} - https://phabricator.wikimedia.org/T121262#1891252 (Abit)
[19:07:58] Analytics-Backlog: Visualization of Browser data to substitute current reports on wikistats - https://phabricator.wikimedia.org/T118329#1891267 (Milimetric) @Krinkle, I'm trying to start work on this now, but there may be too many interruptions until after the dev summit. >>! In T118329#1885357, @Volker_E wr...
[19:13:59] Analytics-Tech-community-metrics, DevRel-December-2015: Affiliations and country of resident should be visible in Korma's user profiles - https://phabricator.wikimedia.org/T112528#1891277 (Aklapper) Part of this (affiliation) is covered by {T118169}
[19:17:16] Analytics-Kanban, DBA, Patch-For-Review: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1891279 (Nuria) >Yes, we can choose one of the InnoDB tables mentioned on the description. Can you stop writes to a single...
[19:17:47] mforns: are you around the beginning of next week?
[19:55:29] so weird, irccloud signed me out
[19:55:36] thanks mforns, I'm not sure why either...
[20:37:49] Analytics-Kanban, EventBus, Patch-For-Review: Refactor kafka puppet roles with hiera for more generic use than analytics [8 pts] - https://phabricator.wikimedia.org/T120957#1891478 (Ottomata) Open>Resolved
[20:38:06] Analytics-EventLogging, Analytics-Kanban, EventBus, Patch-For-Review: Send HTTP stats about eventlogging-service to statsd [3 pts] - https://phabricator.wikimedia.org/T118869#1891480 (Ottomata) Open>Resolved
[20:38:09] Analytics-Kanban, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1891481 (Ottomata)
[21:34:25] Analytics: Better redirect handling for pageview API - https://phabricator.wikimedia.org/T121912#1891561 (Dominicbm) NEW
[22:26:56] nuria, yes, I'll be there monday to wednesday, I'll take thursday 24th off
[22:42:09] Analytics-Tech-community-metrics, DevRel-December-2015: "Tickets" (defunct Bugzilla) vs "Maniphest" sections on korma are confusing - https://phabricator.wikimedia.org/T106037#1891763 (Aklapper) a:Aklapper>Lcanasdiaz
[22:45:11] I'm running a series of hive scripts on stat1002 that is pulling data for api.php requests for the month of December into a table in my 'bd808' database. If it causes any problems yell at me or kill the "load-action_ua_hourly.py" script
[22:57:09] kill you or yell at the script, got it
[22:59:27] ori: yeah that works too. My office sits on high ground and has good vis in most directions so I'm not too worried about a frontal assault
[23:32:43] madhuvishy: December 28th
[23:32:51] (that's when I'm gonna see Star Wars)
[23:40:21] milimetric, ooooh I can't wait
[23:41:14] my wife said she will wear Leia buns :]
[23:43:49] milimetric: Nicee
[23:44:15] I'm seeing it tomorrow. There will be no spoilers :P
[23:45:12] hahaha, awesome, btw, don't worry about spoilers, you can talk about it all you like
[23:45:36] first, I'll forget everything you say the second the credits roll, I get *really* into movies. Second, I don't mind knowing how things end
[23:48:00] Analytics-Backlog: We may be missing some more spiders when tagging pageviews {slug} - https://phabricator.wikimedia.org/T121934#1891933 (Milimetric) NEW
[23:53:59] :D