[00:21:33] Analytics-Tech-community-metrics, DevRel-December-2015, Easy, Google-Code-In-2015: Names on scr-contributors.html should link to corresponding people.html page - https://phabricator.wikimedia.org/T118192#1886507 (Aklapper) I'm surprised that https://github.com/Bitergia/grimoire-dashboard/blob/master... [00:24:51] Analytics-Tech-community-metrics, DevRel-December-2015, Easy, Google-Code-In-2015: Names on scr-contributors.html should link to corresponding people.html page - https://phabricator.wikimedia.org/T118192#1886519 (Aklapper) a:anmol.wassan [01:33:32] Analytics-Backlog, Wikipedia-iOS-App-Product-Backlog, Zero, hardware-requests, and 3 others: Request one server to suport piwik analytics - https://phabricator.wikimedia.org/T116312#1886690 (Dzahn) Yes, it would be in wikimedia.org in DNS. @ori but how will that system know anything about traffic... [05:23:15] Analytics-Wikistats: Page view stats: monthly most popular articles not updated - https://phabricator.wikimedia.org/T48204#1886850 (ezachte) Revisiting that page I think my comment about not being terribly essential was mostly for the 2nd, 3rd and 4th tables on that page which focus on most requested non exi... 
[05:45:16] Analytics-Kanban, CirrusSearch, Discovery, Discovery-Cirrus-Sprint, Patch-For-Review: Setup oozie task for adding and removing CirrusSearchRequestSet partitions in hive - https://phabricator.wikimedia.org/T117575#1886875 (Deskana) [09:10:21] Analytics-Tech-community-metrics, DevRel-December-2015, Easy, Google-Code-In-2015, Patch-For-Review: Names on scr-contributors.html should link to corresponding people.html page - https://phabricator.wikimedia.org/T118192#1887056 (Aklapper) https://github.com/Bitergia/mediawiki-dashboard/pull/76 [09:15:13] Analytics-Tech-community-metrics, DevRel-December-2015: OwlBot seems to merge random user accounts in korma user data - https://phabricator.wikimedia.org/T119755#1887063 (Lcanasdiaz) We're having issues getting the metrics with the sortinghat database. Something is broken in the load process, I'm debuggin... [11:07:56] (CR) Joal: "Change has been made so that the search engine classification upgrades initial referrer classification. Sounds good to me except maybe the" (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/247601 (https://phabricator.wikimedia.org/T115919) (owner: OliverKeyes) [11:16:02] (PS1) Joal: Update oozie default error email address [analytics/refinery] - https://gerrit.wikimedia.org/r/259654 [12:11:11] (PS1) Joal: Correct pageview oozie job [analytics/refinery] - https://gerrit.wikimedia.org/r/259661 [12:11:40] (PS1) Joal: Add oozie job extracting aqs usage statistics [analytics/refinery] - https://gerrit.wikimedia.org/r/259662 (https://phabricator.wikimedia.org/T118938) [12:21:37] Analytics-Tech-community-metrics, DevRel-December-2015, Patch-For-Review: Review/update mailing list repositories in korma - https://phabricator.wikimedia.org/T116285#1887336 (Lcanasdiaz) It's updated (at last!) Thanks for the support @aklapper :) [12:31:46] hey joal [12:31:54] you're not on vacation today I see [12:32:01] huhuhu :) [12:32:18] No. t'was yesterday :) [12:32:42] cool. 
in that case, I need a little help deploying the geo-breakdown job [12:32:57] Erik and I agreed on formats and everything else, I tested one more time and I self-merged it [12:33:06] (I thought you were out for the rest of the week for some reason) [12:33:09] milimetric: I have seen that yes [12:33:30] milimetric: That's great :) [12:33:34] So deploy time it is ? [12:33:48] yes, but is it possible to teach me instead of you doing it? [12:34:04] I want to save you future work if possible [12:34:17] :) [12:34:23] Completely possible [12:34:33] sweet [12:34:54] In any case, deploys will probably fall in elukey's box sooner or later ;) [12:35:14] are they documented? I looked but not too hard [12:35:28] (this one needs a refinery-source deploy too) [12:36:36] milimetric: I have no idea about doc ... I learnt from ottomata the same way you are gonna learn from me :) [12:37:12] ok, in that case I shall make a doc at Analytics/Cluster/Deploying [12:37:46] Also milimetric since you are up, before I start gathering data from the beginning of November, would you mind double checking with me that I have not missed too big of an interesting dimension for aqs stats usage ? [12:38:09] milimetric: if you do that, I owe you one more beer [12:38:09] ofc [12:38:36] The beer thing has diminishing returns, let's make it milk [12:39:02] The task for gathering pv api stats was not really a priority, but I still didn't want to have it not done with data being lost [12:39:08] :D [12:39:34] milimetric: https://gerrit.wikimedia.org/r/#/c/259662/1/hive/aqs/create_aqs_hourly_table.hql [12:40:24] * milimetric looks [12:41:22] actually milimetric, looking for docs I found: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Refinery [12:41:42] aha!
[12:42:10] joal: the stats table looks ok to me [12:42:30] side note: we are hitting each record *way* too many times for me to be comfortable :) [12:42:53] Thanks for reviewing [12:43:04] (each record in wmf.webrequest) [12:43:17] milimetric: agreed, there would be a really better way to do what we currently do [12:43:28] Particularly using streaming [12:43:50] aye, sorry, that was off topic [12:43:59] i'm reading the deploy steps [12:44:06] (shall I try them? or are there caveats) [12:44:19] Analytics-Tech-community-metrics, DevRel-December-2015: OwlBot seems to merge random user accounts in korma user data - https://phabricator.wikimedia.org/T119755#1887354 (Lcanasdiaz) Ok, data is updated finally. The identities were correctly split, have a look at the identity 028b6b8dce6241c60c313f9... [12:44:41] milimetric: let's do it together, and review :) [12:45:50] milimetric: give me a minute, I'm still off topic: http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at [12:45:53] :D [12:46:15] ooh! /me reads carefully [12:46:16] milimetric: I'd actually love to test flink :) [12:47:15] 2300 nodes!? [12:47:16] hi a-team [12:47:21] mornin' mforns [12:47:25] :] [12:48:34] hey mforns :) [12:48:39] hello! [12:48:50] mforns: I'd like to spend some time with you on spark today :) [12:49:04] joal, yes please :] [12:49:06] I need to deploy with milimetric first, then I'll have time :) [12:49:13] ok, ping me please [12:49:23] sure mforns, thx :) [12:49:41] yeah you read right milimetric: 2300 nodes :D [12:50:07] joal, to give you a teaser, anonymization seems to be working, but there are problems with memory [12:50:38] mforns: backfilled yesterday's IRC talks, so I kinda followed a bit ;) [12:50:51] ok [12:57:40] the cat just jumped on a knife and sent it flying towards me. I caught it in mid air.
I both feel like a ninja and like the cat is trying to kill me [12:58:03] O.o' [12:58:22] It is no news to me you're a ninja milimetric, but I didn't know about your killer-cat :) [12:58:33] hehehe [12:58:37] lol [13:03:14] milimetric: couldn't find doc about deploying the java code [13:03:17] joal: very interesting read [13:03:19] milimetric: We should make one [13:03:27] I'd say Flink would be my choice too [13:03:39] because it doesn't lag too far behind storm (which is more mature) [13:03:43] milimetric: the diff between storm/flink and spark is well explained [13:03:44] but it has operator combining [13:03:51] yes [13:04:19] plus milimetric, storm is on the downside, while flink is trending [13:04:39] that shouldn't matter too much, we use Camus after all :) [13:04:57] but what should matter is how easy and fun either of these are to set up [13:05:09] because our concerns aren't as much massive performance but efficiency and time [13:05:24] true milimetric [13:05:33] One reason for flink is also scala :) [13:05:42] :) I like scala [13:05:43] a lot [13:05:45] * joal likes scala [13:05:46] so, deploy? [13:05:50] sure :) [13:05:54] k, i'm on tin [13:06:00] about to do the git deploy [13:06:02] arf, first, deploy java ! [13:06:06] W [13:06:12] oh right [13:06:26] well, doesn't matter really, not like deploying this would start anything right? [13:06:30] Which means: update changelog.md in refinery-source with deployed changes [13:06:43] ok, I'll submit a patch for that [13:06:48] great :) [13:07:33] joal: so how do you guys update this? there's some automatic way?
[13:07:44] or manually look through the git log [13:07:52] manualllllll [13:08:14] milimetric: While you're doing this, I'll put up a wiki page for this process [13:08:57] joal: https://wikitech.wikimedia.org/w/index.php?title=Analytics/Cluster/Refinery-source&action=edit&redlink=1 [13:09:31] k milimetric, I was going to update refinery, but I think you're right, another page is probably better [13:12:26] joal: do we update something to say this is now v0.0.24? I couldn't find 0.0.23 other than in some weird xml files [13:12:41] milimetric: maven does that for us :) [13:18:55] joal: oh! I see, you're waiting for me to push this commit. Maven will do that in a *separate* commit [13:18:56] :) [13:19:08] * milimetric detected deadlock [13:19:11] Yessir, sorry it was not clear :) [13:19:13] * milimetric is smarter than a Turing machine [13:19:17] huhuhu [13:19:21] * joal is not [13:19:49] * milimetric thinks Turing machines just need the concept of "awkward pause" and they'll be smarter than him [13:20:23] (PS1) Milimetric: Update changelog.md for v0.0.24 deployment [analytics/refinery/source] - https://gerrit.wikimedia.org/r/259675 [13:21:33] milimetric: I usually ask andrew to review those changes, for this one, I'll take responsibility :) [13:22:06] (CR) Joal: [C: 2] "Looks good to me, let's deploy" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/259675 (owner: Milimetric) [13:25:04] (CR) Joal: [V: 2] "Looks good to me, let's deploy" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/259675 (owner: Milimetric) [13:25:25] I thought jenkins would have done that on its own ... -^ [13:26:21] awkward pause ftw again! [13:26:41] joal: jenkins only does that if it's configured.
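[Editor's note] The release flow discussed above (changelog built by hand from the git log, Maven handling the version bump and the Archiva upload) might look roughly like the plan below. Nothing is executed, the plan is only printed; the version numbers are the ones from the chat, and the `mvn release:*` goals are the standard Maven Release Plugin ones, assumed rather than confirmed to be what refinery-source uses.

```shell
# Hypothetical sketch of the refinery-source release steps; printed, not run.
PREV="0.0.23"
NEXT="0.0.24"
PLAN=$(cat <<EOF
git log v${PREV}..HEAD --oneline   # gather changes for changelog.md by hand
mvn release:prepare                # bump pom versions and tag v${NEXT}
mvn release:perform                # build the jars and upload them to Archiva
EOF
)
echo "$PLAN"
```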
And we don't have that set up on many repos [13:27:27] Analytics-Tech-community-metrics, DevRel-January-2016: Key performance indicator: Top contributors: Should have sane Ranking algorithm which takes (un)reliability of user data into account - https://phabricator.wikimedia.org/T64221#1887419 (Aklapper) [13:29:22] milimetric: I thought it was done for refinery-source [13:34:39] joal: so now we do magic with maven? [13:38:54] milimetric: just updated the page: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Refinery-source [13:39:15] milimetric: plus, got a phone call from my brother, sorry for high latency :S [13:39:56] np at all [13:40:02] * milimetric reads [13:40:24] joal, is 8 executors 4GB each too much? [13:40:50] mforns: not really :) [13:40:56] ok [13:41:39] mforns: our cluster has 1.6TB ram, so 32G is not that much :) [13:41:56] joal, mmmm ok [13:42:02] now I get the dimension of that :] [13:45:30] woah, weird, this pushes those commits directly to gerrit [13:47:36] joal: it says I'm unauthorized, I'm assuming 'cause I don't have passwords configured for archiva? [13:49:43] correct milimetric [13:49:51] Let me gives that to you [13:55:03] milimetric: so, how's it going ? [13:55:29] good! how are you?! [13:55:34] oh, you mean with the deployment [13:55:46] right now it's stuck on [INFO] 9753/9753 KB [13:56:00] huhu :) [13:56:07] My brother is fine, thank you :) [13:56:15] op, just went past [13:56:20] great [13:57:15] seems to be doing a bunch of file copying / building / random stuff [13:57:36] actually it builds the jars and upload them to archiva [13:57:56] milimetric: I updated https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Refinery with link to the new page [14:09:10] this thing is still building... it's crazy! [14:09:21] Still building, or uploading ? 
[14:09:25] it looks like it's downloaded the internet, deleted it, then downloaded it again [14:09:34] arf :( [14:09:39] both uploading and building I think [14:10:04] k [14:18:00] ok, just finished [14:18:08] how do I check archiva for the jars? [14:18:15] it looked like it uploaded a lot of different things [14:18:50] it should have uploaded 1 jar per module, which means 6 jars for us [14:19:01] milimetric: go to https://archiva.wikimedia.org/#welcome [14:19:23] I usually browse to org.wikimedia.analytics.refinery [14:20:05] Yay, 0.0.24 is here for hive :) [14:20:32] haven't found it yet :) [14:20:50] I see this... https://archiva.wikimedia.org/#artifact/org.wikimedia.analyics.refinery/refinery-hive/0.0.4 [14:20:57] which says 0.0.4...? [14:20:58] Same for core --> I don't check each of them, just 2 or 3 [14:21:22] try 0.0.24 [14:21:38] I went here: https://archiva.wikimedia.org/#artifact/org.wikimedia.analytics.refinery.hive/refinery-hive [14:22:01] It shows you the various versions it has [14:22:21] I don't understand how you got there... when I browse I end up here: https://archiva.wikimedia.org/#browse/org.wikimedia.analyics.refinery [14:22:56] Analytics-Kanban, Patch-For-Review: Create a dedicated hive table with pageview API only requests for reporting [5 pts] {melc} - https://phabricator.wikimedia.org/T118938#1887513 (JAllemandou) [14:23:28] Ah, maybe that: there are 2 links when you are here: https://archiva.wikimedia.org/#browse/org.wikimedia [14:23:34] analytics, analytics.refinery [14:23:40] go to the first one :) [14:23:43] Then refinery [14:23:48] * elukey is trying to follow milimetric and joal but for the moment he didn't really understand much [14:23:57] Hi elukey :) [14:24:00] o/ [14:24:11] We are double checking that jars have been correctly uploaded to archiva [14:24:28] joal: there's a misspelled one: https://archiva.wikimedia.org/#browse/org.wikimedia.analyics.refinery [14:24:35] analyics [14:24:41] True !
[14:24:48] that bad one is typo-squatting the good one :) [14:24:52] oh yes I was joking, probably I'll need to bang my head against the whole process multiple times before getting a glimpse [14:24:56] and my mild dyslexia doesn't help :) [14:25:03] :D [14:25:07] but +1 for milimetric's documentation skills [14:25:16] Yessssss ! [14:25:54] elukey: hey, factually, you're totally wrong, I'm the worst at docs, but I'll take it! [14:26:01] :P [14:26:29] ok, joal, so now I can git deploy refinery? [14:26:33] and do the sync from stat1002? [14:26:47] not yet: need to add the jars to the refinery repo :) [14:27:07] i see [14:27:13] Analytics-Kanban, Patch-For-Review: Create a dedicated hive table with pageview API only requests for reporting [5 pts] {melc} - https://phabricator.wikimedia.org/T118938#1887531 (JAllemandou) a:JAllemandou [14:27:49] milimetric: jars to be added in the artifacts folder of the refinery repo, plus links updated [14:28:04] milimetric: Don't change the ones that have not been changed (we update lazily) [14:28:30] Like in your case, I think core, hive and camus are all that changes [14:28:33] ok, but there were no details on how to do this so I'm not sure [14:28:40] camus? [14:29:00] milimetric: you said "let's document it in a wiki page", so if your last sentence is right it'll be basically my first task to fill it right? :) [14:29:07] There is a change in camus, but it's not functional, you can forget about it :) [14:30:06] oh, joal, no that's totally ok, let's update everything that was updated otherwise we'll forget [14:30:18] ok :) [14:30:40] elukey: i think we're all trying to just make this better now: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Refinery-source [14:31:08] so, joal, you gotta walk me through this, I'm not sure where the jars are supposed to be or anything, complete noob [14:31:28] ah! artifacts?
[14:31:38] milimetric: so that I replace the XXX placeholder in the doc, let's find where the jars are on your local machine [14:31:54] There should be a folder under refinery-source [14:32:07] in artifacts I see: refinery-hive.jar -> org/wikimedia/analytics/refinery/refinery-hive-0.0.23.jar [14:33:05] yes milimetric [14:33:43] You should copy the jars from the special folder in refinery-source into refinery/artifacts/org/wikimedia/analytics/refinery/ [14:33:48] so the process is - download the jars and put them in the correct org/blah/blah path, then update the symlinks? [14:33:52] Then change the links [14:33:55] k [14:34:00] Then commit [14:34:03] k [14:34:08] milimetric: Let's find them locally [14:34:25] In the folder where you performed the release [14:34:38] joal: I'm seeing ../refinery-source/target/checkout/refinery-hive/target/refinery-hive-0.0.24.jar [14:34:39] There should be a folder with a special name (can't recall it) [14:34:52] It's not target [14:34:59] It's another one I think [14:35:05] target is the one that doesn't work [14:35:12] guard?
[14:35:15] nope [14:35:18] let's batcave :) [14:35:23] k [14:35:36] wait joal [14:35:43] but then elukey can't follow as easily [14:35:48] True [14:35:55] Let's just find that folder [14:36:02] sure [14:36:02] The rest is easy enough on IRC [14:36:06] here's what's in my refinery-source: [14:36:14] https://www.irccloud.com/pastebin/x9DhAnHA/ [14:36:53] looks like whatever special folder there was got deleted, so I can just download them from archiva [14:37:07] yessir [14:37:08] that's better anyway, because it includes the double checking step [14:37:15] Cool [14:37:42] (I'll update the docs for this, I have an idea what to write to make it easy for newbs like me) [14:37:58] Cool [14:37:59] Analytics-Tech-community-metrics, DevRel-December-2015: OwlBot seems to merge random user accounts in korma user data - https://phabricator.wikimedia.org/T119755#1887552 (Aklapper) @Lcanasdiaz: I see the "[[ https://github.com/Bitergia/mediawiki-identities/commit/21ca22c956aa20ba8ca2b8add99de4d72f6c4748 |... [14:38:01] Thanks :) [14:39:53] Analytics-Tech-community-metrics, DevRel-December-2015: OwlBot seems to merge random user accounts in korma user data - https://phabricator.wikimedia.org/T119755#1887553 (Aklapper) Plus looking at http://korma.wmflabs.org/browser/scr-contributors.html it seems that several staffers lost their affiliation... [14:40:07] joal, I'll leave for 20 mins to have lunch, and come back :] [14:40:25] ok mforns, sorry, took long to deploy :) [14:40:33] np, joal :]
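[Editor's note] The artifacts update joal describes above (copy the versioned jars into the refinery artifacts tree, repoint the unversioned symlinks, then commit) can be sketched as below. The directory layout and version numbers match the chat; the whole thing runs in a scratch directory with a fake jar, not a real refinery checkout.

```shell
# Sketch of updating refinery's artifacts folder for a new jar version.
cd "$(mktemp -d)"
mkdir -p artifacts/org/wikimedia/analytics/refinery
# pretend we downloaded the 0.0.24 jar from archiva:
touch artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.24.jar
# repoint the convenience symlink from 0.0.23 to 0.0.24:
ln -sf org/wikimedia/analytics/refinery/refinery-hive-0.0.24.jar artifacts/refinery-hive.jar
readlink artifacts/refinery-hive.jar
# then: git add artifacts && git commit   (git-fat turns the jars into stubs)
```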
http://127.0.0.1:8080/repository/releases/org/wikimedia/analytics/refinery/camus/refinery-camus/0.0.24/refinery-camus-0.0.24.jar [14:44:07] Yes, milimetric, I know that --> Change that host with archiva.wikimedia.org and it works [14:44:15] I don't know why :( [14:44:24] right, I figured, just making sure you knew [14:44:39] Yeah, I know (knowing is not enough) [14:44:53] heh, well this can't possibly be an issue in all archiva versions [14:45:00] that's like ... the entire purpose of this platform [14:45:16] maybe it's just a configuration problem [14:49:23] joal: and git fat will replace the large files with a placeholder when I commit? [14:49:30] or do I have to do fancy git fat stuff [14:49:48] If you have git fat setup correctly, it should do its magic :) [14:50:11] joal: that's a big if... I remember helping andrew with it a long time ago but I'm totally unsure that it's still set up [14:50:16] how do I check? [14:50:33] hm [14:51:25] look into .git/config for refinery folder [14:52:31] joal: my .git/config looks spectacularly skinny [14:52:35] (nothing fat about it) [14:52:41] :) [14:52:57] Talking about the thing, not always being :) [14:53:04] filter "fat" ? [14:53:20] filter? [14:53:29] like cat .git/config | grep fat [14:53:29] ? [14:53:33] yes [14:53:41] yeah, no fat [14:53:46] 100% lean [14:53:51] fat free [14:53:52] Arf :( [14:54:08] :) it's ok! 
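[Editor's note] The host rewrite joal describes (Archiva's download links come back with a broken 127.0.0.1 host that must be swapped for archiva.wikimedia.org) is a one-liner; the sketch below uses the camus jar URL from the chat, and assumes https for the real host.

```shell
# Rewrite the broken download host in an Archiva link, as described in chat.
BAD_URL="http://127.0.0.1:8080/repository/releases/org/wikimedia/analytics/refinery/camus/refinery-camus/0.0.24/refinery-camus-0.0.24.jar"
GOOD_URL=$(echo "$BAD_URL" | sed 's|http://127.0.0.1:8080|https://archiva.wikimedia.org|')
echo "$GOOD_URL"
# then: curl -O "$GOOD_URL"   (download the jar for the artifacts repo)
```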
there must be docs for this too [14:54:09] so this is good [14:54:17] The thing is, it'll get fat (actually git-fat should be named git-unfat) [14:54:43] milimetric: there is a doc about that in the readme of refinery I think [14:54:44] (I'm actively editing the Refinery-source page as we do this) [14:54:51] ok, I'll check it out [14:54:52] Awesome milimetric :) [15:01:14] joal: I'm getting: [15:01:16] ERROR:git-fat: Error reading or parsing configfile: /home/dan/projects/refinery/.gitfat [15:01:23] because it looks like I have a .gitfat file from Jly [15:01:26] *July [15:01:28] which says: [15:01:39] [rsync] [15:01:39] remote = archiva.wikimedia.org::archiva/git-fat [15:01:39] options = --copy-links --verbose [15:01:57] Ahhhhh ! My mistake milimetric, this was the one I was looking for :( [15:02:04] Then you're good to go [15:02:10] it says error :) [15:02:12] doesn't sound good [15:02:24] (it does that when I do git fat pull) [15:02:41] (but I also didn't have git fat installed, so I pip install git-fat) [15:02:47] k [15:02:58] but the error's still bad, right? [15:03:21] Right [15:03:25] I never git fat pull [15:03:35] So I can't really say :( [15:04:07] oh ok, hm... I'm very skeptical of magic [15:04:11] but I'll commit and see what happens [15:04:29] k [15:05:10] (PS1) Milimetric: Update refinery-source jars to 0.0.24 [analytics/refinery] - https://gerrit.wikimedia.org/r/259707 [15:05:30] joal: hm...
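[Editor's note] The two git-fat checks from the exchange above can be reproduced locally: a `.gitfat` file pointing at the rsync remote (contents exactly as quoted in chat), and a `filter "fat"` section in `.git/config`. The filter commands shown are the ones `git fat init` conventionally writes; the repo here is a throwaway example directory, not the real refinery checkout.

```shell
# Reproduce milimetric's git-fat configuration check in a scratch repo.
REPO=$(mktemp -d)
mkdir -p "$REPO/.git"
# .gitfat contents as quoted in the chat:
printf '[rsync]\nremote = archiva.wikimedia.org::archiva/git-fat\noptions = --copy-links --verbose\n' > "$REPO/.gitfat"
# a git-fat filter section, as `git fat init` typically writes it (assumed):
printf '[filter "fat"]\n\tclean = git-fat filter-clean\n\tsmudge = git-fat filter-smudge\n' > "$REPO/.git/config"
# the check from the chat: is "fat" mentioned in .git/config?
grep fat "$REPO/.git/config" && echo "git-fat filter configured"
```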
after the commit those jars are still full sized [15:05:44] milimetric: https://gerrit.wikimedia.org/r/#/c/259707/1/artifacts/org/wikimedia/analytics/refinery/refinery-camus-0.0.24.jar [15:05:47] :D [15:06:10] MAGIIIIIIIIIC (http://images.google.fr/imgres?imgurl=http://i.imgur.com/YsbKHg1.gif%3Fnoredirect&imgrefurl=http://imgur.com/gallery/YsbKHg1&h=252&w=275&tbnid=HryI-43v3FHG-M:&tbnh=147&tbnw=160&docid=oTJwQHdeAQymIM&usg=__ZpNAT6srAH8u5dIUWG0QI8o_lIs=&sa=X&ved=0ahUKEwif-svilePJAhWHnRoKHcIoCzwQ9QEIITAA [15:06:39] I agree, that should definitely pop up whenever git fat does something [15:06:50] but... the file in the repo is not changed, that's what really freaks me out [15:06:57] milimetric: I double check the jar id [15:07:11] jar doesn't change in the repo [15:07:18] it's not supposed to [15:07:32] It's just not uploaded in the git repo [15:07:38] instead, you have a git-fat link [15:09:08] k, that's fine [15:09:12] I updated https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Refinery-source [15:09:27] (CR) Ottomata: [C: 2 V: 2] Update oozie default error email address [analytics/refinery] - https://gerrit.wikimedia.org/r/259654 (owner: Joal) [15:09:27] * joal reads [15:09:28] so now when you merge that in gerrit, I can git deploy [15:09:41] yessir [15:10:38] ottomata: Hey, do you know how I can check for jar ID in archiva? [15:11:25] to see if the git-fat sha you have matches the one on archiva? [15:11:31] yessir [15:11:35] hmmm [15:12:05] no, not without logging in to the archiva server [15:12:09] i can see it for you though :/ [15:12:18] hmmm [15:12:18] oh [15:12:18] i mean [15:12:22] np ottomata, we are gonna trust git-fat :) [15:12:23] you can calc the sha [15:12:31] if you dl the jar [15:12:32] milimetric: --^ [15:12:35] yes [15:12:47] shasum whatever.jar [15:13:00] k [15:13:27] milimetric: forgot to double check a thing before merging :( [15:14:18] joal: you haven't merged it yet...
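[Editor's note] Ottomata's suggestion above (download the jar and compare its sha against the git-fat stub) can be sketched as below. The file contents are a toy example rather than a real refinery jar, and the stub line format (`#$# git-fat <sha1> <size>`) is approximated from git-fat's convention.

```shell
# Compare a downloaded jar's sha1 with the sha1 recorded in its git-fat stub.
TMP=$(mktemp -d)
printf 'fake jar bytes' > "$TMP/refinery-camus-0.0.24.jar"
JAR_SHA=$(sha1sum "$TMP/refinery-camus-0.0.24.jar" | awk '{print $1}')
# fake a git-fat stub for the same content (format approximated):
printf '#$# git-fat %s 14\n' "$JAR_SHA" > "$TMP/stub"
STUB_SHA=$(awk '{print $3}' "$TMP/stub")
[ "$JAR_SHA" = "$STUB_SHA" ] && echo "shas match"
```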
[15:14:31] milimetric, right :) [15:14:43] Was wondering about the jar version in your code [15:15:07] I'll merge the new jars in artifacts repo now [15:15:24] (CR) Joal: [C: 2 V: 2] "Moving toward deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/259707 (owner: Milimetric) [15:16:44] back joal [15:16:50] joal: but you said you forgot to double check something? [15:17:02] Doing it now :) [15:17:44] what's that? I can add it to the docs [15:18:48] milimetric: it's in the code you merged: one property missing in your new job [15:19:00] milimetric: Or at least I think so [15:19:34] oh I see [15:19:59] milimetric: parameterization of jar version ;) [15:21:32] oh, joal I just copied that, you mean this: refinery-hive-${refinery_jar_version}.jar; [15:21:56] Yes milimetric [15:22:15] It's not set in coordinator.properties IIRC [15:22:25] oh! I thought it was magically provided [15:22:28] :D [15:22:42] it's not set, yes, that needs to be hard-coded then? [15:24:01] Yeah, look at refinery/oozie/webrequest/refine/bundle.properties [15:25:00] k, patching [15:28:43] ottomata: I suggest not restarting the oozie jobs to change the email thing. I think it's better to have them restart for a meaningful change, changing that thing meanwhile [15:28:50] yeah [15:28:52] agree [15:28:54] we'll just wait [15:28:56] cool [15:29:06] thx [15:34:05] milimetric: I'll go try to help mforns if ok for you [15:34:13] mforns: batcave ? [15:34:31] joal, yes! [15:35:00] ok with me, I'm still working on the patch [15:36:20] (PS1) Milimetric: Fix problem referencing jar from geo job [analytics/refinery] - https://gerrit.wikimedia.org/r/259723 [16:09:06] * elukey needs to study the Refinery [16:09:46] milimetric, joal: documenting how to deploy to cluster would be awesome, please do.
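[Editor's note] The missing property joal points out would be set in the job's `.properties` file, following the pattern in `oozie/webrequest/refine/bundle.properties`. A hedged sketch of such a fragment (the property name matches the `refinery-hive-${refinery_jar_version}.jar` reference quoted in chat; the value is the version being deployed here):

```properties
# Hypothetical coordinator.properties fragment: pin the jar version that the
# workflow's refinery-hive-${refinery_jar_version}.jar reference resolves to.
refinery_jar_version = 0.0.24
```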
I would like to be able to do it myself too [16:10:02] nuria: we did with milimetric [16:10:33] joal: thank you [16:10:38] np :) [16:19:01] Analytics-Tech-community-metrics, DevRel-December-2015: OwlBot seems to merge random user accounts in korma user data - https://phabricator.wikimedia.org/T119755#1887697 (Lcanasdiaz) >>! In T119755#1887553, @Aklapper wrote: > Plus looking at http://korma.wmflabs.org/browser/scr-contributors.html it seems... [16:21:43] Analytics-Wikistats: Page view stats: monthly most popular articles not updated - https://phabricator.wikimedia.org/T48204#1887699 (Nuria) >I don't think the new page view api can zoom in on those missing pages/files in particular. True, it cannot. > But again, there seems to be little demand for it. Ok, le... [16:32:44] (CR) Nuria: "Please see @joal comment i also just have one minor comment too." (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/247601 (https://phabricator.wikimedia.org/T115919) (owner: OliverKeyes) [16:35:56] (CR) Nuria: [C: 2] Correct pageview oozie job (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/259661 (owner: Joal) [16:39:26] (CR) Nuria: Fix problem referencing jar from geo job (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/259723 (owner: Milimetric) [16:41:19] (CR) Milimetric: Fix problem referencing jar from geo job (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/259723 (owner: Milimetric) [16:42:47] milimetric: let's talk about jar version of oozie job on standup [16:43:29] sure [16:44:20] elukey: I have invited you to our staff meeting, if you can come great. If not, no big deal. [16:48:12] nuria: sure! I have some free time, I'll be glad to join :) [16:48:18] excellent [17:00:16] ottomata: standuppp [17:00:43] OO [17:00:45] finding headphones [17:14:56] joal: can you post the flink link here....
I'm not in the hangout [17:15:01] hm, sure [17:15:07] http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at [17:24:43] nuria: when you +2, code is not merged - Did you mean to merge or not ? [17:25:04] joal: i was waiting for ottomata to look at it but it can be merged yes [17:25:18] k nuria [17:25:23] which? [17:26:09] (CR) Nuria: [C: 2 V: 2] Fix problem referencing jar from geo job [analytics/refinery] - https://gerrit.wikimedia.org/r/259723 (owner: Milimetric) [17:26:39] ottomata: https://gerrit.wikimedia.org/r/#/c/259661/ [17:32:35] nuria: not sure if you guys are already in the batcave but I think that I need permissions to join [17:32:36] elukey: can you join: https://plus.google.com/hangouts/_/wikimedia.org/a-batcave [17:33:10] elukey: i have sent you a new invite [17:33:44] ottomata: staff meeting [17:34:49] bah! [18:02:58] Analytics-Kanban, Analytics-Wikimetrics, Puppet: Cleanup Wikimetrics puppet module so it can run puppet continuously without own puppetmaster {dove} [?
pts] - https://phabricator.wikimedia.org/T101763#1888068 (Milimetric) [18:07:16] Analytics-Kanban: Change e-mail list on oozie communications to e-mail alias {hawk} [1 pts] - https://phabricator.wikimedia.org/T121241#1888084 (Milimetric) a:JAllemandou [18:08:32] Analytics-Kanban: Gather preliminary metrics of Pageview API usage for quaterly review {slug} [8 pts] - https://phabricator.wikimedia.org/T120845#1888092 (Milimetric) a:JAllemandou [18:08:57] Analytics-Kanban: Gather preliminary metrics of Pageview API usage for quaterly review {slug} [5 pts] - https://phabricator.wikimedia.org/T120845#1862643 (Milimetric) [18:11:32] Analytics-Backlog: Gather metrics about cluster usage - https://phabricator.wikimedia.org/T121783#1888127 (Nuria) NEW [18:12:18] Analytics-Backlog: Gather preliminary metrics of Pageview API usage for quaterly review {slug} [5 pts] - https://phabricator.wikimedia.org/T120845#1888137 (Milimetric) [18:12:45] Analytics-Backlog: Gather preliminary metrics of Pageview API usage for quaterly review {slug} - https://phabricator.wikimedia.org/T120845#1888141 (Milimetric) [18:19:20] Analytics-Backlog: Implement purging of EL data in Hadoop {oryx} - https://phabricator.wikimedia.org/T121657#1888181 (Milimetric) Open>declined a:Milimetric We don't want to do partial purging in Hadoop. 
[18:20:02] Analytics-Backlog: 'is_spider' column in eventlogging user agent data - https://phabricator.wikimedia.org/T121550#1888188 (Milimetric) p:Triage>Normal [18:20:43] Analytics-Backlog: 'is_spider' column in eventlogging user agent data {flea} - https://phabricator.wikimedia.org/T121550#1888192 (Milimetric) [18:26:15] Analytics-EventLogging, Analytics-Kanban: Update Eventlogging jrm tests so they include userAgent into capsule {oryx} [3 pts] - https://phabricator.wikimedia.org/T118770#1888226 (Milimetric) a:Ottomata [18:26:59] Analytics-Backlog, Fundraising-Analysis: FR tech hadoop onboarding {flea} - https://phabricator.wikimedia.org/T118613#1888233 (Milimetric) [18:29:13] joal: if you need me for meeting, just ping me, am going to eat FOOOD [18:29:21] enjoy ottomata :) [18:36:59] madhuvishy: Do we wait for you for the sanitization meeting ? [18:38:04] joal, the job with only the first step finished without exceptions, but I get permission issues when writing, should I execute as hdfs? [19:25:38] ottomata: [19:25:41] hi!! [19:25:58] milimetric: I can spend some time with you if you want :) [19:25:59] I've deployed everything up to sudo -u hdfs /srv/deployment/analytics/refinery/bin/refinery-deploy-to-hdfs --verbose --no-dry-run [19:26:06] joal: not allowed, bon appétit [19:26:08] milimetric: deploying refinery stuff? I can do too (never done it before but i can) [19:26:13] :) [19:26:16] huhuhu [19:26:25] madhuvishy: sure, if you have sudo, yea [19:26:27] hii [19:26:34] ok, I'll come back after dinner to see if you need me ;) [19:26:40] but i need to go to office, get lunch - so can do in an hour or so only [19:26:55] madhuvishy: ok, I'll ping you if andrew's too busy [19:26:59] let me know if you need me [19:27:00] ya sure [19:27:03] ehhh? [19:27:06] ottomata: you got time to lend me your sudo? [19:27:12] ah sure [19:27:13] I deployed refinery-source [19:27:14] you can sudo -u hdfs [19:27:15] ? [19:27:16] can't* [19:27:17] ?
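[Editor's note] The deploy sequence milimetric has worked through at this point can be gathered as a printed checklist. Only the `refinery-deploy-to-hdfs` line is quoted verbatim from the chat; the `git deploy start`/`git deploy sync` pair is the usual Trebuchet workflow on tin and is assumed, not confirmed, here. Nothing is executed.

```shell
# Printed checklist of the refinery deploy steps; commands are not run.
PLAN=$(cat <<'EOF'
# on tin (Trebuchet workflow, assumed):
git deploy start && git deploy sync
# on stat1002, push the synced refinery to HDFS (quoted from the chat):
sudo -u hdfs /srv/deployment/analytics/refinery/bin/refinery-deploy-to-hdfs --verbose --no-dry-run
EOF
)
echo "$PLAN"
```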
and I git-deployed refinery [19:27:22] no [19:27:28] (probably for the best) [19:27:31] ah naw, you need analytics-admins [19:27:33] ok [19:27:43] only joal, otto and me are in that list i think [19:28:08] ottomata: you have ops duty just cheat and add everyone :D [19:28:10] doing... [19:28:15] haha [19:28:44] ok, so after that, I guess I gotta submit the coordinator, right? [19:29:04] and that probably requires sudo to run it as hdfs as well? [19:33:52] milimetric yes [19:34:21] I'm guessing basically sub projectviews/geo/coordinator.properties for the legacy_tsv stuff here: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Oozie#How_to_deploy_Oozie_production_jobs [19:34:46] Analytics-Tech-community-metrics, DevRel-December-2015: Make GrimoireLib display *one* consistent name for one user, plus the *current* affiliation of a user - https://phabricator.wikimedia.org/T118169#1888349 (Lcanasdiaz) == Username == The error related to a mixed way of getting the name of a unique id... [19:35:11] hmmm milimetric [19:35:18] this is what i use on joal's advice [19:35:30] https://etherpad.wikimedia.org/p/refinery_deploy [19:35:57] the queue_name essential in the wiki seems wrong [19:36:53] hdfs deploy done [19:37:05] madhuvishy: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Oozie#Alternate_That_Maybe_Should_replace_this_section [19:37:20] etherpads == good for temporary scratch space, bad for long term docs [19:37:36] thx ottomata so then the launching of the coordinator [19:37:49] do you need any info from me on that? Or just the coordinator properties? [19:38:08] https://github.com/wikimedia/analytics-refinery/blob/master/oozie/projectview/geo/coordinator.properties [19:38:08] oh, me? [19:38:11] what do you need me to do? [19:38:18] or anyone with sudo [19:38:23] to launch the coordinator [19:38:26] (it's a new job I wrote) [19:38:36] to geo-aggregate projectviews [19:39:34] oh [19:39:44] ok [19:41:29] milimetric: production queue?
[19:41:46] start time? [19:42:21] uh... production queue I guess [19:42:26] k [19:42:53] and is the start time ok as in the properties? 05-01? https://github.com/wikimedia/analytics-refinery/blob/master/oozie/projectview/geo/coordinator.properties#L32 or is this a different start time? [19:43:16] usually you override start time on CLI, since we don't change properties every time we relaunch job [19:43:29] so, you specify, should I use 2015-05-01T00:00Z [19:43:30] ? [19:43:59] oh I see, but that's for old jobs that have already been running for a while, right? Since this is brand new, you don't have to override [19:44:21] ok, so you want it to start in May? [19:44:38] yes, basically for all available projectview data [19:44:57] (it runs on top of projectview_hourly) so it should be super fast anyway [19:44:59] hm, ok, so that is going to take a while to backfill, right? [19:45:01] oh ok [19:45:01] hm [19:45:02] ok [19:45:08] OOooK [19:45:41] started [19:45:42] job: 0141519-150922143436497-oozie-oozi-C [19:45:57] milimetric: run [19:45:57] oozie job -info 0141519-150922143436497-oozie-oozi-C [19:45:59] to see status [19:46:01] or look in hue [19:46:10] sweet :) ! thanks!! [19:50:14] yay, output [20:12:08] heading home, back shortly [20:47:46] Hi! I'm wondering if anyone can help get more insight into some differences we've found in pageviews and CentralNotice banner displays, especially on mobile. Here are some notes: https://collab.wikimedia.org/wiki/Fundraising/2015/Impression_Notes#Correlating_pageviews_and_impressions [20:48:20] ottomata: milimetric: madhuvishy ^ [20:48:24] Thx in advance! [21:07:05] AndyRussG: I'm not sure how to log in there [21:07:46] milimetric: ah hehe that could be an issue :( [21:08:07] It's not sekret or anything, just out of habit stuff with FR data tends to get put places like that [21:08:22] np :) [21:11:06] milimetric: lemme see how to get access there...
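[Editor's note] The launch flow discussed above (deploy refinery to HDFS, then submit the coordinator as the hdfs user, overriding queue and start time on the CLI) can be sketched roughly as below. This is a hedged sketch only: the local properties path, the property names (queue_name, start_time), and the assumption that OOZIE_URL points at the cluster's Oozie server come from this conversation and the linked wikitech page, not from a verified runbook.

```shell
# Sketch, not a runbook: assumes membership in analytics-admins,
# refinery already deployed locally and to HDFS, and OOZIE_URL set.
sudo -u hdfs oozie job \
  -config /srv/deployment/analytics/refinery/oozie/projectview/geo/coordinator.properties \
  -Dqueue_name=production \
  -Dstart_time=2015-05-01T00:00Z \
  -run

# Check on the coordinator id it prints (example id from this log):
oozie job -info 0141519-150922143436497-oozie-oozi-C
```

Overriding start_time on the CLI (rather than editing the properties file) matches the advice above: properties stay generic, and each relaunch picks its own window.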
[21:11:14] k [21:14:40] AndyRussG: I was able to log in with (WMF) account [21:15:22] i think they created collab wiki accounts as part of onboarding when i joined [21:31:10] Analytics-Backlog: Gather metrics about cluster usage - https://phabricator.wikimedia.org/T121783#1888675 (Nuria) [21:32:53] madhuvishy: cool! (sorry just finished Standup) any ideas? [21:33:17] milimetric: does your WMF account work there? I could send you the contents of that page otherwise... [21:33:18] AndyRussG: still trying to understand what it is that's different [21:33:36] madhuvishy: look at the results of the queries at the end [21:33:53] my wmf account doesn't work, no, but I got super confused about my passwords and accounts after the recent reset [21:33:56] ah ok looking [21:34:14] On mobile, about 30% discrepancy between pageviews and banner impressions for the segment of users that the fundraiser was targeting [21:34:36] more pageviews or more banner impressions? :) [21:34:39] We do expect some discrepancy, say from unsupported browsers, adblockers or cookie disablers, but not 30% [21:34:40] Yeah [21:34:48] AndyRussG: how is the country code being determined on /beacon/impressions? [21:35:14] madhuvishy: country code is added on to that call by CentralNotice, comes from the GeoIP cookie [21:35:46] That's also what determines whether people get the banners to begin with (the fundraising campaign is geotargeted) [21:35:53] hmmm [21:36:21] i wonder if our country code determination using maxmind database contributes [21:36:52] AndyRussG: also [21:36:55] for mobile [21:37:23] you say in /beacon/impression part of the query that device should be ipad/android etc [21:37:37] and in webrequest - you've filtered for access method mobile web [21:37:47] milimetric: I just e-mailed you the contents of the page... Look at the "Correlating pageviews and impressions" section... [21:38:14] are banners displayed in apps? 
[21:38:18] * milimetric looks [21:38:29] it sounds like they are very different [21:38:32] madhuvishy: not in apps [21:38:44] madhuvishy: also, only on ipad, android and iphone devices [21:38:58] AndyRussG: it's possible that 'mobile web' includes more [21:39:08] we also get bot traffic in mobile web [21:39:12] That is one factor, but we've determined it's not sufficient [21:39:15] See the table [21:39:25] that are not accounted for in 'spider' or 'bot' [21:39:42] It goes through all the factors that could contribute to the difference, and guesstimates how much each could contribute [21:40:09] Still at the highest guesstimate, it's 18% max guesstimate-expected difference, vs. 30%, or 35% in the US [21:40:30] So we're worried there's some bug, or something impeding pageviews that we could fix [21:40:54] AndyRussG: are you saying bots only contribute to 2-5% max of the difference? [21:41:09] what is that based on? [21:42:50] AndyRussG: but wouldn't it make sense that people are just closing it and it doesn't get shown again for the remainder of their pageviews? [21:42:53] that's what I do [21:42:57] AndyRussG: can you move that content to wikitech so we all can see it? [21:45:35] madhuvishy: yeah originally I was estimating a higher percentage, but then I found this https://wikitech.wikimedia.org/wiki/Analytics/Unique_clients/Last_access_solution/BotResearch#Bots [21:47:36] milimetric: closed banners also cause a call to /beacon/impression. It's one of the ways that we check the system is working as expected [21:48:28] nuria: very sadly, I'd have to ask permission w/ FR because FR data :( It might be OK, in this particular case [21:48:32] not sure [21:50:53] AndyRussG: have you tried running the mobile query without the device clause [21:52:03] or is that the only way to restrict for mobile with /beacon/impression? [21:53:22] madhuvishy: I think it could also be restricted to mobile using a field in the table.
But the numbers should be identical [21:53:54] AndyRussG: as in use webrequest mobile web? [21:54:11] (access method='mobile web') [21:54:14] ejegg: ^ [21:54:18] Yeah I think [21:55:25] * ejegg reads backscroll [21:56:05] oh, we can filter that by mobile web too? [21:56:12] madhuvishy: maybe access_method='mobile web'? [21:56:27] Cool, anything to get the filters to match between the datasets is good [21:57:05] Analytics-Backlog, Services, Patch-For-Review: wikimedia.org/api and wikimedia.org/api/rest_v1 should redirect to the docs - https://phabricator.wikimedia.org/T118519#1888751 (Krinkle) Yeah, RESTBase serves on wikimedia.org, not www.wikimedia.org. Which in itself is also confusing since the main portal... [21:57:22] AndyRussG: is the banner only being shown on en.wikipedia? [21:57:44] madhuvishy: for this campaign, yes [21:57:46] madhuvishy: yep, we call the December campaign Big English [21:58:43] to be more precise, we should filter on that for beacon/impression too, though [22:00:14] AndyRussG: yeah - you can filter uri_host='en.m.wikipedia.org' [22:04:14] madhuvishy: substituting device= for mobile web yields very similar numbers [22:04:36] ya i saw that [22:07:26] AndyRussG: give me a minute - checking something else [22:08:37] K! [22:08:55] Another difference that we might be able to check somehow is logged in vs. not logged in users [22:09:25] If we could somehow get the true # of logged-in pageviews for that segment and time period... [22:11:29] AndyRussG: true [22:11:33] another thing could be this [22:11:35] I ran [22:11:38] https://www.irccloud.com/pastebin/RSdg1muC/ [22:11:44] madhuvishy: milimetric: nuria: what would be the best way to grab a bunch of queries like these for a series of time periods and graph them?
I think there's some python thing we can use, and I've seen Ellery uses a python notebook a lot [22:12:14] not a very fancy query - but basically - what your geo ip determines as country code and what maxmind claims the country code is [22:12:45] I can see that there are differences - but not sure if it's significant in any way [22:13:02] hmmm [22:15:27] AndyRussG: Made it slightly more readable with better names [22:15:30] https://www.irccloud.com/pastebin/4LtAEyt2/ [22:16:11] Getting an Exception in thread "main" java.lang.IllegalArgumentException: !=country_code: event not found [22:17:26] weird, can you try my recent paste? [22:18:36] AndyRussG: i can email you the results if not [22:18:54] sure... [22:19:08] yeah just tried it again, I'm just pasting it into the hive console [22:19:24] Maybe some funny invisible character getting in there somehow [22:19:27] madhuvishy: that can't work 'cause you can't reference column aliases in the having right? [22:19:40] milimetric: you can't in the where [22:19:43] but you can in having [22:19:55] k, weird it's failing then [22:20:14] ya i'm copying from hue may be some weird character [22:20:17] let me try hive [22:21:29] Hmm just tried manually typing in just that line, still same error [22:21:34] hue? [22:21:54] AndyRussG: yeah - hue.wikimedia.org [22:23:12] AndyRussG: found it [22:23:21] i think if you do <> instead of != [22:23:24] it'll work [22:23:33] i sent results by email anyway [22:23:47] they are equivalent - but hive cli throws an error for != [22:23:56] milimetric: ^ [22:24:09] ha [22:26:50] madhuvishy: hmm interesting. I didn't know about those different sources of geo data [22:27:38] I don't think it's enough to account for the differences we're seeing tho [22:28:10] AndyRussG: you can graph with matplotlib in python for ad hoc results [22:28:31] nuria: ah cool! Is that what you'd recommend?
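[Editor's note] Two SQL points come up above: column aliases are visible in HAVING (evaluated after grouping) but not in WHERE, and the `!=` error is a Hive CLI quirk that writing `<>` avoids. The sketch below illustrates the alias point using SQLite as a stand-in (the table and country values are made up; SQLite accepts both `!=` and `<>`, so the Hive CLI error itself cannot be reproduced here):

```python
# Alias-in-HAVING sketch, using SQLite as a stand-in for Hive.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE req (client_cc TEXT, maxmind_cc TEXT)")
conn.executemany(
    "INSERT INTO req VALUES (?, ?)",
    [("US", "US"), ("US", "CA"), ("FR", "FR"), ("US", "MX")],
)

rows = conn.execute(
    """
    SELECT client_cc, COUNT(*) AS mismatches
    FROM req
    WHERE client_cc <> maxmind_cc     -- row-level filter: belongs in WHERE
    GROUP BY client_cc
    HAVING mismatches > 1             -- the alias IS visible here
    """
).fetchall()
print(rows)  # [('US', 2)]
```

Moving `mismatches > 1` into the WHERE clause fails, because WHERE is evaluated before the select list defines the alias; that is the "you can't in the where / but you can in having" exchange above.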
[22:29:18] milimetric: madhuvishy: nuria: one hypothesis for now is that there's some javascript error that only occurs under bad-ish network conditions (something that executes in a different order, or something like that) so we don't see it in any of our tests, but is affecting a % of users [22:29:27] AndyRussG: yeah i'm not sure either [22:29:29] aah [22:29:55] For that hypothesis, breaking the data down by region might be interesting [22:30:10] Another possibility is that there are a _lot_ more bots than we think [22:30:21] AndyRussG: yeah [22:30:22] That might jive with the difference being higher in the US [22:30:36] i would believe it's higher too [22:30:42] If there were some way to query pageviews only for browsers running JS [22:30:43] AndyRussG: i have not looked at your data yet but low bandwidth conditions and js errors does not need to be a hypothesis, you can test that and see for yourself. [22:30:54] i'm not super sure about the 2-5%. nuria was that your estimate? [22:30:54] AndyRussG: especially with latest chrome and network conditioner [22:30:54] nuria: yes true! [22:31:30] AndyRussG: breaking by country code will not help you with js errors, very much doubt it [22:31:57] madhuvishy: estimate for non js enabled browsers? [22:32:02] Well just to see if the difference is greater in rural regions (where presumably connectivity is worse) [22:32:16] AndyRussG: so to get a timeseries of data, I'd suggest just grouping by hour or day and outputting a TSV. You can graph that with a million different tools, I personally prefer dygraphs for something quick [22:32:18] nuria: no percentage of bot traffic in an hour on en wiki mobile [22:32:25] for agent type user [22:32:27] madhuvishy: or bot traffic tagged (wrongly) as users?
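[Editor's note] milimetric's suggestion above (group query output by hour or day, write a TSV, then plot it with dygraphs or matplotlib) can be sketched as follows. The rows, filename, and column names are hypothetical stand-ins for real Hive output, not anything from the actual analysis:

```python
# Sketch: roll hourly (pageviews, impressions) rows up to daily totals
# and emit a TSV that dygraphs/matplotlib can plot. Rows are made up.
import csv
from collections import defaultdict

rows = [  # (hour, pageviews, banner impressions) -- hypothetical values
    ("2015-12-15T00", 1000, 700),
    ("2015-12-15T01", 1200, 820),
    ("2015-12-16T00", 900, 640),
]

daily = defaultdict(lambda: [0, 0])
for ts, pv, imp in rows:
    day = ts.split("T")[0]          # collapse hours onto the date
    daily[day][0] += pv
    daily[day][1] += imp

with open("pv_vs_impressions.tsv", "w", newline="") as f:
    w = csv.writer(f, delimiter="\t")
    w.writerow(["date", "pageviews", "impressions", "ratio"])
    for day in sorted(daily):
        pv, imp = daily[day]
        w.writerow([day, pv, imp, round(imp / pv, 3)])
```

The ratio column makes the discrepancy the thread is chasing directly plottable as a timeseries, so a drop confined to certain days or regions would stand out.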
[22:32:40] yes that [22:33:23] Also we know that some users who fully disable cookies are getting a JS error [22:33:32] madhuvishy, AndyRussG percentage of bots not tagged as such (lumping all projects together) is 1.7% of our pageviews on desktop [22:33:49] nuria: that seems small [22:33:52] madhuvishy: but it did not look at mobile which might be higher (easier to crawl) [22:33:58] madhuvishy: I queried a different way to get the % of requests with mis-identified countries: [22:34:01] https://www.irccloud.com/pastebin/3JdGWnXU/ [22:34:14] madhuvishy: "at least" ~2% [22:34:15] ...that's in the pipeline to be fixed, but it's not a huge amount [22:34:18] it's never over 0.1% or so [22:34:23] (the cookie thing ^) [22:34:30] madhuvishy: as in that is the minimum of "user" traffic that are bots [22:34:51] nuria: do you have a sense on what could be the upper bound? [22:35:10] because I was thinking maybe if some requests get identified incorrectly, then all requests from that IP would as well, and maybe that would explain it [22:35:15] AndyRussG: if you want a fast checkup of your data exclude all traffic with x-analytics['nocookie']=1 [22:35:17] milimetric: cool :) [22:36:17] ok, AndyRussG so I don't think I can be much help right now, but this should be priority #1 at the foundation if we think there's something wrong [22:36:26] cc madhuvishy milimetric , i think removing all traffic with nocookie=1 will remove the majority of non-user traffic [22:36:27] so all engineers should drop what they're doing and try to help [22:36:49] nuria: interesting [22:37:14] nuria: yeah we can only do that in webrequest though - hmmm [22:37:57] AndyRussG: look at your numbers with that and see, it will remove "some" user traffic but mostly bots not identified as such, the 2% i gave you is the bottom line estimate, our bot traffic -not identified as such- might be one order of magnitude higher [22:37:58] https://wikitech.wikimedia.org/wiki/X-Analytics [22:37:58] milimetric: hmm I
don't think it's that drastic?! Our suspicion so far is that it's probably OK in fact, and we're just not fully aware of all the factors that are expected and unavoidable [22:38:22] not showing banners to 30% of people we think we're showing it to is not drastic? [22:38:44] I mean, I agree there's likely some explanation [22:38:44] brb [22:38:59] but the possibility of it being true makes it fairly urgent imho [22:39:17] milimetric: it's definitely not 30%. There's definitely pageviews in there that we don't and can't show banners to [22:39:42] Yes though still I think you're right [22:39:47] well, whatever % it is, it would mean our fundraiser would have to last that % longer theoretically [22:39:54] nuria: how is the X-Analytics related? [22:40:52] milimetric: now I feel guilty that in about 5 minutes I have to go drive my family to the store! I'll be back online from the parking lot, and then again later this evening [22:40:58] nuria: madhuvishy ^ [22:41:38] AndyRussG: I'll be around until 5.30 pacific [22:41:46] I'll keep thinking about your query but I'm suuper slow when I think about sql [22:42:36] madhuvishy: cool thx [22:42:36] milimetric I know the feeling, me too! [22:43:10] milimetric: madhuvishy: nuria: K thanks so much for the help so far, talk again in a bit!!! [23:17:03] nuria / madhuvishy: I think I figured out AndyRussG's problem for when he's back [23:17:08] he uses this clause: AND page_title not RLIKE ':' [23:17:30] which didn't make sense to me so I queried for pages in pageview_hourly with page_title RLIKE ':' [23:17:54] and there are tons of normal looking pages, like over 100k [23:18:16] oh wait!
that means the problem would be even worse :) [23:19:20] milimetric: let me look at it for a sec, was taking a break [23:23:00] milimetric: ya [23:23:05] i was looking into this [23:23:19] RLIKE doesn't work like we expect it to [23:23:31] i use REGEXP_EXTRACT and check always [23:23:46] i was just gonna check with that [23:26:01] milimetric: what is the reason for RLIKE ':' [23:26:23] no idea [23:26:47] ok let me test with REGEXP_EXTRACT - to the best of my knowledge - RLIKE is broken [23:26:48] but! without it, the % matching in the US drops from 79 to 74 [23:27:06] RLIKE is totally unintuitive at best and probably broken :) [23:29:07] you know what, madhuvishy, I don't have time to try this right now [23:29:15] but what I'd do is group by page_title [23:29:42] and try to find if the mismatches are spread evenly across all pages or just coming from a problem with the pageview definition or something [23:30:03] I'll be back and try it soon if you don't get to it [23:30:11] milimetric: okay sure [23:35:48] madhuvishy: if i am reading results correctly difference is no bigger than 5% other than in the us [23:35:58] madhuvishy: which is almost 20% right? [23:36:20] madhuvishy: which makes sense as bot % is a lot higher [23:38:30] milimetric: madhuvishy: nuria: BTW forgot to mention here's a task for notes about this: https://phabricator.wikimedia.org/T121042 [23:38:56] madhuvishy: ah no wait, [23:39:16] madhuvishy: it is all english wikipedia [23:40:53] AndyRussG: what is the rationale for the RLIKE ':' [23:41:05] rather excluding those with ':' [23:41:11] madhuvishy: that's because the banners are only showing on the main namespace [23:41:14] Ah I don't know [23:41:34] AndyRussG: why do you need to join pageview hourly and webrequest as requests on webrequest are also tagged with is_pageview=1 [23:41:35] I don't know Hive SQL (or whatever it's called?) very well, that was suggested by ellery [23:41:45] Ah ellery hi! 
I see you're here on IRC :) [23:41:55] nuria: ya only the page_title field is missing [23:42:19] which they are using to do page_title not RLIKE ':' which i'm trying to understand why [23:42:28] madhuvishy: the page title only used on RLIKE i see, [23:42:53] madhuvishy: but * I think* pageviews only count main namespace, do they? let me see [23:43:17] AndyRussG: [23:43:35] something that's a pageview with RLIKE ':' could be [23:43:46] https://www.irccloud.com/pastebin/Awzca88Z/ [23:43:48] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1889296 (Milimetric) >>! In T112956#1886408, @RobLa-WMF wrote: > First, my apologies for being so grumpy in my last response. I think there is likely some very... [23:43:58] AndyRussG: I don't see why you'd exclude all of these [23:45:02] madhuvishy: well it's just a side effect of the hack for excluding non-main-namespace pages [23:45:17] community requested FR banners only show on the main namespace [23:45:29] AndyRussG: are non-main-namespace pages counted on pageviews? [23:45:51] nuria: hmmm that's a good question... I didn't imagine they wouldn't be.... ? [23:45:53] what's a sample of something not on the main namespace? [23:46:15] madhuvishy: https://en.wikipedia.org/wiki/Talk:Main_Page [23:46:20] AndyRussG: I think you need to double check some things with ellery to understand why that code is there [23:46:52] The namespaces are prefixes and a colon before the article name. There are a lot of 'em [23:47:16] AndyRussG, madhuvishy : ok, pageview definition excludes some Special: Pages but other than those looks like the majority are counted [23:47:18] Right.
I think they do get counted as pageviews [23:47:34] nuria: the intention was the non-main namespace exclusion, but I don't know if there were other possible ways of doing that [23:48:15] nuria: madhuvishy: https://meta.wikimedia.org/wiki/Research:Page_view [23:49:34] Another thing we could correlate with is calls to Special:BannerLoader, which actually loads banner content [23:49:47] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1889326 (Legoktm) >>! In T119593#1883179, @Milimetric wrote: > @Legoktm: >> There are 50+ tasks I tried to read, so I skimmed a good number of them, apologies... [23:50:05] That would let us eliminate JS errors happening between when the banner is requested and /beacon/impression is called [23:51:21] AndyRussG: as i said, i do not think you need to theorize JS errors, you are never going to find out whether they were happening with this data analysis, it will be a lot better to actually test for those changing your connection speed. [23:51:49] nuria: yes for connection-speed related ones [23:52:04] and yes gotta do that [23:52:51] AndyRussG: to summarize, your main concerns are differences in mobile and the US, correct? [23:53:00] Analytics-Backlog, Research management, Research-and-Data: Pipeline for data-intensive applications from research to productization to integration - https://phabricator.wikimedia.org/T105815#1889338 (DarTar) @halfak we need to include Swagger support to that list before we wikify it. [23:53:04] AndyRussG: but how are you accounting for [23:53:44] nuria yes, though if we get more precise numbers and find the discrepancy is also greater than it should be for other regions and/or desktop, that would also be a concern [23:53:50] AndyRussG: but your code doesn't discount in any way repeated visits for which the banner is not shown again [23:54:02] AndyRussG: how is that taken into account?
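[Editor's note] The `page_title not RLIKE ':'` hack dissected above is meant to exclude non-main-namespace pages (Talk:, User:, ...), but as milimetric found, plenty of main-namespace article titles also contain colons, so the filter drops legitimate pageviews too. A small illustration (titles and the namespace list here are illustrative, not the actual MediaWiki namespace table):

```python
# Why "not RLIKE ':'" over-filters: a colon can be a namespace prefix,
# but it can also just be part of an article title.
import re

NAMESPACES = {"Talk", "User", "Wikipedia", "File", "Special", "Template"}

def in_main_namespace(title: str) -> bool:
    """True unless the title starts with a known namespace prefix."""
    prefix, sep, _ = title.partition(":")
    return not (sep and prefix in NAMESPACES)

titles = ["Main_Page", "Talk:Main_Page",
          "Star_Trek:_First_Contact", "User:Example"]

blunt  = [t for t in titles if not re.search(":", t)]   # the RLIKE ':' hack
better = [t for t in titles if in_main_namespace(t)]

print(blunt)   # ['Main_Page'] -- wrongly drops Star_Trek:_First_Contact
print(better)  # ['Main_Page', 'Star_Trek:_First_Contact']
```

This matches the observation in the log that over 100k "normal looking" pages in pageview_hourly match `RLIKE ':'`; excluding them all understates pageviews and so shrinks the apparent pageview/impression gap.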
[23:54:12] nuria: he mentioned that banner is hidden [23:54:13] nuria: /beacon/impression is called in those cases too [23:54:17] but requests are made [23:54:33] AndyRussG: ah, very sorry, i missed that [23:54:42] Even if the banner is hidden, be it by CentralNotice or by code in the banner itself (both are possible), /beacon/impression should be called [23:54:50] nuria: np! it's a convoluted system [23:55:23] AndyRussG: did you look at mobile browsers for those countries to get an idea of how many do not support JS [23:55:23] nuria: madhuvishy: milimetric: sorry gotta drive again!! Thanks so much for bearing with me!!!!!!!!!!! [23:55:34] Yes [23:55:41] AndyRussG: you need to do that to discount those numbers [23:55:43] But it could be checked out more thoroughly [23:56:03] It added up to about 4% of pageviews as far as we saw [23:56:08] back in a bit! thx again!!!