[00:14:59] halfak: any idea why http://gp.wmflabs.org/graphs/active_editors_total shows the spike but http://reportcard.wmflabs.org/graphs/active_editors doesn't?
[01:34:11] Analytics, Analytics-Backlog, Performance-Team, Patch-For-Review: Collect HTTP statistics about load.php requests - https://phabricator.wikimedia.org/T104277#1439939 (Catrope) On the backend, we should see zero INM/304 traffic for versioned (long cache) requests, and very little 200 traffic and only...
[02:02:57] neilpquinn, sorry, not sure how those graphs are generated.
[02:04:10] No worries! It seems like the Wikistats/Report Card numbers are more reliable anyway.
[12:02:32] neilpquinn: that graph is based on data that's not as carefully refined as the wikistats ones. This note talks about that a bit (underneath the graph): "This graph currently over-reports by counting each active editor once for each distinct pair of project and country associated to the IP addresses used by the editor." In contrast, wikistats carefully de-duplicates across projects.
[12:49:00] Ironholds: Hi !
[12:59:14] halfak: Hi :)
[12:59:23] o/ joal
[12:59:46] halfak: I have forgotten to ask you
[12:59:52] joal, we're in the reduce stage of that diff job :)
[13:00:02] halfak: awesome
[13:00:15] Started long ago (reduce) ?
[13:00:34] at about 1200 UTC today
[13:00:40] So, not long
[13:00:43] halfak: Very interesting presentations sent by Aravind
[13:00:57] Re-snappy splits?
[13:01:02] yup
[13:01:29] I am not sure about why, but snappy is not splittable when using text
[13:01:52] So I'll change the json converter to use bz2
[13:01:56] bz2 for the future!
[13:01:59] :D
[13:01:59] :)
[13:02:13] I also have a request for you
[13:02:38] Do you mind having a look at https://phabricator.wikimedia.org/T102161
[13:02:43] halfak: --^
[13:03:07] Vetting new pageview aggregation
[13:33:35] Analytics-Cluster, Analytics-Kanban: Generate test data for Pageview API {slug} [5 pts] - https://phabricator.wikimedia.org/T101785#1441180 (JAllemandou) Distribution analysis for both page_title_hourly and project_cube {F190564} {F190565} {F190581} # page_title_hourly - ~60% of page_titles h...
[13:34:10] halfak: yt ?
[13:34:29] Sorry. Meeting!
[13:39:44] halfak: np :)
[13:47:22] Ironholds: Hi again :)
[14:18:09] joal: so the aggregations you just merged from marcel should / would end up in https://metrics.wmflabs.org/static/public/datafiles/Pageviews/all.csv when deployed, right?
[14:18:32] milimetric: correct !
[14:18:40] cool, thx, any ETA on that?
[14:18:42] jgage: Hi Sir
[14:18:47] are you waiting for anything to deploy?
[14:19:02] Depends on jgage --> need a puppet merger
[14:20:04] so if it's not today, will be early next week with Andrew
[14:20:08] milimetric: --^
[14:20:14] I'm off tomorrow
[14:20:27] cool, makes sense
[14:20:28] thx
[14:20:32] np
[14:20:46] milimetric: interesting results about page_title viewcounts
[14:21:00] https://phabricator.wikimedia.org/T101785#1441180
[14:21:16] I'll read that in a sec, about to push a patch
[14:21:27] milimetric: np
[14:22:57] Analytics-Backlog: Sanitize aggregated data presented in VitalSign using K-Anonymity {musk} [8 pts] - https://phabricator.wikimedia.org/T104485#1441340 (JAllemandou) Analysis made here : https://phabricator.wikimedia.org/T101785#1441180
[14:33:27] (PS1) Milimetric: Handle all.csv whether or not it exists [analytics/dashiki] - https://gerrit.wikimedia.org/r/223789 (https://phabricator.wikimedia.org/T95340)
[14:40:42] joal, just finished reviewing that phab task.
[14:40:46] All looks reasonable to me.
[14:41:17] I know that Ironholds did some work to vet this in the past, so I expect he'll have some suggestions on things to look for.
[14:41:41] what did I do?
[14:41:44] it wasn't me, someone else did it
[14:41:51] :D
[14:41:59] hey dude. See https://phabricator.wikimedia.org/T102161
[14:42:07] I think all you need to do is read and comment :)
[14:42:15] "need"
[14:42:29] NEED URGENT OMG ;)
[14:42:43] Speaking of urgent
[14:42:49] * halfak gets back to his most recent urgent work
[14:43:25] read it!
[14:43:29] looks good! URL decoding is vital
[14:43:59] +1
[14:44:50] halfak, Ironholds : Thanks both of you !
[14:45:31] halfak: I promise to tag tasks as urgent next time ;-P
[14:45:32] thank you for doing the work! :D
[14:45:40] I'm totally out of the loop; no access to anything for two weeks. bleh
[14:46:06] Ironholds: no problemo, I just wanted to have that confirmed before wikimania :)
[14:47:26] gotcha!
[14:47:35] super excited by how far people got while I was away :D
[14:52:30] (PS1) Milimetric: Add overall aggregate to project and language list [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/223793 (https://phabricator.wikimedia.org/T95340)
[15:00:41] Ironholds: while you're at it, can you have a look at k-anonymisation issue: https://phabricator.wikimedia.org/T101785
[15:01:47] joal, will do!
[15:02:01] Ironholds: Thx ! Will you be at Wikimania ?
[15:03:37] nope!
[15:03:47] halfak, etherpad link? I wanna give a report :D
[15:04:08] huh?
[15:04:25] Oh yeah. The haiku lacked a link!
[15:04:52] http://etherpad.wikimedia.org/p/WMF_Research_Group
[15:19:30] (PS2) Milimetric: Add overall aggregate to project and language list [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/223793 (https://phabricator.wikimedia.org/T95340)
[15:37:39] Hey jgage, let me know when you're around :)
[16:43:07] hi madhuvishy. are we using persistent cookies or session cookies for lastAccess?
[16:52:38] leila: I think it's persistent over a 31 day period
[16:55:03] joal: good morning!
[16:55:23] Hey jgage :)
[16:55:24] got it. thanks madhuvishy. looking at counting options, while you're pushing on the other fronts.
[16:55:29] Good morning !
[16:56:04] leila: awesome, thanks
[16:56:18] jgage: Would you have some time for (really easy) review ?
[16:56:22] joal: ok, i see the change from mforns and i shall merge it
[16:56:33] jgage: You rock :)
[16:57:16] joal: what host will this be applied to? i'll run puppet there after i merge.
[16:57:41] jgage: stat1002
[16:57:48] ok, one moment please
[16:57:51] :)
[16:59:56] joal: ok, merged and applied :)
[17:00:12] Thanks a lot jgage :)
[17:00:16] my pleasure
[17:00:44] Have a good day :)
[17:00:59] thanks! you have a nice.. evening ;)
[17:01:23] thanx ;)
[17:03:27] Analytics-EventLogging, Analytics-Kanban: Can Search up sampling to 5%? {oryx} - https://phabricator.wikimedia.org/T103186#1442048 (Ironholds) Okay! So: in my mind throwing it in HDFS is not the best solution. If we want to throw /all/ eventlogging data in HDFS, great. If we want to throw /no/ eventloggi...
[17:25:59] lzia: Hi leila
[17:26:18] helloo joal. What's up? :-)
[17:26:52] I just realized I didn't let you know I won't make it tonight to the checkpoint, and that Andrew's in holidays
[17:27:17] Sooooo, I guess it should probably be cancelled
[17:27:26] leila: --^
[17:27:46] np. thanks for the heads up, we can talk more in Mexico.
[17:29:07] leila: for sure, looking forward to that :)
[17:29:12] Have a good day !
[17:31:42] (CR) Joal: "Except for the typo in desccription, looks good to me :)" (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/223793 (https://phabricator.wikimedia.org/T95340) (owner: Milimetric)
[17:33:43] mforns: you there ?
[17:33:45] mforns: retro
[17:34:59] (PS3) Milimetric: Add overall aggregate to project and language list [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/223793 (https://phabricator.wikimedia.org/T95340)
[18:01:44] joal: I fixed that typo btw ^
[18:03:47] milimetric: Actually it was not a typo, but a misunderstanding from me :)
[18:04:02] milimetric: I approve / merge and let you test ?
[18:04:26] it was a typo, I had "languages" instead of "Languages"
[18:04:53] You should have all.csv file tomorrow (13:00 UTC)
[18:04:58] milimetric: --^
[18:05:26] sweet, ok, so I'll deploy this at some point tomorrow
[18:05:43] I just need the dashiki change merged too
[18:06:06] (CR) Joal: [C: 2 V: 2] "LGTM :)" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/223793 (https://phabricator.wikimedia.org/T95340) (owner: Milimetric)
[18:06:16] You are merged milimetric :)
[18:06:38] mforns: just realized that we missed some interesting change in aggregator --> Logging
[18:06:44] next change ;)
[18:16:36] Analytics, Analytics-Backlog, Performance-Team, Patch-For-Review: Collect HTTP statistics about load.php requests - https://phabricator.wikimedia.org/T104277#1442344 (Krinkle) Resolved>Open
[18:33:26] joal, oh so aggregator's logging was improved?
[18:35:27] Analytics-EventLogging, Analytics-Kanban: Can Search up sampling to 5%? {oryx} - https://phabricator.wikimedia.org/T103186#1442438 (Milimetric) @Ironholds, regarding the higher but not as high sampling, how long do you need this data for? That will affect what sampling you can do because the limiting fac...
[18:38:43] Analytics-Cluster, Analytics-Kanban: Generate test data for Pageview API {slug} [5 pts] - https://phabricator.wikimedia.org/T101785#1442454 (Milimetric) I'm not sure we're making the right choice here to k-anonymize. I think even if we chose a huge K we would still be vulnerable to the problems that l-di...
[18:45:42] mforns_brb: I was thinking of adding logging for all projects ;)
[18:46:02] joal, aha
[18:46:10] Anyway, we'll talk about this on monday ;
[18:46:18] I'm off for today :)
[18:46:25] ok! see you then
[18:46:29] See you guys on Monday, in MEXICO :)
[18:46:31] :]
[18:46:32] Analytics-EventLogging, Analytics-Kanban: Can Search up sampling to 5%? {oryx} - https://phabricator.wikimedia.org/T103186#1442492 (Ironholds) We're setting a 90 day restriction around other tables right now and that seems totally viable to me (we could even go for less, if Other Dan is okay with being ab...
[18:46:47] ándale!
[18:46:56] :D
[19:19:29] :) safe travels joseph, see you soon
[19:21:12] Analytics-EventLogging, Analytics-Kanban: Can Search up sampling to 5%? {oryx} - https://phabricator.wikimedia.org/T103186#1442588 (Milimetric) Yeah, 90 days is where we'd like all our EL tables to be. Less than that would increase the sampling that we could potentially do here. I'm thinking if the tabl...
[19:22:11] Analytics-EventLogging, Analytics-Kanban: Can Search up sampling to 5%? {oryx} - https://phabricator.wikimedia.org/T103186#1442589 (Ironholds) I'm not :(
[19:44:00] o/ neilpquinn
[19:55:12] halfak: o/ What's up?
[19:56:47] Just getting some preliminary results together for the TAE spike of doom.
[19:57:24] It looks like there was a burst in new registrations for enwiki that started on 2014/07
[19:58:37] And that this resulted in an increased count of new actives that bleeds into surviving new actives, old actives and reactivateds
[19:58:59] I'll have notes on this soon. Was wondering if you were seeing similar things.
[19:59:42] No, I haven't done any analysis that would've touched that. Very interesting though. Does it look like that's the bulk of the spike? Or just one factor?
[20:00:42] neilpquinn, for enwiki, the bulk. I'm splitting the languages into large/medium/small now to take a look at groups.
[20:01:09] I've got to run right now. Will make notes and ping you with them later.
[20:01:14] Cool, thanks!
[20:17:43] madhuvishy, do you know if kevinator is at the office?
[20:18:02] mforns: he is, but not at his desk
[20:18:08] not sure where he went
[20:18:27] oh wait, i think i do
[20:18:29] madhuvishy, ok, if you see him, can you please tell him I'm looking for him? :]
[20:18:37] mforns: sure :)
[20:18:41] thanks!
[20:31:34] mforns: around?
[20:34:37] madhuvishy, hi!
[20:34:53] Kevin's looking for you now :)
[20:34:54] mforns: hi, i heard you were looking for me
[20:43:21] milimetric: i am looking at your wikimetrics changes - and the only thing that's a bit non-obvious to me is the Include deleted checkbox
[20:43:35] like when i don't select it but shows up selected as a default
[20:44:11] i understand that it says if you don't select something it will set wikimetrics' defaults, but it's still a little confusing
[20:53:27] madhuvishy, thanks for pinging kevi-nator :]
[21:04:13] Analytics-Kanban: Troubleshoot EventLogging validation alerts - https://phabricator.wikimedia.org/T105167#1443023 (mforns) By listing all alert timestamps and durations I could not find any matching anomaly in graphite, the logs or the database. Dan gave the idea that maybe icinga is getting semi-up-to-date...
[21:05:30] mforns: no problem :)
[23:02:01] tnegrin: I'm in a Hangout that no one is in
[23:02:05] am I in the wrong place?
[23:02:14] we're held up
[23:02:16] be there soon
[23:02:28] ok. thanks.
[23:03:04] 10 minutes
[23:03:11] we were just in an epic meeting
[23:51:18] neilpquinn, check out https://meta.wikimedia.org/wiki/Research_talk:Active_editor_spike_2015/Work_log/2015-07-09
[23:51:33] I have more work done, but I don't have time to write about it now.
[23:51:38] I'll get back to it in the morning.
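
Note on the 12:02:32 message: the over-reporting it describes (each active editor counted once per distinct project/country pair, versus de-duplicated across projects) can be sketched roughly as below. This is only an illustration, not the code behind either graph; the sample records and both function names are made up, and the activity threshold that defines an "active editor" is ignored.

```python
# Hypothetical edit records as (editor, project, country); invented for illustration.
edits = [
    ("alice", "enwiki", "US"),
    ("alice", "enwiki", "CA"),
    ("alice", "commonswiki", "US"),
    ("bob", "dewiki", "DE"),
    ("bob", "dewiki", "DE"),  # a second edit from the same project and country
]


def count_per_project_country(records):
    """Count each editor once per distinct (project, country) pair."""
    return len({(editor, project, country) for editor, project, country in records})


def count_deduplicated(records):
    """Count each editor once, regardless of project or country."""
    return len({editor for editor, _project, _country in records})


print(count_per_project_country(edits))  # 4: alice appears under three pairs, bob under one
print(count_deduplicated(edits))         # 2: alice and bob
```

With these made-up records the pair-based count is twice the de-duplicated one, which is the kind of gap the note under the graph warns about.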