[00:30:33] Analytics-EventLogging, Analytics-Kanban: Some blacklist matching schemas are being consumed by Eventlogging {oryx} - https://phabricator.wikimedia.org/T126410#2017387 (madhuvishy) a:madhuvishy
[00:30:55] Analytics-Kanban: Prepare for lightning talk on Last access uniques - https://phabricator.wikimedia.org/T126411#2017390 (madhuvishy)
[04:07:51] Analytics-Wikistats: Total page view numbers on Wikistats do not match new page view definition - https://phabricator.wikimedia.org/T126579#2017776 (Tbayer) NEW a:ezachte
[08:01:24] https://phabricator.wikimedia.org/T116740
[08:15:51] Analytics-Kanban, Editing-Analysis, Patch-For-Review, WMF-deploy-2016-02-02_(1.27.0-wmf.12), WMF-deploy-2016-02-09_(1.27.0-wmf.13): Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#2017972 (jcrespo)...
[08:16:09] Analytics-Kanban, DBA, Editing-Analysis, Patch-For-Review, and 2 others: Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#2017974 (jcrespo)
[10:16:57] (PS1) Joal: Correct last_access_uniques jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/269938
[10:24:19] joal: hi!
[10:24:36] Hi dcausse :)
[10:24:40] is there a way to inspect the status of our oozie job
[10:25:02] it properly backfilled most of the partitions but not all of them
[10:25:11] For sure, but I am currently trying to do so for some ours, and oozie is VERY unresponsive
[10:25:20] ah ok
[10:25:25] maybe the reason
[10:25:28] :)
[10:26:24] I'm off normally today, and will leave soon, but best is to wait for oozie to catch up, and if it doesn't, ask ottomata when he comes online to restart it
[10:26:28] dcausse: --^
[10:26:43] joal: oh sorry, sure no problem :)
[10:26:48] dcausse: We have on our plan for this quarter to upgrade this oozie/hue server
[10:26:59] dcausse: no prob, I'm online now ;)
[10:27:05] ok :)
[10:27:13] dcausse: Hopefully it'll help
[10:27:33] dcausse: And even if I wasn't off, I don't have rights to restart oozie ;0
[10:27:40] ok :)
[11:18:28] joal: sorry just seen the messages :(
[11:18:48] I need to learn how to do these things :)
[12:18:38] Analytics-Tech-community-metrics, DevRel-February-2016: Data in korma project pages has confusing labels, is difficult to understand - https://phabricator.wikimedia.org/T110524#2018351 (Qgil) >>! In T110524#2009926, @Aklapper wrote: > Proposing to close this task somewhere between declined and resolved....
[12:57:53] Analytics-Tech-community-metrics, DevRel-February-2016: Data in korma project pages has confusing labels, is difficult to understand - https://phabricator.wikimedia.org/T110524#2018448 (Aklapper) Open>declined
[14:39:33] good mornniiing
[14:39:39] joal: here's this, https://gerrit.wikimedia.org/r/#/c/269759/ not sure if you saw it
[14:47:11] ottomata: o/
[14:49:16] hii
[14:52:37] There were some X in the report send from stat1002 this morning
[14:53:15] || 2016-02-11T07/1H || . | . | . | . || _ | _ | X | X ||
[14:53:34] probably nothing, but I didn't really know how to check :(
[15:07:40] elukey: I'd ignore for now, its at the most recent time in the report
[15:07:57] but!
[15:08:00] we can double check let's seee
[15:08:10] you can run the status report manually
[15:08:13] on stat1002
[15:08:31] /srv/deployment/analytics/refinery/bin/refinery-dump-status-webrequest-partitions
[15:08:50] /srv/deployment/analytics/refinery/bin/refinery-dump-status-webrequest-partitions --datasets webrequest
[15:08:59] and ya
[15:09:01] is totally fine now
[15:09:02] || 2016-02-11T07/1H || _ | _ | _ | _ ||
[15:11:01] yep but I'd need to start figuring out how to debug jobs taking too long, etc..
[15:11:17] I'll study a bit next week :)
[15:41:47] (PS4) Mforns: [WIP] Add hierarchy visualization [analytics/dashiki] - https://gerrit.wikimedia.org/r/269713 (https://phabricator.wikimedia.org/T124296)
[16:23:06] (PS5) Mforns: [WIP] Add hierarchy visualization [analytics/dashiki] - https://gerrit.wikimedia.org/r/269713 (https://phabricator.wikimedia.org/T124296)
[16:23:37] (PS6) Mforns: [WIP] Add hierarchy visualization [analytics/dashiki] - https://gerrit.wikimedia.org/r/269713 (https://phabricator.wikimedia.org/T124296)
[16:28:21] (PS7) Mforns: Add hierarchy visualization [analytics/dashiki] - https://gerrit.wikimedia.org/r/269713 (https://phabricator.wikimedia.org/T124296)
[16:30:29] a-team: just sent the e-scrum, won't join the standup!
[16:31:52] am I in the batcave?
[16:31:54] ees working?
[16:31:55] can't tell.
[16:32:17] madhuvishy: ?
[16:33:19] Analytics-Kanban: Fix mediawiki Avro camus import after last weeks deploy [8 pts] - https://phabricator.wikimedia.org/T126622#2018918 (Ottomata) NEW a:Ottomata
[16:51:46] mforns: oh yeah, so how do I configure the graph for hierarchy? is there a metric on datasets.wikimedia.org that works?
[16:52:03] (CR) Jcrespo: Update hostnames to analytics-store (1 comment) [analytics/geowiki] - https://gerrit.wikimedia.org/r/265213 (owner: Milimetric)
[17:17:58] Analytics: Polish script that checks eventlogging lag to use it for alarming - https://phabricator.wikimedia.org/T124306#2019107 (elukey)
[17:36:13] milimetric, sorry I was afk
[17:36:32] mmm I don't know
[17:37:38] the columns should be: date, categoryLevel1, categoryLevel2, ..., categoryLevelN, count
[17:37:47] let me look for some one that works
[17:56:14] milimetric, I didn't find any. I've put the example I used in stat1003:/home/mforns/hierarchyExample.tsv
[18:00:01] k, mforns, I'll push it to http://datasets.wikimedia.org/limn-public-data/metrics/request-breakdowns/os-browser/ so we can have it public
[18:00:36] milimetric, it is incomplete though, it's not real data, it's just a slice of that
[18:00:45] i know, it's fine
[18:00:48] ok
[18:00:52] a-team: grooming?
[18:01:10] milimetric: batcave?
[18:01:15] batcave or grooming?
[18:01:17] or the meeting in the calendar?
[18:01:20] dammit WHY
[18:01:21] lol
[18:01:29] yes, batcave
[18:01:30] ok
[18:01:31] ok
[18:02:38] milimetric: sorry I have to skip the meeting, still finishing the upgrades :(
[18:03:18] Analytics: Polish script that checks eventlogging lag to use it for alarming - https://phabricator.wikimedia.org/T124306#2019283 (Milimetric) p:Triage>Normal
[18:03:43] Analytics-Tech-community-metrics, Developer-Relations, DevRel-February-2016: Mark BayesianFilter repository as inactive - https://phabricator.wikimedia.org/T118460#2019285 (Aklapper) a:Aklapper
[18:03:50] Analytics-Tech-community-metrics, Developer-Relations, DevRel-February-2016: Mark BayesianFilter repository as inactive - https://phabricator.wikimedia.org/T118460#1800890 (Aklapper) * No patch left with open status in Gerrit. * Edited the Gerrit desc to mention it's inactive: https://gerrit.wikimedia....
[18:04:43] elukey: np!
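Note on the hierarchy column layout given at 17:37:38 (date, categoryLevel1, ..., categoryLevelN, count): a minimal sketch of folding rows in that shape into a nested tree. The sample rows, the `os`/`browser` header names, and the `build_hierarchy` helper are all hypothetical and are not taken from hierarchyExample.tsv or from the dashiki patch.

```python
import csv
import io

# Hypothetical sample rows (not real data): date, two category levels, count.
SAMPLE_TSV = (
    "date\tos\tbrowser\tcount\n"
    "2016-02-10\tAndroid\tChrome\t120\n"
    "2016-02-10\tAndroid\tFirefox\t30\n"
    "2016-02-10\tiOS\tSafari\t200\n"
)


def build_hierarchy(tsv_text):
    """Fold rows of (date, level1..levelN, count) into nested dicts keyed by date."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # skip the header row
    tree = {}
    for row in reader:
        # First column is the date, last is the count, the rest are levels.
        date, *levels, count = row
        node = tree.setdefault(date, {})
        # Walk down every level except the deepest, creating nodes as needed.
        for level in levels[:-1]:
            node = node.setdefault(level, {})
        # The deepest level stores the summed count.
        leaf = levels[-1]
        node[leaf] = node.get(leaf, 0) + int(count)
    return tree


if __name__ == "__main__":
    print(build_hierarchy(SAMPLE_TSV))
```

The sketch assumes every row has at least one category level and that all rows have the same depth; a real file might need validation for either.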
[18:05:40] Analytics, Discovery: Look into encrypting logs sent between mediawiki app servers and kafka - https://phabricator.wikimedia.org/T126494#2019302 (Milimetric) p:Triage>Normal
[18:07:19] Analytics: Research Spike: Provide data on top pageviews across all projects daily - https://phabricator.wikimedia.org/T126367#2019307 (Milimetric)
[18:10:22] Analytics: Provide API for sampling pageviews - https://phabricator.wikimedia.org/T126290#2019404 (Milimetric) p:Triage>Normal
[18:12:36] Analytics: Consider SSTable bulk loading for AQS imports - https://phabricator.wikimedia.org/T126243#2019417 (Milimetric) p:Triage>Normal
[18:13:51] Analytics: Standardize Hive UDF code comments and generate documentation {flea} - https://phabricator.wikimedia.org/T114238#2019424 (Milimetric) p:Triage>Low
[18:16:57] Analytics-Backlog, Analytics-EventLogging, Traffic, operations: EventLogging query strings are truncated to 1014 bytes by ?(varnishncsa? or udp packet size?) - https://phabricator.wikimedia.org/T91347#2019446 (Milimetric)
[18:17:00] Analytics, Analytics-EventLogging, MediaWiki-extensions-CentralNotice, Traffic, operations: Eventlogging should transparently split large event payloads - https://phabricator.wikimedia.org/T114078#2019444 (Milimetric) Open>declined We decided to not split and merge events, this would be...
[18:18:18] Analytics, Analytics-EventLogging: Convert EventLogging to use extension registration - https://phabricator.wikimedia.org/T87912#2019456 (Milimetric) Hey @Legoktm, do you know if anyone is doing this or should we try?
[18:18:26] Analytics, Analytics-EventLogging: Convert EventLogging to use extension registration - https://phabricator.wikimedia.org/T87912#2019461 (Milimetric) p:Triage>Normal
[18:19:23] Analytics, Quarry: Build an internal Quarry instance to share data & sample queries between researchers (and other analytics users?) - https://phabricator.wikimedia.org/T75142#2019467 (Milimetric) @YuviPanda have you changed your mind about this in favor of jupyter goodness?
[18:21:21] Analytics-Tech-community-metrics, Developer-Relations, DevRel-February-2016: Mark BayesianFilter repository as inactive - https://phabricator.wikimedia.org/T118460#2019481 (Aklapper) p:Low>Lowest
[18:21:51] Analytics, Analytics-Cluster, MediaWiki-Vagrant: vagrant role that installs kafka in vagrant so users can test monolog in mediawiki - https://phabricator.wikimedia.org/T115703#2019485 (Milimetric) Open>Resolved a:Milimetric Already done By @ebernhardson or David C. in discovery, in https://g...
[18:23:31] Analytics: Add scala 2.10.4 to stat1002 and analytics1027 - https://phabricator.wikimedia.org/T115970#2019499 (Milimetric) p:Triage>Normal
[18:24:55] Analytics: Write a script to automatically run dependent jobs in refinery - https://phabricator.wikimedia.org/T115985#2019509 (Milimetric) p:Triage>Normal
[18:27:54] Analytics-Kanban, DBA, Editing-Analysis, Patch-For-Review, and 2 others: Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#2019532 (Neil_P._Quinn_WMF) @jcrespo, don't worry! I meant "we're ready [for yo...
[18:28:39] Analytics, Analytics-Cluster: https://yarn.wikimedia.org/cluster/scheduler should be behind ldap - https://phabricator.wikimedia.org/T116192#2019535 (Milimetric) p:High>Normal
[18:37:36] mforns: check out http://localhost:5000/src/layouts/tabs/#hierarchy-example
[18:38:11] looks sweet, I'm gonna play with it though
[18:45:15] milimetric, cool, what about the new colors?
[18:51:44] (CR) Nuria: "Looks good, I think now is also more clear how do the offsets work. Thanks for doing these changes as it took some detective work" [analytics/refinery] - https://gerrit.wikimedia.org/r/269938 (owner: Joal)
[18:54:36] (CR) Nuria: "Also, this computation is slightly longer than the one we were doing prior, I think you have already run the queries to make sure performa" [analytics/refinery] - https://gerrit.wikimedia.org/r/269938 (owner: Joal)
[18:56:04] a-team: logging off, just finished the last servers!
[18:56:10] talk with you tomorrow!
[18:57:30] joal: changes to last access plus percentages look good, pufff... some work those were
[18:57:38] joal: do queries take a lot longer
[18:59:27] Analytics, Discovery: Look into encrypting logs sent between mediawiki app servers and kafka - https://phabricator.wikimedia.org/T126494#2019720 (EBernhardson) The current volume for only CirrusSearch logging is a peak of ~3k/s. This might increase greatly when the API team starts using the same code path...
[19:06:20] (PS8) Mforns: Add hierarchy visualization [analytics/dashiki] - https://gerrit.wikimedia.org/r/269713 (https://phabricator.wikimedia.org/T124296)
[19:07:14] o/ ottomata
[19:07:17] Can't make the checkin today
[19:07:21] Crunching to get a paper done.
[19:07:39] Paper topic is "modeling vandalism in Wikidata", so it's for a good cause. :)
[19:16:35] Analytics, Wikimedia-Mailing-lists: home page for the analytics mailing list should link to gmane - https://phabricator.wikimedia.org/T116740#2019798 (Krenair) @mforns: Did you look into this?
[19:23:28] Analytics-Kanban, DBA, Editing-Analysis, Patch-For-Review, and 2 others: Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#2019823 (jcrespo) I would want to promise this week, but if not, early next week.
[19:23:45] Analytics-Kanban, DBA, Editing-Analysis, Patch-For-Review, and 2 others: Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#2019825 (jcrespo) p:Triage>Normal
[20:29:06] (Abandoned) Milimetric: Suggest ugly sql [analytics/limn-flow-data] - https://gerrit.wikimedia.org/r/269834 (owner: Milimetric)
[20:31:02] mforns: sorry I was distracted. I like the colorization, but I have to spend some more time doing the review
[20:31:19] I like the way similar things are colored relatively similar
[20:31:31] I might increase the distance between colors, but I'm not sure
[20:32:16] Analytics, Wikimedia-Mailing-lists: home page for the analytics mailing list should link to gmane - https://phabricator.wikimedia.org/T116740#2020015 (mforns) @Krenair, @kevinator Sorry, this got lost in our backlog. It's done now: https://lists.wikimedia.org/mailman/listinfo/analytics Cheers
[20:32:25] also some of them make bad background for white text, so it might be good to change the way that works (write in a dark font conditionally)
[20:32:34] Analytics-Kanban, Wikimedia-Mailing-lists: home page for the analytics mailing list should link to gmane - https://phabricator.wikimedia.org/T116740#2020017 (mforns)
[20:35:00] Analytics-Kanban, Wikimedia-Mailing-lists: Home page for the analytics mailing list should link to gmane [1 pts] - https://phabricator.wikimedia.org/T116740#2020031 (mforns)
[20:35:42] Analytics-Kanban, Wikimedia-Mailing-lists: Home page for the analytics mailing list should link to gmane [1 pts] - https://phabricator.wikimedia.org/T116740#2020034 (Jdforrester-WMF) :-(
[20:39:01] milimetric, no worries, the code is a bit dense, I couldn't find a way to make it more self-explanatory
[20:39:40] oh, mforns I meant to say this after standup
[20:39:43] but the bug isn't in d3
[20:39:52] it's just javascript doesn't have an accurate floating point representation
[20:39:53] milimetric, regarding the colors, yea, I'd like to increase the distance between them, but the full spectrum is used, there are basically no more hues to use
[20:39:55] (something like double)
[20:40:22] they just have Number which sucks and sometimes has that imprecision problem that you'd see with "float" in python or C#
[20:40:25] (or java)
[20:40:57] that makes sense about the colors. I'll take a closer look see if there's anything to be done
[20:41:53] milimetric, I know, but... OK, then they should document that gotcha somewhere, because everyone that works with scales will have that problem no?
[20:42:44] I think the zooming use-case wasn't anticipated, but I agree that they should maybe mention it
[20:42:53] aha
[20:42:55] you can submit a PR :)
[20:42:58] hehe
[21:22:22] ottomata: this eventlogging bug is really weird - it's not what i thought - execution is not out of order. This is what happens - for the current version of the Analytics schema, when I send events - It says hey this is blacklisted, not writing and it's all good. For an old version of the same Analytics schema - if I send events - It says Hey this is
[21:22:22] blacklisted, not writing to eventlogging-valid-mixed - but it ends up in eventlogging-valid-mixed anyway - and by extension in Mysql
[21:22:28] milimetric: ^
[21:23:59] how does it end up in valid-mixed anyway?
[21:24:19] milimetric: that is the question
[21:24:27] I think this just needs really really careful logging of exactly what each line thinks it's seeing
[21:24:37] milimetric: I tried
[21:25:13] i'd question everything though - maybe it's something with the new meta fields
[21:25:26] milimetric: new meta fields?
[21:26:06] like how the schema and revision used to be one way but now they can be in the meta: {...} field or something, the stuff andrew did
[21:26:30] milimetric: oh i have no idea what that is
[21:26:47] * madhuvishy looks
[21:26:59] i forget how he decided to do all that 'cause we talked about it every which way, but essentially I'd question everything - like whether it thinks the schema is the name you think it would think
[21:27:29] milimetric: hmmmmm
[21:27:29] you don't have to find the code change, I mean just dig into all the functions that get executed around there that contribute to the decision
[21:28:47] okay
[21:29:07] i dont know if this is worth figuring out because it has never showed up but I need to knoww
[21:29:30] (CR) Nuria: [C: 2 V: 2] "YAY!!!!" [analytics/refinery] - https://gerrit.wikimedia.org/r/269938 (owner: Joal)
[21:39:14] Analytics-Kanban: Pageview API not dealing with url quoting very well - https://phabricator.wikimedia.org/T126669#2020367 (Milimetric) NEW a:Milimetric
[21:42:22] madhuvishy: I think it's worth it, that's very weird behavior and you never know if it could cause something really bad
[21:42:43] especially since it has to do with the blacklist. If it's not working, and we blacklist something that sends 10000 events a second, everything will blow up
[21:43:31] If you don't get it by tomorrow I'll sit down with you and we'll squash it together :)
[21:44:16] milimetric: okay :)
[21:44:53] (CR) Milimetric: "::champagne::" [analytics/refinery] - https://gerrit.wikimedia.org/r/269938 (owner: Joal)
[21:50:07] likewise madhuvishy don't have time now, about to head out
[21:50:10] but happy to dig deep w you tomorrow
[21:50:31] ottomata: okay zombie bug squashing party tomorrow then
[21:57:59] bye a-team, see you tomorrow!
[21:58:04] nite marcel
[21:58:09] :]
[22:45:44] there was a networking outage affecting some users in Europe between 21:00 and 22:17 UTC, in case it makes a difference to data analysis.
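
Note on the floating-point imprecision discussed around 20:39-20:42: JavaScript's Number and Python's float are both IEEE 754 double-precision values, so the same rounding behaviour can be reproduced outside d3. The sketch below is illustrative only and says nothing about how d3 scales are implemented; `naive_equal` and `tolerant_equal` are hypothetical helper names.

```python
import math
from decimal import Decimal


def naive_equal(a, b):
    """Exact comparison, which is fragile for binary floating point."""
    return a == b


def tolerant_equal(a, b, rel_tol=1e-9):
    """Comparison within a relative tolerance, using math.isclose."""
    return math.isclose(a, b, rel_tol=rel_tol)


if __name__ == "__main__":
    total = 0.1 + 0.2
    # 0.1 and 0.2 have no exact binary representation, so the sum is not exactly 0.3.
    print(naive_equal(total, 0.3))     # False
    print(tolerant_equal(total, 0.3))  # True
    # Decimal arithmetic on string inputs avoids the binary rounding step.
    print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```

Comparing with a tolerance is the usual workaround when values pass through repeated scaling, as with zooming; the tolerance value here is an arbitrary choice for the example.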