[00:33:25] Analytics-Cluster, operations, ops-eqiad: rack new hadoop worker nodes - https://phabricator.wikimedia.org/T104463#1419469 (RobH) a:RobH>Cmjohnson
[02:56:04] Analytics-Wikistats: Microsoft Edge user agent is not recognized - https://phabricator.wikimedia.org/T104531#1419606 (Tgr) NEW
[06:25:01] Analytics-Kanban, Patch-For-Review: Link to new projectcounts data and serve via wikimetrics {Musk} [5 pts] - https://phabricator.wikimedia.org/T104003#1419824 (kevinator) Open>Resolved
[06:59:18] Analytics-Kanban, MediaWiki-extensions-ExtensionDistributor, Patch-For-Review: Set up graphs and dumps for ExtensionDistributor download statistics {frog} [3 pts] - https://phabricator.wikimedia.org/T101194#1419850 (Nemo_bis) > If you want to share your dashboard with the world: https://meta.wikimedia....
[07:40:03] Where did https://metrics.wmflabs.org/static/public/dash/ go :o
[07:42:04] Analytics-Dashiki: Dashiki 404 - https://phabricator.wikimedia.org/T104545#1419930 (Nemo_bis) NEW
[07:45:03] Analytics-Dashiki, Analytics-Kanban: {crow} Dashiki Ops - https://phabricator.wikimedia.org/T102250#1419941 (Nemo_bis) Project? What project? Did you mean this task?
[07:46:04] Analytics-Dashiki: WMF Dashiki instance should have reasonable URL - https://phabricator.wikimedia.org/T88390#1419942 (Nemo_bis)
[07:46:04] Analytics-Dashiki: Dashiki 404 - https://phabricator.wikimedia.org/T104545#1419943 (Nemo_bis)
[10:31:43] * mforns tests
[10:40:14] mforns: seems to work :)
[10:40:24] joal, :]
[10:49:24] Analytics-Backlog: Sanitize aggregated data presented in VitalSign using K-Aninymiy --> K to define. - https://phabricator.wikimedia.org/T104485#1420272 (JAllemandou)
[11:13:43] Analytics-Tech-community-metrics, Engineering-Community, ECT-July-2015: Check whether it is true that we have lost 40% of code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1420364 (Aklapper) >>! In T103292#1415360, @Dicortazar wrote: > in the case of Gerrit, we may use...
[12:06:35] Analytics-Tech-community-metrics: Exclude third-party / pulled upstream code repositories from metrics - https://phabricator.wikimedia.org/T103984#1420471 (Qgil)
[12:06:36] Analytics-Tech-community-metrics, Engineering-Community, ECT-July-2015: Check whether it is true that we have lost 40% of code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1420470 (Qgil)
[13:01:25] Analytics-Tech-community-metrics: Exclude third-party / pulled upstream code repositories from metrics - https://phabricator.wikimedia.org/T103984#1420603 (Qgil)
[13:03:28] Analytics-Tech-community-metrics: Exclude third-party / pulled upstream code repositories from metrics - https://phabricator.wikimedia.org/T103984#1420617 (Qgil) p:Normal>High I have gone through the list of repos looking for easy catch, and I have updated the description with the findings. It looks l...
[13:03:37] Analytics-Tech-community-metrics, ECT-July-2015: Exclude third-party / pulled upstream code repositories from metrics - https://phabricator.wikimedia.org/T103984#1420619 (Qgil)
[13:09:12] Analytics-Tech-community-metrics, Engineering-Community, ECT-July-2015: Check whether it is true that we have lost 40% of code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1420645 (Qgil) I would forget about GitHub for now. We can say that korma and the activities deri...
[13:09:56] Analytics-Tech-community-metrics, Engineering-Community, ECT-July-2015: Check whether it is true that we have lost 40% of code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1420651 (Qgil) a:Dicortazar Assigning to @dicortazar since you are working on this indeed. :)
[15:11:35] morninnng
[15:11:45] milimetric, my python fixer guru man!
[15:11:58] i am having a problem running EL tests! and i am not sure why. something weird with imports
[15:12:35] vagrant?
[15:13:26] yes
[15:13:31] but also on an04
[15:13:40] File "tests/test_jrm.py", line 15, in <module>
[15:13:40] import eventlogging
[15:13:40] File "eventlogging/__init__.py", line 20, in <module>
[15:13:40] from .factory import *
[15:13:40] File "eventlogging/factory.py", line 12, in <module>
[15:13:41] from .utils import cast_string
[15:13:41] File "eventlogging/utils.py", line 21, in <module>
[15:13:42] from .factory import get_reader
[15:13:42] ImportError: cannot import name get_reader
[15:15:03] weird... you'd think it would say it didn't know what .factory was
[15:15:09] but if it found that, it would easily find get_reader
[15:15:15] circular import? python should deal, right?
[15:15:25] no, python certainly does *not* deal
[15:15:38] haha
[15:15:41] that's what I'm thinking it is, yea, circular import
[15:15:52] oook, well i can just put cast_string in factory
[15:15:53] whatever
[15:15:54] :)
[15:15:59] seemed better in utils, but meh?
[15:16:26] you can make it _cast_string in there
[15:17:13] nahhh
[15:18:28] hi everyone.
[15:18:50] hi joal. Whenever you have time, please ping me re hashing. :-)
[15:20:10] Hey leila
[15:20:18] I have 10 minutes before standup if you want :)
[15:20:29] Analytics-Tech-community-metrics, Engineering-Community, ECT-July-2015: Check whether it is true that we have lost 40% of code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1421157 (Dicortazar) Hi, some more comments after reviewing the numbers. With respect to the lis...
[15:20:35] okay. let's use that. :-) inviting you in Hangout
[15:20:42] batcave, joal?
[15:20:45] sure
[15:21:43] I'm in it joal
[15:22:04] joining
[15:23:28] joal (cc madhuvishy if you're online!) - do you recall the 10th/50th/90th (or even just mean average) pageviews per session for desktop? i'm trying to contextualize the wmf.mobile_apps_session_metrics results. my read of that table is at the 10th "percentile" typical behavior is 1-2 pageviews, at 50th percentile it's 2-3 pageviews in a session, and at the
[15:23:29] 90th percentile it's 7-8 pageviews - on the mobile apps. but i wasn't sure about desktop or mobile web behavior?
[15:25:12] Analytics-Cluster, Analytics-Kanban: {musk} Pageviews in Vital Signs - https://phabricator.wikimedia.org/T101120#1421167 (kevinator)
[15:25:17] mforns: you should come back!
[15:25:43] Analytics-Cluster, Analytics-Kanban: {musk} Pageviews in Vital Signs - https://phabricator.wikimedia.org/T101120#1330109 (kevinator)
[15:25:44] Analytics-Backlog, Analytics-Cluster, Analytics-Dashiki: Vital Signs user clicks "Add Metric" then Daily Pageviews (New Definition) - https://phabricator.wikimedia.org/T86141#1421172 (kevinator)
[15:25:48] leila, joal, I realized it's still 5 minutes to stand-up, leaving you guys to talk! thanks :]
[15:26:40] Analytics-Backlog, Analytics-Cluster, Analytics-Dashiki: Analytics Engineer has an oozie job to aggreate page views by time - https://phabricator.wikimedia.org/T88125#1421180 (kevinator)
[15:26:40] Analytics-Cluster, Analytics-Kanban: {musk} Pageviews in Vital Signs - https://phabricator.wikimedia.org/T101120#1330109 (kevinator)
[15:27:05] Analytics-Backlog, Analytics-Cluster, Analytics-Dashiki: Vital Signs user has access to the new page view data - https://phabricator.wikimedia.org/T88128#1421184 (kevinator)
[15:27:06] Analytics-Cluster, Analytics-Kanban: {musk} Pageviews in Vital Signs - https://phabricator.wikimedia.org/T101120#1330109 (kevinator)
[15:27:08] Analytics-Backlog, Analytics-Cluster, Analytics-Dashiki: Analytics Engineer has a job to aggregate pageviews by project - https://phabricator.wikimedia.org/T88127#1421185 (kevinator)
[15:27:30] thanks milimetric, this easy one is ready for review :)
[15:27:30] https://gerrit.wikimedia.org/r/#/c/222300
[15:28:38] Analytics-Kanban, MediaWiki-extensions-ExtensionDistributor, Patch-For-Review: Set up graphs and dumps for ExtensionDistributor download statistics {frog} [3 pts] - https://phabricator.wikimedia.org/T101194#1421193 (Milimetric) I'm happy to rename it, that's just the name that I was given above. And a...
[15:37:04] Analytics-Kanban, Patch-For-Review: Processor writes valid and invalid events to separate Kafka topics {stag} [13 points] - https://phabricator.wikimedia.org/T98781#1421216 (kevinator) p:Triage>Normal
[15:37:19] Analytics-Kanban, Research-and-Data: Validate Uniques using Last Access cookie {bear} - https://phabricator.wikimedia.org/T101465#1421217 (kevinator) p:Triage>Normal
[16:23:09] (PS1) Milimetric: Freshen data for July Meeting [analytics/reportcard/data] - https://gerrit.wikimedia.org/r/222332
[16:23:26] (CR) Milimetric: [C: 2 V: 2] Freshen data for July Meeting [analytics/reportcard/data] - https://gerrit.wikimedia.org/r/222332 (owner: Milimetric)
[16:29:33] Analytics-Backlog, Analytics-Wikimetrics, Patch-For-Review: Deal with non-timeboxed queries recomputing too much data for the mobile report-card - https://phabricator.wikimedia.org/T98979#1421400 (Milimetric)
[16:29:33] Analytics-Backlog: Clean up mobile-reportcard dashboards {frog} - https://phabricator.wikimedia.org/T104379#1421401 (Milimetric)
[16:37:32] Analytics-Backlog: Sanitize aggregated data presented in VitalSign using K-Aninymiy --> K to define. - https://phabricator.wikimedia.org/T104485#1421504 (kevinator) p:Triage>High
[16:41:23] Analytics-Backlog: Clean up mobile-reportcard dashboards {frog} - https://phabricator.wikimedia.org/T104379#1421544 (kevinator) p:Triage>Unbreak!
[16:43:53] Analytics, Analytics-Backlog, Varnish: https://wikitech.wikimedia.org/beacon/statsv 404 Not Found - https://phabricator.wikimedia.org/T104359#1421560 (Milimetric) Is someone taking care of this or does Analytics need to look at it?
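The traceback at 15:13:40 is a classic circular import. `eventlogging/__init__.py` imports `.factory`; `factory.py` (line 12) imports `.utils` before it has reached the definition of `get_reader`; `utils.py` (line 21) then asks for `get_reader` from the half-executed `factory`. Python does find `.factory`, because the partially initialised module is already registered in `sys.modules`, but `get_reader` does not exist on it yet, which is why the error names the function rather than the module. The following is a minimal sketch that rebuilds that chain in a throwaway package and shows one alternative to moving `cast_string` into `factory`: deferring the back-import into the function that needs it. The package names `el_broken`/`el_fixed`, the file bodies and the `reader_for` helper are hypothetical stand-ins, not the real EventLogging code.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for the import chain in the traceback:
# __init__ -> factory -> utils -> factory (again, while factory is half-run).
BROKEN = {
    "__init__.py": "from .factory import *\n",
    "factory.py": """
        from .utils import cast_string  # runs before get_reader exists

        def get_reader(uri):
            return ("reader", cast_string(uri))
    """,
    "utils.py": """
        from .factory import get_reader  # factory is only partially initialised

        def cast_string(value):
            return str(value)
    """,
}

# One possible fix: utils no longer imports factory at module level;
# the back-import is deferred until reader_for() is actually called.
FIXED = dict(BROKEN)
FIXED["utils.py"] = """
    def cast_string(value):
        return str(value)

    def reader_for(uri):
        from .factory import get_reader  # resolved at call time
        return get_reader(uri)
"""


def try_import(files, name):
    """Write `files` as package `name` in a temp dir, then import it.

    Returns the module on success, or the ImportError on failure.
    """
    root = Path(tempfile.mkdtemp())
    pkg = root / name
    pkg.mkdir()
    for filename, source in files.items():
        (pkg / filename).write_text(textwrap.dedent(source))
    importlib.invalidate_caches()
    sys.path.insert(0, str(root))
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        return exc
    finally:
        sys.path.remove(str(root))


broken = try_import(BROKEN, "el_broken")
fixed = try_import(FIXED, "el_fixed")
print(type(broken).__name__)  # ImportError
print(fixed.get_reader(42))   # ('reader', '42')
```

Moving `cast_string` into `factory` (the option chosen at 15:15:52) breaks the cycle just as well; the deferred import is only worth it when the helper really belongs in `utils`.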
[16:46:30] Analytics-Backlog, Varnish: https://wikitech.wikimedia.org/beacon/statsv 404 Not Found - https://phabricator.wikimedia.org/T104359#1421568 (kevinator)
[16:47:27] Analytics-Backlog: EventLogging cleanup debrief {tick} - https://phabricator.wikimedia.org/T104351#1421573 (kevinator) p:Triage>Normal
[16:47:50] Analytics-Backlog: Host a debrief of EventLogging cleanup {tick} - https://phabricator.wikimedia.org/T104351#1421578 (kevinator)
[16:57:17] Analytics, Analytics-Backlog, Performance-Team: Collect HTTP statistics about load.php requests - https://phabricator.wikimedia.org/T104277#1421624 (kevinator) p:Triage>Normal
[17:15:21] Analytics-Backlog, Analytics-Dashiki: Improve the edit analysis dashboard - https://phabricator.wikimedia.org/T104261#1421752 (Milimetric) Some notes inline. By the way, did you and @violetto get a chance to meet about this? In general, this task seems like it could be broken down and prioritized so we...
[17:42:13] joal: do you have time now? :-)
[17:42:31] hey leila
[17:42:36] I have some, yes
[17:42:42] batcave, joal?
[17:42:49] on my way
[17:43:06] me, too.
[17:51:18] Analytics, MediaWiki-Vagrant: fuse fails on vagrant - https://phabricator.wikimedia.org/T103484#1421882 (Ottomata) Yeah it works on my vagrant too. Hm.
[18:35:24] milimetric: this is ready for review
[18:35:25] https://gerrit.wikimedia.org/r/#/c/221664/3/server/eventlogging/handlers.py
[18:36:07] cool, just finishing lunch, will review soon
[18:36:15] k danke
[18:45:10] ha, milimetric, madhuvishy! you can plug custom readers and writers into eventlogging without modifying source!
[18:45:16] see load_plugins in handlers.py
[18:45:16] cool!
[18:45:18] didn't realize!
[18:45:27] so slick ori!
[18:45:29] ottomata: aaah
[18:45:50] so much treasure in that code huh
[18:45:52] yeah!
[18:46:47] ottomata: I'm gonna be around today. pushed dmv appointment to tomorrow thanks to jon ronson's advice. I looked at the multi process consumer and the balanced one - and I'm leaning towards the balanced one
[18:47:05] if we use the multiprocess one we'd have to rewrite it for our use
[18:47:13] we'll see
[18:47:40] the balanced one is more straightforward for our use i think. i'll experiment and know more
[18:48:03] yeah i think you are right.
[18:48:08] :)
[18:48:11] and so does milimetric :)
[18:51:45] what's the difference between the KafkaConsumer and the MultiprocessConsumer?
[19:05:24] milimetric: MultiprocessConsumer wraps SimpleConsumer in multiple processes
[19:05:35] and handles auto commit offsets in parent process
[19:05:49] KafkaConsumer is single process, and handles auto_commit_offsets in same process
[19:06:27] if I follow your thought process (because you are reviewing)
[19:06:31] why not use MultiprocessConsumer now?
[19:06:53] because as is it wouldn't get us much. we can still read faster than we can process (even in parallel) with a single process feeding the el-processor
[19:07:37] i did experiment with using MultiprocessConsumer + parallel el-processor, but i didn't get it to do the el-processing in the same process for each side
[19:07:45] no, I'm just trying to understand. soo then is there a difference between kafka-python's KafkaConsumer and pykafka's BalancedConsumer?
[19:07:48] my experiment was dumb
[19:08:51] (I think I understood the reader/writer separation as you explained it yesterday, so that makes sense)
[19:09:26] read/process rather
[19:19:06] sorry
[19:19:12] milimetric: yes
[19:19:33] the main difference is the way the BalancedConsumer manages itself.
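At 18:45 ottomata points out that `load_plugins` in `handlers.py` lets custom readers and writers be added without modifying EventLogging's source. A minimal sketch of that general pattern follows, assuming a decorator that registers a reader factory under a URI scheme and a loader that executes every `*.py` file in a plugin directory so the decorators run. All names here (`READERS`, `register_reader`, `get_reader_for`, `load_plugin_dir`, the `kafka` plugin) and the trick of injecting the decorator into the plugin's namespace are hypothetical; this is not EventLogging's actual implementation.

```python
import importlib.util
import tempfile
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical registry: URI scheme -> factory that builds a reader.
READERS = {}


def register_reader(*schemes):
    """Decorator that registers a reader factory under one or more URI schemes."""
    def decorator(factory):
        for scheme in schemes:
            READERS[scheme] = factory
        return factory
    return decorator


def get_reader_for(uri):
    """Dispatch on the URI scheme; the factory receives the parsed URI."""
    parts = urlparse(uri)
    if parts.scheme not in READERS:
        raise ValueError(f"no reader registered for scheme {parts.scheme!r}")
    return READERS[parts.scheme](parts)


def load_plugin_dir(directory):
    """Execute each *.py file in `directory` so its decorators register readers."""
    for path in sorted(Path(directory).glob("*.py")):
        spec = importlib.util.spec_from_file_location(f"_el_plugin_{path.stem}", str(path))
        module = importlib.util.module_from_spec(spec)
        # Assumption for this sketch: hand the decorator to plugin code directly.
        module.register_reader = register_reader
        spec.loader.exec_module(module)


@register_reader("stdin")
def stdin_reader(parts):
    return "built-in stdin reader"


# A plugin that lives outside the package source tree.
PLUGIN_SOURCE = '''
@register_reader("kafka")
def kafka_reader(parts):
    return "plugin kafka reader for " + parts.netloc
'''

plugin_dir = Path(tempfile.mkdtemp())
(plugin_dir / "kafka_plugin.py").write_text(PLUGIN_SOURCE)
load_plugin_dir(plugin_dir)

print(get_reader_for("stdin://"))                          # built-in stdin reader
print(get_reader_for("kafka://broker1:9092/?topic=test"))  # plugin kafka reader for broker1:9092
```

Because plugins register themselves by scheme, a new backend only needs a file in the plugin directory; callers keep passing URIs and never import the plugin by name.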
[19:19:52] BalancedConsumer talks to zookeeper directly (not via Kafka) to manage partition assignment
[19:20:02] so it is real fancy :)
[19:20:16] with SimpleConsumer and KafkaConsumer, you get one process consuming all partitions
[19:20:38] with MultiprocessConsumer, you get many processes consuming partitions, but only on a single node, as the parent process manages the partition assignment
[19:20:58] with BalancedConsumer, you get auto partition assignment managed by zookeeper, so it is a clustered consumer
[19:21:24] i think it would actually work well to integrate that somehow with the parallel elprocessor work i'm doing
[19:21:40] because, you could use multiprocessing to spawn up N instances of BalancedConsumer
[19:21:46] on a single node
[19:21:52] AND, if you wanted to do it on multiple nodes
[19:21:56] you can, without any extra coding
[19:22:05] Oh, ok, so kafka-python then in my opinion is pretty poopy
[19:22:40] cool, thx
[19:23:25] haha, it isn't poopy! just not as cool as pykafka maybe
[19:23:38] the kafka-python interface is definitely more pythonic
[19:23:42] iterators, etc.
[19:23:58] i mean, i don't know pykafka well, actually, maybe it is cool
[19:24:00] :)
[19:38:58] Analytics-General-or-Unknown: Story: define aggregation/anonymization strategy for long term retention of logs on 1002 that complies with privacy policy - https://phabricator.wikimedia.org/T75093#1422324 (Nemo_bis)
[19:41:04] milimetric: where is the code where the last access cookie is being set in x_analytics
[19:41:16] i remember you showed me once before
[19:41:20] one sec
[19:42:46] madhuvishy: the way I find it is I look for the patch in gerrit: https://gerrit.wikimedia.org/r/#/c/196009/
[19:42:58] (just look for owner:nuria in this case)
[19:42:59] milimetric: aah right
[19:43:03] okay :)
[19:43:05] thanks!
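The 19:19-19:21 messages describe BalancedConsumer as a clustered consumer: partition ownership is coordinated through ZooKeeper instead of a single parent process, so N instances can run on one node or across several without extra code. What makes that workable is that every member can compute the same split from the same membership list. The sketch below shows one such deterministic split, assuming a simple range-style assignment; it is not pykafka's code, it omits ZooKeeper registration, rebalance triggers and offset commits, and the member ids are hypothetical.

```python
def range_assign(partitions, members):
    """Deterministically split partitions into contiguous runs, one per member.

    Every member sorts the same inputs, so each can compute the whole
    assignment locally and simply keep its own share.
    """
    parts = sorted(partitions)
    owners = sorted(members)
    if not owners:
        return {}
    base, extra = divmod(len(parts), len(owners))
    assignment, start = {}, 0
    for index, owner in enumerate(owners):
        count = base + (1 if index < extra else 0)  # first `extra` members take one more
        assignment[owner] = parts[start:start + count]
        start += count
    return assignment


# Three consumer processes on one host (e.g. spawned with multiprocessing) ...
print(range_assign(range(12), ["host-a:1", "host-a:2", "host-a:3"]))
# {'host-a:1': [0, 1, 2, 3], 'host-a:2': [4, 5, 6, 7], 'host-a:3': [8, 9, 10, 11]}

# ... then a fourth process joins from a second host and every member recomputes.
print(range_assign(range(12), ["host-a:1", "host-a:2", "host-a:3", "host-b:1"]))
# {'host-a:1': [0, 1, 2], 'host-a:2': [3, 4, 5], 'host-a:3': [6, 7, 8], 'host-b:1': [9, 10, 11]}
```

Adding the second host changes only the membership list, which matches the "without any extra coding" point at 19:21:56.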
[19:43:11] and then I use github to browse: https://github.com/wikimedia/operations-puppet/tree/production/
[19:43:30] so in this case, this is probably the most pertinent file: https://github.com/wikimedia/operations-puppet/blob/production/templates/varnish/last-access.inc.vcl.erb
[19:47:39] milimetric: perfect, thanks
[19:47:49] milimetric: fixed formatting errors
[19:48:48] (merged)
[19:49:30] great!
[19:49:42] milimetric: and now this one is ready for you :)
[19:49:43] https://gerrit.wikimedia.org/r/#/c/222190/2
[19:52:48] Analytics-Kanban: Vet data in intermediate aggregate {wren} [8 pts] - https://phabricator.wikimedia.org/T102161#1422406 (Milimetric) @Ironholds, I'm not as familiar as you are with all of this, but the results seem to make sense to me. Take a look if you're interested. @JAllemandou, looks good to me, you ca...
[19:54:25] Analytics-Kanban: Vet data in intermediate aggregate {wren} [8 pts] - https://phabricator.wikimedia.org/T102161#1422424 (Milimetric) Also the mobile data seems cleaner this way. Really great.
[20:01:44] ottomata: merged with a small quip
[20:02:40] milimetric: you might be right about that, depending on how madhu's work goes.
[20:02:55] if we end up building in support for another kafka lib (pykafka), and want to keep the EL stream abstraction
[20:03:04] we will probably configure the reader/writer via uri like you say
[20:03:33] i think it is fine to keep them together for this case.
[20:03:46] since keyed and simple producers are subclasses of the same class
[20:03:55] Base Producer
[20:17:31] ottomata: milimetric just reading the discussion. they fed us real good thai food and put me in a sleepy mood
[20:17:45] :)
[20:38:36] joal, you're probably sleeping, but if not, know that your json converter worked wonderfully and I'll be kicking off my diff jobs this evening. WOoot!
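At 20:02-20:03 ottomata expects readers and writers to be configured through their URI if pykafka support is added, and thinks it is fine for now to keep keyed and simple Kafka producers behind one writer because both subclass the same base producer. A minimal sketch of that idea, assuming a made-up URI convention where a `key` query parameter selects the keyed flavour; `BaseWriter`, `SimpleWriter`, `KeyedWriter`, `writer_from_uri` and the topic name are hypothetical stand-ins, not kafka-python or EventLogging classes, and `send` just returns what would be produced.

```python
from urllib.parse import parse_qs, urlparse


class BaseWriter:
    """Hypothetical base class shared by both writer flavours."""

    def __init__(self, topic):
        self.topic = topic

    def send(self, event):
        raise NotImplementedError


class SimpleWriter(BaseWriter):
    def send(self, event):
        return (self.topic, None, event)  # no message key


class KeyedWriter(BaseWriter):
    def __init__(self, topic, key_field):
        super().__init__(topic)
        self.key_field = key_field

    def send(self, event):
        return (self.topic, event.get(self.key_field), event)  # keyed by an event field


def writer_from_uri(uri):
    """Pick the writer flavour from query parameters (assumed URI convention)."""
    parts = urlparse(uri)
    if parts.scheme != "kafka":
        raise ValueError(f"unsupported scheme {parts.scheme!r}")
    params = parse_qs(parts.query)
    topic = params["topic"][0]
    key_field = params.get("key", [None])[0]
    return KeyedWriter(topic, key_field) if key_field else SimpleWriter(topic)


simple = writer_from_uri("kafka://broker1:9092/?topic=eventlogging-valid")
keyed = writer_from_uri("kafka://broker1:9092/?topic=eventlogging-valid&key=schema")
print(type(simple).__name__, simple.send({"schema": "Edit"}))
print(type(keyed).__name__, keyed.send({"schema": "Edit"}))
```

Keeping one entry point with a shared base class means the calling code never branches on the flavour; only the URI changes.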
[20:38:53] * halfak hops on bike
[20:38:57] back in about an hour
[20:39:06] halfak: on my way to sleep
[20:39:16] but WOOOOT as well before sleeping ;)
[20:39:20] :D!
[20:39:46] I mean, it's just JSON conversion, but that's as much as I could manage to solve so ;)
[20:40:05] See you guys after your long weekend :)
[20:59:04] Analytics-Wikistats: stats.wikimedia.org needs options to see exact counts and dates - https://phabricator.wikimedia.org/T37150#1422832 (Nemo_bis) Well not really, the code is eventually sync'ed/merged in my experience.
[21:05:06] laters all!
[21:12:17] bye everyone! have a nice long weekend!
[21:15:48] Analytics, Analytics-Backlog, Performance-Team, Patch-For-Review: Collect HTTP statistics about load.php requests - https://phabricator.wikimedia.org/T104277#1422897 (ori) Open>Resolved a:ori This is in the process of rolling out to all Varnishes. It should propagate everywhere within the n...
[21:22:00] Hey, do you know when https://reportcard.wmflabs.org gets updated?
[21:35:08] milimetric: ^
[21:55:06] * halfak kicks off a new enwiki diffs job on the altiscale cluster
[21:55:39] James_F: I updated it this morning. It's manual, basically monthly unless there are problems making or processing the dumps
[21:55:47] And we failed :(
[21:56:20] milimetric: Ah. And the data lags by a month?
[21:56:24] milimetric: (Sounds painful!)
[21:58:17] Heh, it's a ritual I do because it's too hard to automate when there are problems and it serves as a nice reminder of just how much we have left to do on the infrastructure. The data lags because historically it took over three weeks to create the dumps, then a few days to process them
[21:58:25] Right.
[21:58:45] but recently it's not as bad, so I actually updated it twice this last month.
[21:59:46] pageview data, for example, is usually not lagging at all, James_F, but then again that should be replaced with the new pageview definition that's live and up on vital-signs
[22:01:03] (vital signs is part of the new infrastructure and is automatically updated daily)
[22:01:08] * James_F nods.
[22:07:48] I'm just interested as the Annual Report showed a 'dashboard' with five of those numbers being Editing's 'fault'. :-)
[22:10:54] milimetric: Do we "only" go back three years to make it faster to calculate, or is there a different reason?
[22:13:23] i think that's just when this dashboard was released, James_F; three years sounds right because it was right before I joined. Older data should be available on wikistats; that's the source of these numbers. Weird that they could assign blame using numbers from that dashboard, it seems too high level for that
[22:13:55] milimetric: Blame is probably too strong.
[22:15:41] milimetric: But e.g. that in FY2013 117M WP edits were made, and yet in FY2012 it was 159M, is something we should keep an eye on.
[22:16:07] (This year we're so far 4.90% ahead of last year, from my quick reckoning.)
[22:17:06] to me, number of edits has always been kind of a crappy measure, like lines of code for a programmer
[22:17:13] Oh, indeed.
[22:17:34] I once wrote a six million line insurance company system. I am... Not proud of that
[22:18:15] For instance, FY2012 was when we moved inter-language links to Wikidata, which took a huge number of bot synchronisation edits and made them irrelevant.
[22:18:26] So it going down is expected and reasonable.
[22:18:42] But it also helps with the "trend" stories we get in the media.
[22:19:02] E.g. is the "Oh shit" trend of active editors true? Etc.
[22:20:36] right. Once I feel like my infrastructure struggles are worth it I'd love to maybe get a data science micro degree and find some better metrics here. I feel like we miss a lot of the beauty of how and why people collaborate with our weird "active editor" binoculars
[22:22:05] * James_F nods.
[22:39:51] Analytics-Kanban, Research-and-Data: Validate Uniques using Last Access cookie {bear} - https://phabricator.wikimedia.org/T101465#1339575 (leila) @madhuvishy do you mind leaving an update here based on the app comparisons we were trying to do and what is blocking that? thank you!
[23:46:52] Analytics-Backlog, Analytics-Dashiki: Improve the edit analysis dashboard - https://phabricator.wikimedia.org/T104261#1423471 (Neil_P._Quinn_WMF)
[23:48:39] Analytics-Backlog, Analytics-Dashiki: Improve the edit analysis dashboard - https://phabricator.wikimedia.org/T104261#1412040 (Neil_P._Quinn_WMF)
[23:51:36] Analytics-Backlog, Analytics-Dashiki: Improve the edit analysis dashboard - https://phabricator.wikimedia.org/T104261#1423479 (Neil_P._Quinn_WMF)
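For scale, the two Wikipedia edit totals quoted at 22:15:41 (159M in FY2012, 117M in FY2013) correspond to a year-over-year drop of roughly a quarter. This is plain arithmetic on the figures as stated in the log; it says nothing about how much of the drop comes from the Wikidata interlanguage-link change mentioned at 22:18:15.

```latex
\frac{117\,\mathrm{M} - 159\,\mathrm{M}}{159\,\mathrm{M}} = -\frac{42}{159} \approx -0.264
\qquad (\text{about } 26.4\% \text{ fewer edits})
```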