[00:58:55] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1701171 (Nuria) >@Ottomata, main reason would be the ability to work with $simple_queue, $binary_kafka, $amazon_queue and so on without changes in MW code. This isn'... [01:17:00] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1701193 (GWicke) @Nuria, see the task description, heading "Initial use cases". [01:19:25] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1701197 (ori) >>! In T114443#1701193, @GWicke wrote: > @Nuria, see the task description, heading "Initial use cases". Potential applications are one thing; a concis... [01:49:29] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1701206 (GWicke) [01:53:27] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1701209 (GWicke) [01:53:38] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1695470 (GWicke) @ori, I changed the text to clarify which of those are potential, and which are concrete plans for this quarter. Please follow the provided links if... [02:00:24] Analytics-EventLogging, Fundraising-Backlog: Nested EventLogging data doesn't get copied to MySQL - https://phabricator.wikimedia.org/T112947#1701213 (AndyRussG) Thanks!! @ellery, now that the data is going into HDFS, is this task still relevant? Do you have plans to query banner history logs via MySQL s... 
[05:46:02] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1701296 (Joe) Apart from the concerns on a practical use case which I agree with, I have a big doubt about the implementation idea: I am in general a fan of the par... [05:52:30] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1701312 (Joe) >>! In T114443#1698223, @GWicke wrote: > @ottomata, main reason would be the ability to work with $simple_queue, $binary_kafka, $amazon_queue and so on... [09:30:07] Analytics-Tech-community-metrics: Update ITS related data from Bugzilla to Phabricator/Maniphest in project-info.json - https://phabricator.wikimedia.org/T114636#1701553 (Aklapper) NEW [09:30:30] Analytics-Tech-community-metrics: Update ITS related data from Bugzilla to Phabricator/Maniphest in project-info.json - https://phabricator.wikimedia.org/T114636#1701553 (Aklapper) [09:31:22] Analytics-Tech-community-metrics, DevRel-October-2015: "Tickets" (defunct Bugzilla) vs "Maniphest" sections on korma are confusing - https://phabricator.wikimedia.org/T106037#1457306 (Aklapper) >>! In T106037#1567056, @Aklapper wrote: > Edit mediawiki-dashboard/browser/config/project-info.json Moved that... [09:32:36] Analytics-Tech-community-metrics, DevRel-October-2015: "Tickets" (defunct Bugzilla) vs "Maniphest" sections on korma are confusing - https://phabricator.wikimedia.org/T106037#1701576 (Aklapper) Replacing/updating the navigation sidebar strings seem to be less trivial, they seem to be defined in `browser/l... [09:34:55] Analytics-General-or-Unknown: Replication checks disabled in Icinga for most analytics slaves - https://phabricator.wikimedia.org/T66088#1701578 (jcrespo) Open>Resolved a:jcrespo This does no longer apply, as this hosts have been decommissioned, and suggested fix applied a long time ago, and analyti... 
[09:38:57] Analytics-General-or-Unknown, Database: Create a table in labs with replication lag data - https://phabricator.wikimedia.org/T71463#1701589 (jcrespo) a:Springle>jcrespo This will be possible very soon due to T111266 [10:12:32] Analytics-Tech-community-metrics, DevRel-October-2015: Present most basic community metrics from T94578 on one page - https://phabricator.wikimedia.org/T100978#1701637 (Aklapper) I gave this a first shot in https://github.com/Bitergia/mediawiki-dashboard/pull/66 (untested!, feedback welcome) by creating a... [10:57:44] Analytics, Services, operations, Icinga, Patch-For-Review: Icinga configuration broken by aqs - https://phabricator.wikimedia.org/T114556#1701698 (jcrespo) Open>Resolved a:jcrespo Resolved on https://gerrit.wikimedia.org/r/#/c/243632/ [10:58:59] Analytics, Services, operations, Icinga: Icinga configuration broken by aqs - https://phabricator.wikimedia.org/T114556#1701702 (Revi) [11:13:29] (PS1) Christopher Johnson (WMDE): updates owl with dataSourceURIs for new data adds comment annotations for metric descriptions adds isDefinedBy for Content Class Metrics adds 2 axis view for active users removes unneeded js and css files [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/243634 [11:14:49] (PS2) Christopher Johnson (WMDE): updates owl with dataSourceURIs for new data adds comment annotations for metric descriptions adds isDefinedBy for Content Class Metrics adds 2 axis view for active users removes unneeded js and css files [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/243634 [11:19:52] (PS3) Christopher Johnson (WMDE): updates owl with dataSourceURIs for new data adds comment annotations for metric descriptions adds isDefinedBy for Content Class Metrics adds 2 axis view for active users removes unneeded js and css files [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/243634 [11:20:28] (CR) Christopher Johnson (WMDE): [C: 2 V: 2] updates owl with dataSourceURIs for new 
data adds comment annotations for metric descriptions adds isDefinedBy for Content Class Metrics add [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/243634 (owner: Christopher Johnson (WMDE)) [11:35:13] Analytics-Kanban, netops, operations, Patch-For-Review: Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts] {slug} - https://phabricator.wikimedia.org/T107056#1701759 (akosiaris) [11:37:47] Analytics-Kanban, netops, operations, Patch-For-Review: Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts] {slug} - https://phabricator.wikimedia.org/T107056#1701767 (akosiaris) >>! In T107056#1697401, @JAllemandou wrote: > Hey DevOps Guys, > As part of that task, w... [13:31:41] morning! [14:08:33] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [5 pts] - https://phabricator.wikimedia.org/T108925#1701979 (JAllemandou) @Ironholds: I corrected my previous comment: three diffs, not four. > # No filter of `action=edit` in R while present in java (~ + 3416000 / day)... [14:23:40] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1701995 (Nuria) >EventLogging: Decode, validate and enqueue JSON events for EL. mmm..I am not sure who would be the users of this endpoint at this time, do you have... [14:36:08] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1702008 (Nuria) >would be a pretty easy to then add a Monolog processor that added this bit of data to all log events that MediaWiki generated. We could also p... [14:50:15] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1702049 (Anomie) >>! 
In T108618#1702008, @Nuria wrote: > This describes a "session identifier" No, it describes a "request identifier". A "session identifier... [15:01:53] Analytics-Backlog: Load Wikimedia JSON data into Altiscale "Research Cluster" HIVE - https://phabricator.wikimedia.org/T114489#1702098 (JAllemandou) @Halfak: There have been a misunderstanding about the format :) I think the best would be to have two tables, one with meta-data only, and one with meta-data+ t... [15:11:01] (PS3) Joal: [WIP] Add camus helper functions and job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/240868 (https://phabricator.wikimedia.org/T113251) [15:12:24] (CR) Joal: [V: -1] "TODO: Correct refinery-core/pom.xml with working dependencies for camus jars." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/240868 (https://phabricator.wikimedia.org/T113251) (owner: Joal) [15:14:36] Analytics-Backlog, Developer-Relations, MediaWiki-API, Reading-Infrastructure-Team, and 4 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1702135 (EBernhardson) The avro encoder in php doesn't look to directly support json encoding, but i don't thi... [15:16:43] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [5 pts] - https://phabricator.wikimedia.org/T108925#1702141 (Ironholds) Don't apologise, that looks like totally serviceable code (heck, it's pretty much precisely how I'd do it). Tried multiplying the results by 1,000? ;... [15:42:19] i saw mention that some problems were run into in loading binary avro from kafka into hadoop. Thats plenty ok and will work out sending json (fairly easy) instead. 
I was just curious if the issues were documented anywhere (basically, i'm curious :) [15:52:55] Analytics-Backlog: Re-baselining checkpoints periodically - https://phabricator.wikimedia.org/T112009#1702292 (Milimetric) Ok, I get the idea now, I think it's up to @kevinator if he thinks he can help coordinate. I'm a little hesitant to sign up to more meetings :) [15:53:46] Analytics-Kanban: Update camus-wmf to be deployed by maven (missing jars otherwise) {hawk} [8 pts] - https://phabricator.wikimedia.org/T114657#1702293 (JAllemandou) NEW [16:03:44] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1702327 (EBernhardson) I'd just like to add that I've used this request identifier in $dayjob-1 and it was incredibly useful for debugging issues across services. [16:03:48] Analytics-Backlog: Update stats.wikimedia.org's pipeline to use new pageview definition - https://phabricator.wikimedia.org/T112913#1702329 (Nuria) more info here: https://phabricator.wikimedia.org/T114379 [16:04:12] Analytics-Backlog, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data - https://phabricator.wikimedia.org/T114379#1702330 (Milimetric) [16:04:14] Analytics-Backlog: Update stats.wikimedia.org's pipeline to use new pageview definition - https://phabricator.wikimedia.org/T112913#1702331 (Milimetric) [16:05:04] Analytics-Backlog, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data - https://phabricator.wikimedia.org/T114379#1702333 (Milimetric) duplicate>Open oops, sorry, did it backwards [16:05:16] Analytics-Backlog: Update stats.wikimedia.org's pipeline to use new pageview definition - https://phabricator.wikimedia.org/T112913#1649740 (Milimetric) [16:05:17] Analytics-Backlog, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data - https://phabricator.wikimedia.org/T114379#1702336 (Milimetric) [16:05:41] Analytics-Backlog, 
Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data - https://phabricator.wikimedia.org/T114379#1702340 (Milimetric) p:Triage>High [16:06:44] ebernhardson: We are still working on the binary avro loading - hasn't worked so far, but nuria and I are trying. But we got the JSON avro -> HDFS to work, so will keep you posted, if necessary we can go for that! Thanks :) [16:07:03] in theory both should work [16:10:06] Analytics-Backlog: Spike: understand wikistats enough to estimate replacing pageview data source {lama} [8 pts] - https://phabricator.wikimedia.org/T114660#1702363 (Milimetric) NEW a:Milimetric [16:10:20] Analytics-Backlog: Spike: understand wikistats enough to estimate replacing pageview data source {lama} [8 pts] - https://phabricator.wikimedia.org/T114660#1702374 (Milimetric) [16:10:27] ebernhardson|afk: the primary difference is coming from camus expecting the messages to have a schema id - which we don't have since we don't have a rest schema registry that issues ids - so all the code around decoding and writing need to be changed to accommodate it [16:10:44] Analytics-Backlog: Spike: understand wikistats enough to estimate replacing pageview data source {lama} [8 pts] - https://phabricator.wikimedia.org/T114660#1702380 (Milimetric) p:Triage>High [16:11:17] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1702384 (GWicke) I guess we have slightly different ideas about what a message bus should be: 1) a way to get blobs from a to b, and 2) a way to expose a stream of... 
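Note on the [16:10:27] message about camus expecting a schema id: the log does not say how that id is carried in a message. The sketch below assumes one common layout, a single marker byte followed by a 4-byte big-endian schema id and then the Avro payload. `MAGIC_BYTE`, `frame` and `unframe` are hypothetical names for illustration, not camus API. It only shows why a decoder written for framed messages rejects a payload that has no id prefix.

```python
import struct

# Assumed marker byte that opens every framed message (not taken from the log).
MAGIC_BYTE = 0
# ">bI" = big-endian, 1 signed byte + 4-byte unsigned int, no padding (5 bytes).
HEADER = struct.Struct(">bI")


def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an Avro-encoded payload with the marker byte and schema id."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + avro_payload


def unframe(message: bytes) -> tuple[int, bytes]:
    """Split a framed message into (schema_id, avro_payload).

    Raises ValueError when the header is missing or malformed, which is
    what happens when a producer sends a bare payload without a schema id.
    """
    if len(message) < HEADER.size:
        raise ValueError("message too short to carry a schema id header")
    magic, schema_id = HEADER.unpack_from(message)
    if magic != MAGIC_BYTE:
        raise ValueError("unknown magic byte, message is not framed")
    return schema_id, message[HEADER.size:]
```

Without a registry that issues ids, the alternative discussed later in the log is to pick the schema from the topic name instead of from a per-message header.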
[16:11:31] Analytics-Backlog: Spike: understand wikistats enough to estimate replacing pageview data source {lama} [8 pts] - https://phabricator.wikimedia.org/T114660#1702363 (Milimetric) [16:11:32] Analytics-Backlog, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data - https://phabricator.wikimedia.org/T114379#1693683 (Milimetric) [16:17:56] Analytics-Backlog, Mobile-Apps: Investigate and fix inconsistent data in mobile_apps_uniques_daily {hawk} [5 pts] - https://phabricator.wikimedia.org/T114406#1702425 (Milimetric) [16:20:44] Analytics-Backlog: Define a first set of metrics to be worked for wikistats 2.0 - https://phabricator.wikimedia.org/T112911#1702430 (Milimetric) [16:26:05] Analytics-Backlog: Define a first set of metrics to be worked for wikistats 2.0 - https://phabricator.wikimedia.org/T112911#1702463 (Milimetric) [16:27:57] Analytics-Backlog, Mobile-Apps: Investigate and fix inconsistent data in mobile_apps_uniques_daily {hawk} [5 pts] - https://phabricator.wikimedia.org/T114406#1702484 (JAllemandou) More precisely - day restriction ``` WHERE NOT ((year = ${year}) AND (month = ${month}) AND (day = ${day})``` should be added... 
[16:33:11] Analytics-Backlog: Define a first set of metrics to be worked for wikistats 2.0 {lama} [21 pts] - https://phabricator.wikimedia.org/T112911#1702526 (Milimetric) [16:36:15] Analytics-EventLogging, Editing-Department, Improving access, Reading Web Planning, discovery-system: Add event_pageId and event_pageTitle to quicksurvey or all schema - https://phabricator.wikimedia.org/T114164#1702546 (Jdlrobson) a:Jdlrobson [16:36:23] Analytics-EventLogging, Editing-Department, Improving access, Reading Web Planning, discovery-system: Add event_pageId and event_pageTitle to quicksurvey or all schema - https://phabricator.wikimedia.org/T114164#1686646 (Jdlrobson) a:Jdlrobson>JKatzWMF [16:38:31] Analytics-Backlog: Spike: Understand how Wikistats Traffic reports are computed {lama} [8 pts] - https://phabricator.wikimedia.org/T114669#1702557 (Milimetric) [16:39:13] Analytics-Backlog: Define a first set of metrics to be worked for wikistats 2.0 {lama} [13 pts] - https://phabricator.wikimedia.org/T112911#1702568 (Milimetric) [16:39:23] Analytics-Backlog: Spike: Understand how Wikistats Traffic reports are computed {lama} [8 pts] - https://phabricator.wikimedia.org/T114669#1702570 (Milimetric) p:Triage>High [16:39:49] Analytics-Backlog: Define a first set of metrics to be worked for wikistats 2.0 {lama} [13 pts] - https://phabricator.wikimedia.org/T112911#1649719 (Milimetric) [16:39:50] Analytics-Backlog: Spike: Understand how Wikistats Traffic reports are computed {lama} [8 pts] - https://phabricator.wikimedia.org/T114669#1702551 (Milimetric) [16:40:25] Analytics-Backlog: Spike: understand wikistats enough to estimate replacing pageview data source {lama} [8 pts] - https://phabricator.wikimedia.org/T114660#1702576 (Milimetric) a:Milimetric>None [16:41:44] Analytics-Backlog: Spike: Found out what dump file format does erik uses as feed to his definition - https://phabricator.wikimedia.org/T113981#1702582 (Milimetric) [16:41:45] Analytics-Backlog: Spike: understand 
wikistats enough to estimate replacing pageview data source {lama} [8 pts] - https://phabricator.wikimedia.org/T114660#1702363 (Milimetric) [16:42:23] Analytics-Backlog: Spike: understand wikistats enough to estimate replacing pageview data source {lama} [8 pts] - https://phabricator.wikimedia.org/T114660#1702363 (Milimetric) [16:42:24] Analytics-Backlog: Spike: Found out what dump file format does erik uses as feed to his definition - https://phabricator.wikimedia.org/T113981#1681720 (Milimetric) [16:44:57] Analytics-Backlog: Create new table for 'referer' aggregated data - https://phabricator.wikimedia.org/T112284#1702604 (Milimetric) [16:45:56] Analytics-Backlog: Doc cleanup day 2.0 {flea} [15 pts] - https://phabricator.wikimedia.org/T112024#1702609 (Milimetric) [16:46:13] Analytics-Backlog, Analytics-Cluster, Easy: Mobile PM sees reports on browsers (Weekly or Daily) - https://phabricator.wikimedia.org/T88504#1702611 (Nuria) p:Normal>High [16:46:23] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1702612 (Anmolkalia) Right, I am on it :) [16:46:48] Analytics-Backlog: Gain permission to delete articles on wikitech (needed for doc cleanup) [3 pts] - https://phabricator.wikimedia.org/T114672#1702614 (Milimetric) NEW [16:50:00] Analytics-Backlog, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data - https://phabricator.wikimedia.org/T114379#1702624 (Tnegrin) Hi all -- Terry has asked that this be finished by the next scorecard (10/19). We need to let him know the schedule -- do we have an ETA? 
-Toby [16:56:27] Analytics-Backlog, Analytics-Cluster, Easy: Mobile PM sees reports on browsers (Weekly or Daily) [8 pts] - https://phabricator.wikimedia.org/T88504#1702658 (Milimetric) [16:57:09] Analytics-Backlog, Analytics-Cluster, Easy: PM sees reports on browsers (Weekly or Daily) [8 pts] - https://phabricator.wikimedia.org/T88504#1702663 (Nuria) [17:05:33] Analytics-Backlog, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data - https://phabricator.wikimedia.org/T114379#1702686 (Milimetric) We didn't know how to estimate it, so we made a blocking task to look over Erik's diagrams. But I made that task the highest priority we have r... [17:14:11] milimetric, are you going to start https://phabricator.wikimedia.org/T114660 right now? [17:14:29] I think I have to eat lunch, but after that, yes [17:14:50] I have to look at the quarterly review presentation no2, because they are going to freeze it today [17:14:53] *now [17:15:59] milimetric, I can work with you at 16:30h Philadelphia time [17:16:31] mforns: Hi! [17:16:36] kevinator, hi [17:16:43] mforns: is there anything you wanted to go over in the QR deck? [17:16:52] kevinator, yes, Terry's comments [17:17:03] mforns: cool, if that's not too late for you that works for me. I'll delay until after the other meeting today [17:17:29] no one in here is working on analytics1035 directly right? (it just threw raid and ssh errors so likely os offline) [17:18:12] mforns: hangout? batcave is in use so a private one [17:18:24] kevinator, yes [17:18:30] a-batcave-2 [17:18:45] https://plus.google.com/hangouts/_/wikimedia.org/a-batcave-2 [17:40:35] madhuvishy: back.. batcave? [17:41:21] nuria: yup [17:49:39] kevinator: I was wondering what we'll talk about in the meeting with communications [17:50:09] if there's nothing to talk about... 
then end it right away [17:50:35] you can give them an update on when it is expected to launch [17:50:38] ok, cool [17:50:46] and when we would not put up a blog post [17:54:57] (PS5) Madhuvishy: [WIP] Changes to camus to debug/test avro in 1002 [analytics/camus] - https://gerrit.wikimedia.org/r/242907 (owner: Nuria) [18:11:38] Analytics-Cluster, operations, Patch-For-Review: Turn off webrequest udp2log instances. - https://phabricator.wikimedia.org/T97294#1703087 (Jgreen) [18:15:18] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1703097 (Eevans) >>! In T114443#1701296, @Joe wrote: > Apart from the concerns on a practical use case which I agree with, I have a big doubt about the implementatio... [18:18:43] (PS6) Madhuvishy: [WIP] Changes to camus to debug/test avro in 1002 [analytics/camus] - https://gerrit.wikimedia.org/r/242907 (owner: Nuria) [18:19:12] Analytics-Backlog, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data - https://phabricator.wikimedia.org/T114379#1703102 (ezachte) @Milimetric the way I see it and discussed it so far with Toby and Kevin is a bit different. I'd expect to do consultancy and vetting the results.... [18:40:55] (PS7) Nuria: [WIP] Changes to camus to debug/test avro in 1002 [analytics/camus] - https://gerrit.wikimedia.org/r/242907 [18:43:30] Analytics-Backlog: Anonymize data in pageview_hourly to comply with privacy policy - https://phabricator.wikimedia.org/T114675#1703219 (Aklapper) Hi @Nuria. Please associate at least one [[ https://phabricator.wikimedia.org/project/query/active/ | project ]] with this task, otherwise nobody can find this task... [18:48:18] (PS8) Madhuvishy: [WIP] Changes to camus to debug/test avro in 1002 [analytics/camus] - https://gerrit.wikimedia.org/r/242907 (owner: Nuria) [19:00:32] ebernhardson: good news, writing binary works too [19:01:32] sweet! 
[19:02:54] :) [19:11:24] (PS9) Madhuvishy: [WIP] Changes to camus to debug/test avro in 1002 [analytics/camus] - https://gerrit.wikimedia.org/r/242907 (owner: Nuria) [19:24:29] Hey milimetric, we have access to aqs1001 :) [19:24:49] I have looked into cassandra, seems empty [19:25:01] awesome [19:25:23] we should wait for Andrew to find out what happened and if they're going to wipe the boxes again [19:25:27] (they may still be deploying) [19:25:46] right, I'll wait until tomorrow :) [19:25:53] That's cool ! [19:26:01] hopefully some data in there soon :) [19:26:06] i'm gonna ssh in just to feel the power :) [19:26:12] :D [19:26:26] joal: if you want to test the loading though, go ahead [19:26:39] I just mean you probably shouldn't try to load everything [19:26:46] milimetric: I'll do that tomorrow morning: create a fake ta [19:26:48] ble, [19:26:53] and test some loading [19:26:57] oh you mean cassandra's empty empty, no tables even [19:26:58] ok [19:27:02] yup [19:42:04] ebernhardson: could you point me to the Search avro schema, and maybe an example message (avro json would be cool) [19:51:51] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [5 pts] - https://phabricator.wikimedia.org/T108925#1703389 (JAllemandou) Timestamp conversion tested on one day 2014-04-01 (heavy computation, so keep it small) --> still 721702. I have another experiment on the go: fil... [19:53:12] Bye a-team ! see you tomorrow ! [19:53:18] nite jo [19:53:55] madhuvishy: annoyingly i don't have any good centralized place for it. The first quick hack is in https://gerrit.wikimedia.org/r/#/c/240615/5/wmf-config/InitialiseSettings.php line 4324 [19:54:03] the intention is for it to *not* live there [19:54:32] as for a message, i made some samples before but didn't update with the latest changes. 
i'll make one in a sec [19:56:28] Analytics-Backlog: English Wikipedia stats for 5 millionth article - https://phabricator.wikimedia.org/T113683#1703405 (jrbs) [20:13:12] milimetric, I'm back [20:13:29] cool, mforns, let's hang out and go over those diagrams? [20:13:34] yep! [20:35:40] milimetric: mforns: here's a diagram for your hangout https://phabricator.wikimedia.org/macro/view/15/ [20:35:56] you can easily add it to a Phabricator task by commenting: flowchart [20:35:58] * hashar hides [20:36:12] thx hashar, we hid everything, we're going on vacation [20:38:29] ebernhardson: what about mediawiki config for schemas? can it be used for that? ccmadhu [20:38:33] cc madhuvishy [20:41:00] nuria: if you mean operations/mediawiki-config, that's the patch above. or did you mean meta.wikimedia.org/wiki/Schema:Xxx [20:42:33] ebernhardson: sorry, yes i meant mediawiki-config, i see, my mistake. [20:42:47] Analytics-Backlog: Load Wikimedia JSON data into Altiscale "Research Cluster" HIVE - https://phabricator.wikimedia.org/T114489#1703562 (Halfak) Indeed. I think we're imagining the same thing. *All fields == metadata+text *Drop the 'text' field == metadata [20:50:42] ebernhardson: thanks, I think I can work with that. [20:50:51] nuria: hmmm, that is JSON [20:51:04] madhuvishy: right [20:51:20] madhuvishy: not avro, you mean? [20:51:24] yes [20:51:34] it's currently only used by EL [20:53:53] nuria: it's based on json schema, and we'd have to make an extension for avro schema if we wanted to go the mediawiki way [20:54:04] I thought we were gonna have the schemas in a git repo [20:54:14] and supply the jar using libjars [20:54:19] madhuvishy: we have to [20:54:39] ebernhardson, madhuvishy : quick hangout? 
[20:54:53] ebernhardson: there are two sides: supply side and consumer side [20:55:10] ebernhardson: consumer side cannot read mediawiki php code [20:55:56] ebernhardson: but I do not know mw code well enough to suggest another alternative [20:56:06] sure i can do that [20:56:24] we certainly need a good place for schemas, i just don't know where that is [20:56:42] nuria: I think it's possible to consume from the mediawiki page as json [20:56:48] that's what EL does, no? [20:57:19] we'd have to have a class that requests the schema etc - I'm not saying we do this - but it should be possible [20:57:49] in general, i think following event logging's lead is our best bet [20:58:20] nuria: i can hangout [20:58:22] madhuvishy: yes, but (in the case of camus setup) i *believe* consuming from filesystem should be much easier, i am not super keen on adding http dependencies to our wmf-camus jar or similar [20:58:30] nuria: yes yes [20:59:38] ebernhardson: https://www.mediawiki.org/wiki/Extension:JsonConfig you'd need something similar if you wanted to store in a mediawiki like AvroSchema:Search.. [21:01:01] I don't know if you can just read from a git repo to produce, but on our side, the schema will have to reside in some repo - which we'll pack as a jar, and supply to the thing that imports the data from kafka to hdfs [21:01:22] there's no such existing repo, and we'll have to make one [21:01:32] nuria: ^ makes sense? [21:02:25] the mw side reads the avro schema from a git repo, yea. it's the easiest possible thing to start with. 
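Note on the consumer-side idea discussed above (schemas kept in a repo and read from the filesystem rather than fetched over HTTP): a minimal sketch of what a topic-keyed lookup could look like. Everything here is an assumption made for illustration: the class name `TopicSchemaRegistry`, the one-file-per-topic layout `<topic>.avsc`, and the use of Python instead of the Java classes the actual camus patch would need. An Avro schema file is plain JSON, so the standard library is enough to load it.

```python
import json
from pathlib import Path


class TopicSchemaRegistry:
    """Hypothetical registry mapping a Kafka topic name to an Avro schema.

    Schemas are read from a local directory (for example a checked-out or
    unpacked schema repo), so no HTTP dependency is needed at read time.
    """

    def __init__(self, schema_dir):
        self.schema_dir = Path(schema_dir)
        self._cache = {}  # topic name -> parsed schema (a dict)

    def schema_for(self, topic):
        """Return the parsed schema for a topic, loading it on first use."""
        if topic not in self._cache:
            # Assumed layout: one file per topic, named <topic>.avsc
            path = self.schema_dir / f"{topic}.avsc"
            if not path.is_file():
                raise KeyError(f"no schema file for topic {topic!r}")
            self._cache[topic] = json.loads(path.read_text(encoding="utf-8"))
        return self._cache[topic]
```

Usage would be along the lines of `TopicSchemaRegistry("/path/to/schemas").schema_for("some_topic")`, where both the path and the topic name are placeholders. Updating a schema then means shipping a new copy of the directory, which matches the "bump the jar version" trade-off raised in the conversation.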
[21:02:41] ebernhardson: cool then [21:02:56] the one thing meta buys us, is simplicity in updating schemas [21:03:00] but two places isn't the end of the world [21:03:22] (and hopefully they don't change much after an initial settling period) [21:04:02] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1703643 (Nuria) >a way to expose a stream of events in a defined format that can be consumed easily by a range of clients. This talks about consumption, not producti... [21:04:06] Analytics-Backlog, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data - https://phabricator.wikimedia.org/T114379#1703645 (Milimetric) Ok, we reviewed the diagrams. 1st diagram. @ezachte: we assumed that there's a little typo in the first two boxes, and that instead of "pageco... [21:04:09] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1703644 (bd808) a:bd808 [21:04:14] ebernhardson: right [21:04:31] ebernhardson, madhuvishy : if experience is a guide schemas will change quite a bit [21:04:42] ebernhardson, madhuvishy : if only to correct mistakes [21:04:50] ebernhardson, madhuvishy : which is totally fine [21:04:55] hmm, true [21:05:18] nuria: true, but even in the git model it can be updated, would require us to bump jar versions etc in puppet - but possible [21:05:46] ebernhardson, madhuvishy : yes but that is fine, we deploy new versions using puppet [21:05:52] yup [21:05:58] ebernhardson, madhuvishy : which is what puppet is for [21:06:14] (we need to confirm all this with otto though) [21:07:38] nuria: alright - I'm gonna clean up/start a new patch with new classes for our topic based decoders, and a schema registry that takes a topic name, and picks up the matching schema. [21:07:54] ebernhardson: how would mw code read from a depot like analytics/events/schemas? 
[21:08:30] nuria: it would have to be submoduled into operations/mediawiki-config i suppose [21:09:49] madhuvishy: let's think for a sec, the schemas are going to be on an outside depot that will contain something similar to this: [21:10:22] madhuvishy: https://gerrit.wikimedia.org/r/#/c/242907/9/camus-example/src/main/avro/DummyLog.avsc [21:12:02] madhuvishy: some process is generating the java classes for schemas: https://gerrit.wikimedia.org/r/#/c/242907/9/camus-example/src/main/java/com/linkedin/camus/example/records/DummyLog.java [21:12:49] madhuvishy: and this DummyLog.java is what is going to be loaded at runtime by the class loader [21:13:03] madhuvishy: does this make sense? [21:13:07] nuria: batcave? [21:13:19] madhuvishy: ok, ebernhardson want to batcave? [21:13:31] https://plus.google.com/hangouts/_/wikimedia.org/a-batcave1 [21:13:39] cos normal one is in use [21:13:58] nuria: ^ [21:22:12] * ebernhardson totally spaced out... but back now :) [21:46:18] madhuvishy: this is the schema and a sample message (pretty printed and such) if you still need: https://gist.github.com/ebernhardson/efdfcfee8b8cc62da03f [21:46:38] ebernhardson: thanks a ton! [21:55:21] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1703767 (GWicke) Several services including RESTBase and Parsoid already set and forward a v1 uuid in the `X-Request-Id` header. This header is also set in sub... [22:04:07] mforns: you there? [22:29:36] kevinator, hi [22:29:40] yes [22:30:13] I was having a snak [22:30:17] *snack [22:42:21] no worries. [22:42:32] did you make any more edits to the slides? [22:42:49] mforns: I moved the draft slides into a master deck [22:42:58] kevinator, I saw that [22:43:04] no, not yet [22:43:16] mforns: I also corrected one appendix. Vital signs has data starting in May not March [22:43:30] do you have more edits to make? 
[22:43:31] I was waiting for Terry's response, but I guess, seeing the other presentations, I can complete a little bit the workflows slide [22:43:54] at what time is the presentation going to be frozen? [22:44:57] Analytics-Backlog, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data - https://phabricator.wikimedia.org/T114379#1703877 (Tnegrin) I spoke with Kevin on this -- he'll follow up with you, but yes, this is a high priority task. If we can't finish by the requested date, we (you) n... [22:45:06] kevinator, ^ [22:46:07] mforns: it's supposed to be frozen today... [22:47:07] mforns: but i think minor changes are ok between now and tomorrow [22:47:23] kevinator, I'll work on the workflow slide a bit now [22:47:37] Terry already has a printed out copy of the deck. [22:47:56] ok, let me know if you change anything significant... [22:48:12] but thanks for making improvements... otherwise, it already looks good. [22:50:39] kevinator, ok [22:55:26] nuria: still around? [23:03:58] (PS1) Christopher Johnson (WMDE): adds local data file retrieve function adds remote sparql query and xml parse to data frame adds local write to tsv with date stamp function [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/243826 [23:14:03] Analytics, Varnish: Connect Hadoop records of the same request coming via different channels - https://phabricator.wikimedia.org/T113817#1704033 (Tgr) >>! In T108618#1703767, @GWicke wrote: > Several services including RESTBase and Parsoid already set and forward a [v1 uuid](https://en.wikipedia.org/wiki/... [23:22:57] Analytics, Varnish: Connect Hadoop records of the same request coming via different channels - https://phabricator.wikimedia.org/T113817#1704071 (GWicke) > Varnish should cache the X-Request-Id header when it caches a response (this probably happens automatically) This is actually a good point to conside... 
[23:36:37] Analytics, Varnish: Connect Hadoop records of the same request coming via different channels - https://phabricator.wikimedia.org/T113817#1704118 (Tgr) They could also be removed by Varnish - that way the webrequest log still includes them but the user does not receive them. Presumably an 50b / request ove...