[00:01:24] (PS1) BryanDavis: Fix NetworkOriginUDF tests [analytics/refinery/source] - https://gerrit.wikimedia.org/r/253790 [00:14:21] (PS1) BryanDavis: Clean up IpUtil trusted proxy initialization [analytics/refinery/source] - https://gerrit.wikimedia.org/r/253793 [00:30:07] Analytics-Backlog: Pageview API: Escape double quotes in article titles - https://phabricator.wikimedia.org/T118913#1812574 (ellery) NEW [00:33:15] bd808: ok, i should have run tests, which i didn't . will make sure to that next time, i normally always do it and somehow forgot this time. [00:33:22] madhuvishy: yt? [00:33:28] nuria: yeah [00:47:38] bd808: sorry, gerrit is slowwwww [00:55:34] bd808: mmm.. tests failing due to some camus test, is that happening for you? [00:56:23] Analytics-Backlog, Fundraising research, Research-and-Data: FR tech hadoop onboarding - https://phabricator.wikimedia.org/T118613#1812664 (Tbayer) > [ ] @Tbayer would you be able to give them a walkthrough of webrequest / pv data with example queries? @DarTar : @JKatzWMF and I think that's in the Ana... [00:57:54] nuria: yeah that one test never passed for me even before I started making patches [00:58:16] It looked like it was maybe a config problem of some sort (missing test table or something) [00:58:52] cc joal. ok. let me see if i can fix it [01:07:00] bd808: i see, test donot run if run from source dir [01:07:10] bd808: resources are there paths are wrong [01:11:44] Analytics-Backlog, Fundraising research, Research-and-Data: FR tech hadoop onboarding - https://phabricator.wikimedia.org/T118613#1812733 (madhuvishy) I'm around to help too - with access to stat1002/anything with regard to querying for the data. @DarTar as far as permits go, I see that andyrussg and a... 
[01:31:28] Analytics-Backlog: Pageview API: Escape double quotes in article titles - https://phabricator.wikimedia.org/T118913#1812780 (ellery) [02:07:02] Analytics, Services, operations: Wikimedia pageview API intermittently throwing HTTP 503s - https://phabricator.wikimedia.org/T118817#1812875 (MZMcBride) Very nice. Thank you all for the quick investigation and resolution! [03:10:44] Is it intentional that the /top/ action in the new page view API has a double-encoded 'articles' array? [03:11:06] It seems it is output as a object of JSON inside a string of json. Thus requiring explicit re-parsing [03:11:14] the other actions don't do that and just work directly. [04:48:53] Also, I'm not sure what to think of there being over 1M page views for enwiki Special:BlankPage [04:49:02] #3 of all page views [05:06:35] Analytics-Backlog: Blogpost Pageview API - https://phabricator.wikimedia.org/T118866#1812993 (MZMcBride) This task feels like a possible duplicate of {T118471}. [05:24:53] Analytics, Services: Wikimedia "top" pageviews API has problematic double-encoded JSON - https://phabricator.wikimedia.org/T118931#1813007 (MZMcBride) NEW [05:41:01] Analytics, Services: Wikimedia "top" pageviews API weirdness with the "Paul_Elio" article - https://phabricator.wikimedia.org/T118933#1813033 (MZMcBride) NEW [06:57:19] (CR) Gergő Tisza: Add UDF for network origin (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/253046 (https://phabricator.wikimedia.org/T118592) (owner: BryanDavis) [07:48:26] Analytics, Services: Wikimedia "top" pageviews API weirdness with the "Paul_Elio" article - https://phabricator.wikimedia.org/T118933#1813150 (Tbayer) To mention other recent pageview weirdness tickets (may or may not be related, in any case they illustrate it might not be the API's fault): T117945 (https... 
[09:10:15] (PS1) Addshore: Retry identica metric on empty page [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253854 [09:10:28] (CR) Addshore: [C: 2 V: 2] Retry identica metric on empty page [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253854 (owner: Addshore) [09:19:46] (PS1) Addshore: Add single php methdo for curl wrapper [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253855 [09:20:07] (PS2) Addshore: Add single php methdod for curl wrapper [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253855 [09:20:37] (CR) Addshore: [C: 2 V: 2] Add single php methdod for curl wrapper [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253855 (owner: Addshore) [09:26:00] (PS1) Addshore: Unify useragents [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253856 [09:26:13] (CR) Addshore: [C: 2 V: 2] Unify useragents [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253856 (owner: Addshore) [09:30:07] (Abandoned) DCausse: Test do not merge [analytics/refinery/source] - https://gerrit.wikimedia.org/r/253393 (owner: DCausse) [09:31:20] Analytics-Backlog: Create a dedicated hive table with pageview API only requests for reporting - https://phabricator.wikimedia.org/T118938#1813225 (JAllemandou) NEW [09:48:07] Analytics, Analytics-Kanban, Services: Wikimedia "top" pageviews API weirdness with the "Paul_Elio" article - https://phabricator.wikimedia.org/T118933#1813246 (JAllemandou) a:JAllemandou [09:50:20] Analytics, Analytics-Kanban, Services: Wikimedia "top" pageviews API weirdness with the "Paul_Elio" article - https://phabricator.wikimedia.org/T118933#1813252 (JAllemandou) Quick data check: ---- Elio Motors page seems Ok ``` https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikiped... 
[10:04:30] (CR) Joal: [C: 2 V: 2] "Thqanks for the quick fix :)" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/253790 (owner: BryanDavis) [10:50:18] Analytics-Cluster, Analytics-Kanban: {slug} Pageview API - https://phabricator.wikimedia.org/T101792#1813341 (Tgr) Duplicate of T44259? [11:03:17] Analytics, Analytics-Kanban, Services: Wikimedia "top" pageviews API weirdness with the "Paul_Elio" article - https://phabricator.wikimedia.org/T118933#1813385 (JAllemandou) Seems that this page have been widely crawled on the given days: ``` https://wikimedia.org/api/rest_v1/metrics/pageviews/per-arti... [11:30:40] Analytics, Analytics-Kanban, Services: Wikimedia "top" pageviews API weirdness with the "Paul_Elio" article - https://phabricator.wikimedia.org/T118933#1813409 (JAllemandou) And from hive, all those calls where made from a single Ruby program. I have sent an email to the analytics team to discuss weit... [11:49:57] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1813435 (HYanWong) Monthly or yearly good for me too. But what might also be good (and require thought) is some measure that can be used to normalise the result... [12:38:46] Analytics-General-or-Unknown, WMDE-Analytics-Engineering, Wikidata, Story: [Story] Statistics for Special:EntityData usage - https://phabricator.wikimedia.org/T64874#1813479 (Addshore) [13:50:11] Analytics, Analytics-Kanban, Services: Wikimedia "top" pageviews API weirdness with the "Paul_Elio" article - https://phabricator.wikimedia.org/T118933#1813562 (Milimetric) I'm leaning towards removing spiders from top data. It doesn't seem of import to people looking for the "popular" pages. [14:05:31] o/ joal [14:05:37] Bummer about the job failing again. 
[14:06:25] Analytics-Backlog, Services: Wikimedia "top" pageviews API has problematic double-encoded JSON - https://phabricator.wikimedia.org/T118931#1813571 (Milimetric) [14:09:14] Analytics-Backlog, Services: Wikimedia "top" pageviews API has problematic double-encoded JSON - https://phabricator.wikimedia.org/T118931#1813576 (Milimetric) Two question for y'all. Assume we changed this: 1. Does the breaking format change bother you as consumers? It's kind of early and if the answer... [14:31:38] Analytics-Backlog: Wikimedia "top" pageviews API has problematic double-encoded JSON - https://phabricator.wikimedia.org/T118931#1813612 (Milimetric) [14:31:49] Analytics-Kanban: Wikimedia "top" pageviews API weirdness with the "Paul_Elio" article - https://phabricator.wikimedia.org/T118933#1813613 (Milimetric) [14:34:30] joal, what do you think about trying my crazy streaming idea? [14:52:57] joal: so it looks like we have overwhelming +1s for monthly per-article data [14:57:32] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1813657 (Bluerasberry) I use this tool continuously. http://wikipediaviews.org/ It has some problems for me and the way that I use it. I would love to see the... [15:00:03] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1813665 (Milimetric) @HYanWong: > (a) the hits for the previous month, perhaps When we add monthly granularity to the per-article endpoint this would be easil... [15:05:30] Analytics-Backlog, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1813671 (Milimetric) @Legoktm: there is a client-side limitation of 2000 bytes because events come as GETs t... 
[15:13:33] Analytics-General-or-Unknown, WMDE-Analytics-Engineering, Wikidata, Story: [Story] Statistics for Special:EntityData usage - https://phabricator.wikimedia.org/T64874#1813683 (Addshore) Daily would be good. Grouped by format. We can probably ignore /entity/ as they should all redirect to Special:Ent... [15:21:08] hi a-team [15:21:15] morning mforns [15:21:22] hi milimetric [15:21:31] hey the varnish EL client side limit thing went up from 1000 to 2000 right? [15:21:43] milimetric, that's what I understoon [15:21:45] hiya [15:21:47] *understood [15:21:51] yeah somethign like that [15:21:55] from 1014 to 2000 [15:22:23] k, cool [15:22:30] but server side there's no real limit [15:22:53] aha [15:22:59] well, there must be some limit :] [15:23:11] well, the mysql varchar limit, whatever that's set to [15:23:16] but i saw that insert with truncation just fine [15:23:22] so it won't break, it'll just cut off [15:23:25] fyi, eventlogging in beta labs is down atm, am bringing it back up shortly... 
[15:23:29] thx [15:23:37] ok [15:24:10] milimetric, btw, madhuvishy sent me an email saying you were talking yesterday about where to place the wikimetrics roll up code [15:24:33] I put it inside MultiProjectMetricReport [15:24:43] https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/master/server/eventlogging/jrm.py#L77 [15:24:46] 191 bytes [15:24:50] sorry, 191 characters [15:25:00] mforns: yes [15:25:00] I was in doubt between this and creating another ReportNode class [15:25:10] right, we should all talk when she's around [15:25:17] make sure we're on the same page [15:25:31] ok [15:25:36] the raw data for what we need comes out of MultiProjectMetricReport [15:25:57] and there could either be another node on top of that like AggregateByUserNameReport [15:26:16] she already made a top level custom node called RunGlobalReport [15:26:29] what I did is adding a parameter to MultiProjectMetricReport called group_by_user [15:26:39] and if that is true, the finish method applies the grouping [15:26:41] oh I see [15:26:55] and you can pass in arbitrary ways to aggregate there? [15:27:21] for the two examples I saw, the results would have to be "OR"-ed within each user's group of projects [15:27:31] but I'm not sure if that applies universally [15:27:32] that was the other question: you were talking about hardcoding the aggregation strategy depending on the metric [15:28:03] I don't think so [15:28:18] there are also averages and sums no? [15:28:25] per user? 
[15:28:49] so there's "Existing Active Editor" - that's OR, "Existing Users" and "New Users" - those are all "OR" [15:28:56] lemme look at the other ones [15:29:10] I think so, when you run bytes edited per user for example [15:32:22] it seems there's some misunderstanding [15:32:31] they don't want to be able to run arbitrary metrics [15:32:41] they have 4 specific metrics that always run [15:32:49] ok [15:32:53] they're linked here in the parent bug: https://docs.google.com/spreadsheets/d/1Qib7Nm0eyE9oMrst4cR5lHLvEg5C2P5TnpYnZhOWt4w/edit#gid=1338730435 [15:33:15] mmmmmm, I remember now [15:33:24] so when they run a "global" report, they're just looking for 4 numbers as the output [15:33:42] but the aggregation has to take into account users across wikis in certain cases [15:33:48] so, here's what we have [15:34:05] for those first 3 that I mentioned, we get results per user / per project [15:34:19] we OR the results of each user across all their projects, so we have one row per user [15:34:20] yes [15:34:31] then we SUM the output from that, and that's the number [15:34:32] gotcha [15:34:41] for that third one, I don't think we need any custom code [15:35:02] you mean the fourth? [15:35:04] but I just saw some contradiction on that page and some notes about doing something else "in the future". So we should circle back with Amanda today [15:35:07] sorry, fourth [15:35:10] ok [15:35:26] and something about "manually adding the articles improved" or something which also doesn't make sense [15:49:02] Hi guys [15:49:11] halfak: wanna talk for a minute? [15:49:30] hi joal [15:49:35] o/ joal [15:49:40] batcave or IRC? 
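A minimal sketch of the roll-up described at 15:34 above: per-user/per-project results are OR-ed across each user's projects to give one row per user, and those rows are then summed into the single reported number. This is not the Wikimetrics `MultiProjectMetricReport` code; the function name, the tuple layout, and the sample rows are all hypothetical and only illustrate the aggregation.

```python
from collections import defaultdict


def global_count(rows):
    """Collapse (user, project, flag) rows into one number.

    Hypothetical helper: OR the flag across each user's projects
    (one row per user), then SUM the per-user results.
    """
    per_user = defaultdict(bool)
    for user, _project, flag in rows:
        # OR step: a user counts if the metric is true on any project
        per_user[user] = per_user[user] or bool(flag)
    # SUM step: number of users for whom the OR-ed flag is true
    return sum(per_user.values())


# Made-up sample data, for illustration only
rows = [
    ("Alice", "enwiki", True),
    ("Alice", "dewiki", False),
    ("Bob", "enwiki", False),
    ("Bob", "frwiki", False),
    ("Carol", "eswiki", True),
]
total = global_count(rows)
```

With these sample rows, Alice and Carol each count once and Bob does not, whichever wiki the true result came from.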
[15:49:56] batcave :) [15:50:36] You'll be the first person I interact with since 6 hours, so let's talk :) [15:50:44] halfak: --^ [15:50:44] :) [15:52:13] Analytics-Backlog, Discovery, Reading-Infrastructure-Team: Determine proper encoding for structured log data sent to Kafka by MediaWiki - https://phabricator.wikimedia.org/T114733#1813795 (Ottomata) Yeah, I'm unsure of what we should do at this point. I won't have a lot of time to work on Avro support... [15:52:19] milimetric, "Qgil moved this task to On track on the Wikimedia-Developer-Summit-2016 workboard." [15:52:27] I saw :) [15:52:33] :] [15:52:34] I was waiting for standup to \o/ [15:58:31] hola a-team [15:58:38] morning nuria [15:58:42] hola! [16:04:55] milimetric, I don't have the wiki-research-l password since the mailman-pocalypse [16:05:01] So I'm waiting on DarTar [16:05:15] You might just send the message again and we can delete the old one from the queue. [16:08:47] Analytics-Backlog, Fundraising research, Research-and-Data: FR tech hadoop onboarding - https://phabricator.wikimedia.org/T118613#1813841 (Nuria) @DarTar: Regarding hive tutorials, we can help as @madhuvishy said. Now, I would ask that before anything they read slides of our prior classes: http://commo... [16:09:48] Analytics-Backlog: Blogpost Pageview API - https://phabricator.wikimedia.org/T118866#1813842 (Nuria) [16:09:50] Analytics-Backlog: Write pageview API blogpost - https://phabricator.wikimedia.org/T118471#1813843 (Nuria) [16:17:39] joal: tests work fine, ignore my e-mail cc bd808 [16:17:51] np nuria :) [16:17:52] bd808: let me know if you still have problems [16:18:51] milimetric: can you give me today a 10 minute talk of wikimetrics new feature? 
[16:19:01] milimetric: whenever is good for you [16:20:49] nuria: yes, I'm talking to madhuvishy and mforns after standup about it [16:20:59] if you want you can stick around and I'll do a quick summary before [16:21:07] we get into it [16:21:23] (PS1) Addshore: Track propertycreators and bots [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253923 [16:21:41] (CR) Addshore: [C: 2 V: 2] Track propertycreators and bots [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253923 (owner: Addshore) [16:23:49] (PS1) Addshore: Set bots and propcreator scripts +x [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253924 [16:24:01] (CR) Addshore: [C: 2 V: 2] Set bots and propcreator scripts +x [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253924 (owner: Addshore) [16:25:52] Analytics-Kanban: Wikimedia "top" pageviews API weirdness with the "Paul_Elio" article - https://phabricator.wikimedia.org/T118933#1813863 (Ironholds) It's of absolutely no import; in fact, it makes things less usable. I'd already filed a task on this: T117343 [16:29:12] Analytics-Kanban: Wikimedia "top" pageviews API weirdness with the "Paul_Elio" article - https://phabricator.wikimedia.org/T118933#1813879 (Nuria) >I'm leaning towards removing spiders from top data. It doesn't seem of import to people looking for the "popular" pages. Known spiders yes, I agree, they are hard... [16:30:31] Analytics-Backlog: Remove spider traffic from "top" results - https://phabricator.wikimedia.org/T117343#1813880 (Nuria) [16:31:53] milimetric: yes, please, let me get teh context so i can look at tasks and do Crs [16:32:43] k. It might be useful to read the parent task: https://phabricator.wikimedia.org/T117285 [16:32:58] Analytics-Backlog, Fundraising research, Research-and-Data: FR tech hadoop onboarding - https://phabricator.wikimedia.org/T118613#1813885 (DarTar) @atgo copying you as Katie is on vacation, see first item in the task description. 
[16:36:26] "< Krinkle> Also, I'm not sure what to think of there being over 1M page views for enwiki Special:BlankPage" -- That page is used for icinga monitoring -- https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/hieradata/common/lvs/configuration.yaml;22535beaaf457f284a43ecc947077cd6ff76c594$128 [16:38:09] bd808: you can file a bug but easiest would be to submit a fix.See: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/PageviewDefinition.java#L77 [16:38:41] bd808: our pageview numbers (since the beginning of time) have suffered from being affected by self inflicted traffic cc krinkle [16:38:44] !log deploying and installing eventlogging master on eventlog1001 from new eventlogging python repo (this does not change code, just repo from which it is installed) [16:38:51] bd808: Is that local server health monitoring or does that go through the outside varnish layers? [16:39:16] Krinkle: well, if it ends up on webrequest table goes through varnish [16:39:48] I can imagine 1M requests in total from each apache to itself for icinga, but 1M in total on 1 day to varnish seems odd. [16:40:06] Though I guess each varnish also has monitoring, hm.. [16:41:21] Analytics-Backlog: Remove Special:BlankPage from pageviews - https://phabricator.wikimedia.org/T118958#1813899 (Nuria) NEW [16:41:34] Analytics-EventLogging, Analytics-Kanban, EventBus, Patch-For-Review: Deploy eventlogging from new repository. - https://phabricator.wikimedia.org/T118863#1813906 (Ottomata) Done in prod. Now to remove the deployment for the old eventlogging/EventLogging mw extension repo. 
[16:42:17] bd808: blank page is there: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/PageviewDefinition.java#L77 [16:42:25] bd808: those chnages are not deployed though [16:42:48] Analytics-Backlog: Remove Special:BlankPage from pageviews - https://phabricator.wikimedia.org/T118958#1813913 (Nuria) Open>Resolved [16:42:59] Analytics-Backlog, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1813915 (Legoktm) OK. At what length should we start truncating then? [16:43:00] Analytics-Backlog: Remove Special:BlankPage from pageviews - https://phabricator.wikimedia.org/T118958#1813899 (Nuria) This just needs deployment. [16:48:18] Analytics-Backlog: Build a public form that can hit the new API {kudu} [8 pts] - https://phabricator.wikimedia.org/T117289#1813927 (Nuria) A form hitting an API seems strange. Also if we are querying realtime result calculation is asynchronous, suited for an ajax ui but not a simple form (maybe this is what y... [16:52:29] Analytics-Kanban: Wikimedia "top" pageviews API weirdness with the "Paul_Elio" article - https://phabricator.wikimedia.org/T118933#1813942 (Ironholds) Should we read that as "we're working on this but generally we'll wait a while" or "we'll wait a while to fix this one until other bugs have the opportunity to... [17:02:17] Analytics-EventLogging, Analytics-Kanban, EventBus, Patch-For-Review: Puppetize eventlogging-service - https://phabricator.wikimedia.org/T118780#1813969 (Ottomata) Currently running eventlogging-service on deployment-eventlogging04 in beta labs deployed using trebuchet and configured using puppet a... 
[17:04:43] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1813985 (Nettrom) I'll add a +1 to wanting monthly data for a bunch of articles. Could enable some interesting changes to SuggestBot's algorithms. [17:25:48] Analytics-Backlog: Backfill data on cassandra removing spiders from top endpoint - https://phabricator.wikimedia.org/T118972#1814117 (Nuria) NEW [17:26:15] Analytics-Backlog: Remove spider traffic from "top" results - https://phabricator.wikimedia.org/T117343#1814125 (Nuria) [17:31:56] Analytics-Backlog: Remove spider traffic from "top" results - https://phabricator.wikimedia.org/T117343#1814147 (Nuria) * Special:BlankPage will not be consider a "true" page going forward; https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/... [17:32:09] Analytics-Backlog: Remove spider traffic from "top" results - https://phabricator.wikimedia.org/T117343#1814148 (Nuria) p:Triage>High [17:32:41] Analytics-Backlog: Wikimedia "top" pageviews API has problematic double-encoded JSON - https://phabricator.wikimedia.org/T118931#1814150 (Nuria) p:Triage>High [17:33:01] Analytics-Backlog: Remove spider traffic from "top" results - https://phabricator.wikimedia.org/T117343#1814151 (Ironholds) Actually IIRC from looking at that the mass of traffic lacked a UA, so eliminating automata and spiders should do it. 
[17:33:43] Analytics-Backlog: Pageview API: Escape double quotes in article titles - https://phabricator.wikimedia.org/T118913#1814155 (Milimetric) [17:33:44] Analytics-Backlog: Wikimedia "top" pageviews API has problematic double-encoded JSON - https://phabricator.wikimedia.org/T118931#1814156 (Milimetric) [17:43:52] Analytics-Backlog: Remove spider traffic from "top" results - https://phabricator.wikimedia.org/T117343#1814188 (JAllemandou) [17:43:54] Analytics-Kanban: Wikimedia "top" pageviews API weirdness with the "Paul_Elio" article - https://phabricator.wikimedia.org/T118933#1814190 (JAllemandou) [17:45:12] Analytics-Kanban: Wikimedia "top" pageviews API weirdness with the "Paul_Elio" article - https://phabricator.wikimedia.org/T118933#1813033 (JAllemandou) I'll start filling new data with user-only top tonight (meaning data from 2015-11-18 onward will be user-only). We'll then backfill older data later. [17:50:16] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1814217 (Addshore) p:Triage>Low [17:50:38] ebernhardson, dcausse : let's have a short meeting/hangout to talk about how to proceed with avro-camus json? [17:51:09] dcausse: yt? [17:51:15] nuria: yes [17:51:17] nuria: sure, now? [17:51:28] in an hour or so? [17:51:40] i can do that, but it's getting late for dcausse. up to him [17:51:54] let me check [17:52:06] dcausse: right, you let us know, it can also be early tomorrow (like 5pm your time)) [17:52:14] I'm alone with kids in 1 hour and it's dinner time, better in 2 hours I think [17:52:55] dcausse: let's do tomorrow then, right? so you do not feel rush [17:53:06] nuria: no worry we can do it today [17:53:18] let's say 8pm UTC? 
[17:53:34] ok, will set it up , let me look at ottomata's schedule [17:54:45] Analytics-Backlog, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1814232 (Sadads) My gut instinct is the main use cases need the 300-400 character range: most tracking tools... [17:57:15] Analytics-Backlog, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1814241 (Sadads) @Halfak can you think of a use case for anything more extensive than 400 characters? Invest... [17:58:17] Analytics-Backlog: Wikimedia "top" pageviews API has problematic double-encoded JSON - https://phabricator.wikimedia.org/T118931#1814245 (MZMcBride) >>! In T118931#1813576, @Milimetric wrote: > Two question for y'all. Assume we changed this: > > 1. Does the breaking format change bother you as consumers? I... [17:59:16] mforns: batcav? [18:07:30] Hi all! Will it ever be possible to do something like this https://wikimedia.org/api/rest_v1/metrics/pageviews/top/wikidata/all-access/2015/11 ? (ie. not specify a specific day?) [18:07:56] Analytics-Backlog, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1814297 (Halfak) @gpaumier did some work to extract URLs that appear in citations. I wonder if we could pro... [18:09:46] Analytics-Backlog, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1814303 (Halfak) ``` explain externallinks; +-------------------+------------------+------+-----+---------+-... 
[18:13:45] !log restarted Cassandra Load - top articles daily oozie job [18:14:22] Analytics-Backlog, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1814310 (Halfak) I suppose we can use the `externallinks` table to answer this question. ``` > select LEN... [18:19:32] milimetric: To clarify on https://phabricator.wikimedia.org/T118931, right now the example posted there results in a fatal exception from json parsing. The code for pageview in restbase uses some kind of primitive quote-escaping logic instead of json-encoding logic – which fails to account for strings containing quotes themselves. [18:19:44] nuria: fyi am not working on friday [18:19:52] yeah, Krinkle I saw that afterwards [18:19:55] we'll have to fix that [18:19:57] So even if we keep the json-inside-json thing, we should still fix that [18:19:59] right [18:20:20] oh i see meeting, for today, that is good [18:20:23] that's def. a bug [18:20:29] though I would recommend breaking it and doing it properly. It's early enough. This is why RESTBase has versions and support levels. [18:20:32] It's experimental. [18:20:42] The longer we wait, the more damage. [18:20:56] Krinkle++ [18:24:19] ottomata: ok, set that avro meeting for today 3pm your time [18:24:26] ottomata: at dcausse's request [18:25:52] that is good [18:29:41] Analytics-Backlog, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1814339 (Sadads) @Halfak: the useful bit of that url though, is the id= and pg= variables: so all of the ana...
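A small sketch of the two problems raised at 18:19 above, using only Python's standard `json` module. It does not reproduce the RESTBase pageview code: `naive_escape`, `build_double_encoded`, `build_plain`, and the sample title are hypothetical stand-ins, chosen to show why hand-rolled quote wrapping breaks on titles that contain quotes and why a JSON string inside JSON forces consumers to parse twice.

```python
import json


def naive_escape(title):
    # Hypothetical stand-in for "primitive quote-escaping": wraps the
    # title in quotes but does not escape quotes inside it.
    return '"' + title + '"'


def build_double_encoded(articles):
    # The inner list is serialized to a string first, then embedded as a
    # string value: JSON inside a string of JSON.
    return json.dumps({"articles": json.dumps(articles)})


def build_plain(articles):
    # Single encoding: the list is embedded as a real JSON array.
    return json.dumps({"articles": articles})


# Made-up sample row; the title deliberately contains double quotes.
articles = [{"article": 'The_"Quoted"_Title', "views": 10, "rank": 1}]

# Double-encoded output: the consumer has to re-parse the inner string.
outer = json.loads(build_double_encoded(articles))
inner = json.loads(outer["articles"])

# Plain output: one parse is enough.
plain = json.loads(build_plain(articles))

# Hand-built JSON with naive quoting is invalid for this title.
broken = '[{"article": ' + naive_escape(articles[0]["article"]) + "}]"
try:
    json.loads(broken)
    naive_ok = True
except json.JSONDecodeError:
    naive_ok = False
```

Letting a JSON encoder handle the escaping fixes the quoting bug whether or not the nested encoding is kept.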
[18:36:32] Analytics-Backlog, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1814343 (Halfak) I'm not sure that truncation is an effective strategy to prevent collecting PI -- especiall... [18:38:11] Krinkle agreed, cool. Thx [18:49:01] Analytics-Backlog: Wikimedia "top" pageviews API has problematic double-encoded JSON - https://phabricator.wikimedia.org/T118931#1814415 (Addshore) I would prefer this be fixed asap :/ [18:50:13] Analytics, CirrusSearch, Discovery, operations, audits-data-retention: Delete logs on stat1002 in /a/mw-log/archive that are more than 90 days old. - https://phabricator.wikimedia.org/T118527#1814417 (Ottomata) Hey wait a minute, isn't this already happening? https://github.com/wikimedia/operat... [19:03:46] milimetric, madhuvishy: I want to create a page in wikitech to hold documentation for the Wikimetrics feature [19:04:11] just checking if you already created something, to avoid duplicates [19:04:29] mforns: no I haven't created anything [19:04:35] ok [19:05:20] Analytics-Backlog, Analytics-Kanban, operations, Monitoring, Patch-For-Review: Migrate reqstats icinga alerts to new graphite metrics and deprecate or adapt reqstats gdash - https://phabricator.wikimedia.org/T118979#1814440 (Ottomata) NEW a:Ottomata [19:05:46] Analytics-Backlog, Analytics-Kanban, operations, Monitoring, Patch-For-Review: Migrate reqstats icinga alerts to new graphite metrics and deprecate or adapt reqstats gdash - https://phabricator.wikimedia.org/T118979#1814440 (Ottomata) a:Ottomata>None [19:09:10] madhuvishy, milimetric: https://wikitech.wikimedia.org/wiki/Analytics/Wikimetrics/Global_metrics (empty) [19:09:50] Cool thanks [19:17:09] * joal is gone for diner [19:44:33] thx mforns [19:44:47] np :] [19:58:43] (PS1) BryanDavis: Fix javadoc comment for NetworkOriginUDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/253958 
[19:59:24] (CR) Nuria: [C: 2 V: 2] Fix javadoc comment for NetworkOriginUDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/253958 (owner: BryanDavis) [20:00:59] (CR) BryanDavis: Add UDF for network origin (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/253046 (https://phabricator.wikimedia.org/T118592) (owner: BryanDavis) [20:07:02] Analytics-Backlog: Track stats for outreach.wikimedia.org in pageview_hourly - https://phabricator.wikimedia.org/T118987#1814657 (Milimetric) NEW [20:23:06] Analytics, CirrusSearch, Discovery, operations, audits-data-retention: Delete logs on stat1002 in /a/mw-log/archive that are more than 90 days old. - https://phabricator.wikimedia.org/T118527#1814779 (EBernhardson) The oldest file i see is CirrusSearchRequests.log-20150726.gz. A 90 day retention... [20:46:26] Analytics, CirrusSearch, Discovery, operations, and 2 others: Delete logs on stat1002 in /a/mw-log/archive that are more than 90 days old. - https://phabricator.wikimedia.org/T118527#1814896 (Ottomata) Ok, yup, bug! the job that removed old files was using ctime instead of mtime, and apparently th... [20:47:21] Analytics-Kanban, CirrusSearch, Discovery, operations, and 2 others: Delete logs on stat1002 in /a/mw-log/archive that are more than 90 days old. - https://phabricator.wikimedia.org/T118527#1814901 (Ottomata) [20:50:34] Analytics-Kanban, CirrusSearch, Discovery, operations, and 2 others: Delete logs on stat1002 in /a/mw-log/archive that are more than 90 days old. - https://phabricator.wikimedia.org/T118527#1814908 (Ottomata) Let's wait a few days and make sure this is working, then I think we can close. [20:53:07] odd thing i just saw in hive...querying the wmf.webrequest table some rows have uri_path of `http://en.wikipedia.org/w/api.php`. 
I suppose it could be bad clients but not sure [20:53:25] (not urgent to fix, just a curiosity) [20:53:40] Analytics-Kanban: Backfill daily-top-articles in cassandra [2015-09-01 - 2015-11-16 (included)] - https://phabricator.wikimedia.org/T118991#1814915 (JAllemandou) NEW [20:55:35] Analytics-Kanban: Wikimedia "top" pageviews API weirdness with the "Paul_Elio" article - https://phabricator.wikimedia.org/T118933#1814938 (JAllemandou) Job restarted without spiders (or at least what we identify as Spiders). Restart Day : 2015-11-17 (testing day). Backfilling is tracked using https://phabric... [20:56:23] Analytics-Kanban: Backfill daily-top-articles in cassandra [2015-09-01 - 2015-11-16 (included)] {slug} - https://phabricator.wikimedia.org/T118991#1814942 (JAllemandou) [20:56:36] hey milimetric [20:57:30] ebernhardson: There are some oddities in wmf.webrequest indeed [20:58:14] ebernhardson: I have not managed to figure out if it was coming from varnish-kafka, if data is actually really formatted in a wrong fashion, or if it's something else [21:02:56] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1814979 (GLCiampaglia) Along the same lines of what @HYanWong proposed, it would be nice if it could be possible to specify a list of articles and the API could... [21:10:42] nuria: [21:10:42] https://gerrit.wikimedia.org/r/#/c/254030/ [21:10:42] https://gerrit.wikimedia.org/r/#/c/254031/ [21:10:47] +1s please :) [21:10:49] yess [21:11:32] ottomata: thank you! [21:14:28] Analytics-Backlog: Wikimedia "top" pageviews API has problematic double-encoded JSON - https://phabricator.wikimedia.org/T118931#1815062 (Milimetric) >> 2. Is the current format OK if we just decode it and present it uniformly as JSON? > > I don't understand why the data was (poorly) double-encoded in the fi... 
[21:21:14] Analytics-Backlog, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1815102 (Milimetric) I forgot to follow up on this, btw. There's another restriction I found in the EL mysq...
[21:22:55] nuria, docs updated too, fyi
[21:23:03] ottomata: many thanks sir
[21:32:26] Analytics-Backlog, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1815138 (Sadads) >>! In T115119#1814343, @Halfak wrote: > I'm not sure that truncation is an effective strat...
[21:39:49] madhuvishy: yt? one question...
[21:39:55] nuria: yeah
[21:39:59] tell me
[21:40:24] madhuvishy: when we run camus last, we did not need anything beyond properties file to make it write data into hdfs , right?
[21:40:53] ummm, we had some jars
[21:41:02] but that's it
[21:41:22] nuria: ^
[21:41:32] madhuvishy: k
[21:41:45] madhuvishy: let me try my 1st test and i will let you know
[21:41:57] nuria: ok cool, let me know if you need any help
[22:24:40] madhuvishy: mforns: I'm heading to dinner in a few minutes
[22:24:51] lemme know if you need anything
[22:25:08] (not from the dumpling place I'm going to)
[22:25:18] :)
[22:25:40] I think I'm good, running into some weird error but i can figure it out :)
[22:25:48] have fun at dinner
[22:27:52] thx, have a nice night
[22:34:01] nuria: (not urgent, but:) just read https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/Troubleshooting and it says this:
[22:34:06] "If your job is finished, you can find all of the job logs in HDFS at: /var/log/hadoop-yarn/apps/USER/logs/application_id/*"
[22:34:36] ..but /var/log/hadoop-yarn/ is empty
[22:34:40] milimetric, ok :]
[22:34:46] see you tomorrow!
[22:41:34] HaeB: ya, it might be old info, let me see about the new path
[22:42:41] HaeB: i think things are here: /mnt/hdfs/var/log/hadoop-yarn/apps/
[22:42:57] HaeB: if you can update wiki it will be great
[22:44:05] HaeB: wikitech doesn't load for me
[22:45:23] HaeB: nevermind, just updated it
[22:45:50] nuria: cool, thanks!
[22:47:16] madhuvishy: I think i am running into issues due to the checks that joal added recently
[22:47:21] https://www.irccloud.com/pastebin/e1ojwJn0/
[22:50:19] Doesn't seem related nuria :)
[22:51:05] camus and checker are two distinct programs, and the log lines you are showing seem to come from camus
[22:51:07] nuria: --^
[22:51:39] joal: man it is kind of late, do not worry, i think i need to create a fake offset file
[22:51:45] on tmp/nuria/history/2015-10-05-18-32-57/offsets-previous
[22:52:17] depends on what you are trying :)
[22:54:23] joal: we can talk about it tomorrow, no worries
[22:54:31] k muria :)
[22:54:35] nuria sorry
[23:14:00] madhuvishy, yt?
[23:14:15] mforns: hey yes
[23:14:44] hey, do you get errors when executing wikimetrics tests?
[23:14:49] in master I mean :]
[23:15:46] mforns: oh, I never tried
[23:15:48] let me
[23:16:03] ok thanks
[23:16:45] I get 3 errors, and when I try it with my code, the tests I added pass, but another test fails, the thing is, the code I wrote is totally independent from the rest of the tests...
[23:24:26] mforns: sorry i recently destroyed my vagrant, so it seems to be installing some dependencies from scratch
[23:24:45] madhuvishy, don't worry hehe...vagrant
[23:26:02] (PS1) Nuria: [WIP] Adding test schema to test json->avro publishing [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254066
[23:26:36] Analytics-Backlog, Team-Practices, User-JAufrecht: Get regular traffic reports on TPG pages - https://phabricator.wikimedia.org/T99815#1815469 (JAufrecht)
[23:28:55] madhuvishy, I will push my change anyway with WIP so that we can sync both changes.
There's a detail about the format in the results we can talk tomorrow
[23:29:09] mforns: okay cool
[23:29:38] (PS1) Mforns: Add aggregate by user report [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/254068 (https://phabricator.wikimedia.org/T117287)
[23:30:18] (CR) Mforns: [C: -1] "Still WIP" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/254068 (https://phabricator.wikimedia.org/T117287) (owner: Mforns)
[23:30:44] madhuvishy, here is it ^
[23:30:49] a-team, see you tomorrow!
[23:32:32] good night mforns
[23:32:43] btw on the tests i get something like
[23:32:47] https://www.irccloud.com/pastebin/aDia4zTO/
[23:32:48] aha
[23:33:03] dont know what's this nosetests thing
[23:33:11] but we can talk about it tomorrow
[23:33:16] ok
[23:33:28] you seem to get the same 4 failures
[23:33:40] ok, we talk tomorrow, thanks!
[23:33:44] ok :)
[23:33:50] bye :]
[23:33:55] cya
[23:54:28] nuria: would you or somebody on your team have an hour tomorrow or Friday to talk me through the next steps for my api analytics process? I think I have the major parts outlined at https://www.mediawiki.org/wiki/User:BDavis_%28WMF%29/Projects/Action_API_request_analytics#Data_acquisition but I'm not sure of where to put all the pieces and haven't managed to find a good template project to follow from looking at wikitech and phabricator.
[23:54:55] I'd be more than happy to start such a top to bottom tutorial as part of paying back for the hand holding
[23:57:18] Analytics-Backlog, Fundraising research, Research-and-Data: FR tech hadoop onboarding - https://phabricator.wikimedia.org/T118613#1815565 (atgo) Sure - the list of fr-tech engineers is: agreen, awight, cdentinger, dkozlowski, eeggleston. I'd also love any additional tutorials that are available. Who i...