[01:48:12] Analytics-EventLogging, MediaWiki-extensions-NavigationTiming, Performance-Team, operations, Patch-For-Review: Increase maxUrlSize from 1000 to 1500 - https://phabricator.wikimedia.org/T112002#1673362 (ori) Open>Resolved a:Krinkle [07:07:02] Analytics-Engineering, wikistats, DevRel-October-2015: Clean the code review queue of analytics/wikistats - https://phabricator.wikimedia.org/T113695#1673629 (Qgil) NEW a:Aklapper [07:07:16] Analytics-Engineering, wikistats, DevRel-October-2015: Clean the code review queue of analytics/wikistats - https://phabricator.wikimedia.org/T113695#1673629 (Qgil) a:Aklapper>None [07:07:32] Analytics-Engineering, wikistats, DevRel-October-2015: Clean the code review queue of analytics/wikistats - https://phabricator.wikimedia.org/T113695#1673629 (Qgil) [08:15:23] Analytics-Engineering, Analytics-Wikistats, DevRel-October-2015: Clean the code review queue of analytics/wikistats - https://phabricator.wikimedia.org/T113695#1673726 (Nemo_bis) [10:03:57] Analytics-Tech-community-metrics, Developer-Relations, DevRel-September-2015: Check whether it is true that we have lost 40% of (Git) code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1674023 (Qgil) Less contributors in MediaWiki core, presumably now working in other r... 
[10:49:19] Analytics-Tech-community-metrics, DevRel-October-2015, DevRel-September-2015: Automated generation of (Git) repositories for Korma - https://phabricator.wikimedia.org/T110678#1674109 (Aklapper) [10:52:46] Analytics-Tech-community-metrics, DevRel-October-2015, DevRel-September-2015: Automated generation of (Git) repositories for Korma - https://phabricator.wikimedia.org/T110678#1674118 (Aklapper) p:Low>High [10:55:41] Analytics-Tech-community-metrics, DevRel-October-2015, DevRel-September-2015: Patches with Verified -1 should not be counted as open in our code review metrics - https://phabricator.wikimedia.org/T108507#1674132 (Aklapper) [11:40:53] Analytics-Tech-community-metrics: Code review time must be on merged patches, not closed ones - https://phabricator.wikimedia.org/T68265#1674205 (Qgil) I think our current approach of measuring median age of open changesets is good, because it shows the dimensions of the problem we have. A team might have a... [11:41:41] Analytics-Tech-community-metrics, DevRel-November-2015: Improve Key performance indicator: code contributors new / gone - https://phabricator.wikimedia.org/T63563#1674206 (Qgil) [11:42:55] Analytics-Tech-community-metrics, Developer-Relations, DevRel-November-2015: Who are the top 50 independent contributors and what do they need from the WMF? 
- https://phabricator.wikimedia.org/T85600#1674209 (Qgil) [11:43:15] Analytics-Tech-community-metrics, DevRel-November-2015: Key performance indicator: Top contributors - https://phabricator.wikimedia.org/T64221#1674210 (Qgil) [11:46:11] Analytics-Tech-community-metrics, DevRel-November-2015: Clicking "Age of open changesets by Affiliation" explanation link goes to top of page - https://phabricator.wikimedia.org/T110874#1674212 (Qgil) p:Lowest>Low [11:46:40] Analytics-Tech-community-metrics, DevRel-November-2015: " Age of open changesets by Affiliation" layover partially displayed off-screen - https://phabricator.wikimedia.org/T110873#1674215 (Qgil) p:Lowest>Low [11:47:59] Analytics-Tech-community-metrics, Phabricator, DevRel-November-2015: Closed tickets in Bugzilla migrated without closing event? - https://phabricator.wikimedia.org/T107254#1674220 (Qgil) Just putting a vague date / deadline to reach to a conclusion on this discussion. [11:48:10] Analytics-Tech-community-metrics, Phabricator, DevRel-November-2015: Closed tickets in Bugzilla migrated without closing event? - https://phabricator.wikimedia.org/T107254#1674222 (Qgil) p:Lowest>Low [11:50:24] Analytics-Tech-community-metrics, MediaWiki-Extension-Requests, Possible-Tech-Projects: A new events/meet-ups extension - https://phabricator.wikimedia.org/T99809#1674227 (Qgil) I wonder how different would this extension be to the Education Program extension. Maybe we can just merge this task to {T9... [13:08:36] Analytics-Tech-community-metrics, Research consulting, Research-and-Data: Data for audit report - https://phabricator.wikimedia.org/T110067#1674360 (ezachte) No, can be closed I think. Doing it. 
[13:08:49] Analytics-Tech-community-metrics, Research consulting, Research-and-Data: Data for audit report - https://phabricator.wikimedia.org/T110067#1674361 (ezachte) Open>Resolved [13:27:57] Analytics-Tech-community-metrics, DevRel-November-2015: "Age of open changesets by Affiliation" layover partially displayed off-screen - https://phabricator.wikimedia.org/T110873#1674404 (Aklapper) [13:36:52] Analytics-Backlog: English Wikipedia stats for 5 millionth article - https://phabricator.wikimedia.org/T113683#1674454 (ezachte) Only pageviews readily available. I'll see it can find some other data. Pageviews to English Wikipedia: since Jan 2008 when we started to count these: 623 billion minus up to 30%... [13:50:24] Analytics-Engineering, Analytics-Wikistats, DevRel-October-2015: Clean the code review queue of analytics/wikistats - https://phabricator.wikimedia.org/T113695#1674496 (JanZerebecki) One roadblock to merging them is that that the CI might not work anymore: {T113725} I might be able to give a "+1 I wou... [14:19:05] Analytics-Backlog: English Wikipedia stats for 5 millionth article - https://phabricator.wikimedia.org/T113683#1674593 (Halfak) @Kevinator, can you take a look at this to see if you or someone else in #Analytics knows of other resources that would be available for @EdErhart-WMF //et al.//? [14:24:37] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1674601 (Nuria) @tgr: confirmed with brandon that there is no access to POST parameters for logging but in the case of API request it seems strange that we are... 
[14:24:41] did this per otto as he is afk https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Administration#Transition_to_Active and there are some issues, not sure how to confirm all is well with name node [14:27:08] Analytics-Backlog, Research consulting, Research-and-Data: Analysis on traffic through the HTTPS transition - https://phabricator.wikimedia.org/T102431#1674610 (Aklapper) This task has "Unbreak Now!" priority for three months now which means [[ https://www.mediawiki.org/wiki/Phabricator/Project_managem... [14:32:31] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1674629 (Anomie) >>! In T108618#1674601, @Nuria wrote: > (now, I do not know much about how does the api work) It works by a GET or POST to /w/api.php with ne... [14:35:34] hi a-team! [14:35:44] Heya mforns :) [14:36:01] morning [14:38:51] Hi milimetric [14:47:13] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1674708 (Nuria) >Is there a way to get a unique identifier to the varnish log in MediaWiki code? Otherwise that data is not going to help much. No, data is lo... [14:53:47] Hey chasemp [14:54:09] Just saw ottomata email and your answer [14:54:13] Morning! [14:54:48] I am trying to double check if I can access the cluster [14:55:03] Ok thanks [14:56:34] chasemp: cluster is accessible, we have some lag in computation but nothing broken :) [14:56:42] chasemp: cluster is accessible, we have some lag in computation but nothing broken :)! [14:57:03] Thanks for the fast reaction, and sorry for double-lining [15:04:49] Cool and no worries :) [15:07:09] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1674755 (Anomie) >>! 
In T108618#1674708, @Nuria wrote: >>Not when all the useful information is in the POST body. > True, but I thought this ticket was about u... [15:25:17] Analytics-Backlog: English Wikipedia stats for 5 millionth article - https://phabricator.wikimedia.org/T113683#1674817 (ezachte) Here is a fact ( and probably best dump related number to impress people ): the largest file is the full archive (with all revisions and raw text for each revision) enwiki-20150901... [15:28:25] (PS3) Christopher Johnson (WMDE): adds sparql lookup function for metric metadata removes markdown files modifies owl to make legal uris [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/240758 (https://phabricator.wikimedia.org/T113180) [15:31:55] (CR) Christopher Johnson (WMDE): [C: 2 V: 2] adds sparql lookup function for metric metadata removes markdown files modifies owl to make legal uris [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/240758 (https://phabricator.wikimedia.org/T113180) (owner: Christopher Johnson (WMDE)) [15:32:01] milimetric: standup? [15:32:02] milimetric: standuppppp ! 
[15:33:43] Analytics-Cluster, Analytics-Kanban, Epic: {bear} Last Access Counts - https://phabricator.wikimedia.org/T88647#1674848 (kevinator) [15:33:55] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Regularly purge EventLogging data in Hadoop {stag} [8 pts] - https://phabricator.wikimedia.org/T106253#1674849 (kevinator) Open>Resolved [15:33:56] Analytics-EventLogging, Analytics-Kanban: {stag} EventLogging on Kafka - https://phabricator.wikimedia.org/T102225#1674850 (kevinator) [15:38:10] Analytics-Kanban, Patch-For-Review: Make reportupdater support script execution [8 pts] {crow} - https://phabricator.wikimedia.org/T112109#1674862 (kevinator) Open>Resolved [15:38:43] Analytics-EventLogging, Analytics-Kanban: Research sending EventLogging validation logs to Logstash {oryx} [5 pts] - https://phabricator.wikimedia.org/T111412#1674866 (kevinator) Open>Resolved [15:58:37] Analytics-General-or-Unknown, Community-Advocacy, Wikimedia-Extension-setup, Wikipedia-iOS-App-Product-Backlog: enable Piwik on ru.wikimedia.org - https://phabricator.wikimedia.org/T91963#1674904 (Milimetric) @Fjalapeno: since it's the end of the quarter we're trying to fight for a piwik instance... [16:00:19] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1674906 (Addshore) +1 for getting the api logging info through kafka and into hadoop! I was speaking to people about this a few days ago before stumbling into... [16:00:39] Analytics-General-or-Unknown, Community-Advocacy, Wikimedia-Extension-setup, Wikipedia-iOS-App-Product-Backlog: enable Piwik on ru.wikimedia.org - https://phabricator.wikimedia.org/T91963#1674908 (BGerstle-WMF) @milimetric we're definitely interested. not sure if it's related, but we're also follo... 
[16:04:15] Analytics-Backlog: English Wikipedia stats for 5 millionth article - https://phabricator.wikimedia.org/T113683#1674918 (ezachte) By itself a word count for the dump with current article revisions only is trivial with wc tool, but this will gravely overcount. Thus Wikistats only counts words in raw article tex...
[16:05:36] Analytics-General-or-Unknown, Community-Advocacy, Wikimedia-Extension-setup, Wikipedia-iOS-App-Product-Backlog: enable Piwik on ru.wikimedia.org - https://phabricator.wikimedia.org/T91963#1674934 (Fjalapeno) @milimetric yeah as Brian said we are definitely interested in getting this going. So if y...
[16:18:29] Analytics-Backlog: English Wikipedia stats for 5 millionth article - https://phabricator.wikimedia.org/T113683#1674985 (ezachte) References: I never counted those references, but with a grep and line count we have a reasonable lower bound: 18 million. bzgrep -P "<\/?ref>" enwiki-20150901-pages-meta-cu...
[16:23:51] Hey milimetric
[16:24:14] hi
[16:24:19] What about creating a few tasks to build a druid query module for restbase ?
[16:24:32] Is that too early ?
[16:24:48] joal: yeah, i think that's too early
[16:24:58] ok :)
[16:25:12] let's try to wrap everything up this quarter, put a nice bow on it with maybe a little cherry on top
[16:25:30] and then plan for next quarter, have all those discussions about strategy that some of us want to have badly and some of us not so much :)
[16:25:41] after that, I think we're free to start working on it
[16:25:52] but that's just my opinion
[16:25:55] ok, I am not very good at bows, but I'll try to help
[16:28:06] I'm the worst at that. My dad always made fun of me when I was little, that I never finished anything. He said I should think like a shark. When I smell blood, go in for the kill. My dad had violent metaphors sometimes, but it's a good skill to work on, finishing things all the way
[16:29:31] violent indeed
[16:29:35] but to the point !
[16:30:11] about strategy talk, I really hope we'll make a good plan on the many things we want to do :)
[16:30:35] anyway, going to meet with kevin, then weekend !
[16:30:44] Have a good one, and see you all on monday !
[16:46:28] have a nice weekend joal! I like the idea of druid from restbase, and unifying both :]. But I totally agree in discussing this without rush with the team. cc: milimetric
[16:56:09] (CR) JanZerebecki: [C: 2 V: 2] Add sample cron [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240654 (owner: Addshore)
[17:09:16] (CR) JanZerebecki: "Perhaps selfmerge so I don't need to review it for security bugs?" [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240653 (owner: Addshore)
[17:09:24] (CR) JanZerebecki: Script for tracking site_stats over time [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240652 (owner: Addshore)
[17:10:26] Sorry a-team my connection dropped during standup.
[17:12:24] sok madhuvishy, we were talking about druid and were kind of on the same page that we'd work with them to sign NDAs but defer any work on setting up druid to next quarter
[17:16:36] Oh cool
[17:19:32] Analytics-Backlog, Research consulting, Research-and-Data: Analysis on traffic through the HTTPS transition - https://phabricator.wikimedia.org/T102431#1675396 (ellery) @Aklapper this task is complete.
[17:34:45] Analytics-Kanban, Patch-For-Review: Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts] {slug} - https://phabricator.wikimedia.org/T107056#1675459 (kaldari) Any update on this?
[17:35:21] Analytics, ImageMetrics, Multimedia, Sentry, Patch-For-Review: Measure how many users have CORS-hostile proxies - https://phabricator.wikimedia.org/T507#1675468 (Tgr)
[18:24:31] Analytics-Tech-community-metrics: Data in korma project pages has confusing labels, is difficult to understand - https://phabricator.wikimedia.org/T110524#1675705 (Aklapper) p:Triage>Normal
[18:25:00] Analytics-Tech-community-metrics: Data in korma project pages has confusing labels, is difficult to understand - https://phabricator.wikimedia.org/T110524#1579875 (Aklapper) @dicortazar: This needs from Bitergia to better explain those strings and what they mean.
[18:31:28] Analytics-Backlog, Discovery, Discovery-Analysis-Sprint: Display automata and humans separately on zero results rate graph - https://phabricator.wikimedia.org/T112846#1675756 (Deskana)
[18:31:43] Analytics-Backlog, Discovery, Discovery-Analysis-Sprint: Display automata and humans separately on zero results rate graph - https://phabricator.wikimedia.org/T112846#1647865 (Deskana) Additional infrastructure is needed to do this. That is documented in T103505 and associated tasks.
[18:44:40] milimetric: How are things going with the pagestats API? Do you feel like it's going to be ready by end of September? Just asking since we are trying to decide whether to wait for it or move ahead with existing data sources.
[18:45:26] kaldari: it's going well, all the code except the very front-end configuration is ready. Hardware is ready, puppet is nearly done
[18:45:51] I don't see any obstacles at this point, but the end of the quarter could add some time where we don't get a hold of people / general lag
[18:46:25] that's good news. if you had to place a bet on the day it went live, what day would you bet on?
[18:46:28] kaldari: I don't remember right now what data exactly you needed
[18:46:50] milimetric: we just need the mobile pageview data, which is available in the dumps as well
[18:47:18] kaldari: right, maybe a better question is: how are you going to use it?
[18:47:36] We were going to add the data to Mr.Z-bot's popular pages reports
[18:47:37] from the front-end interface, or batch processing and indexing in the background?
[18:48:19] oh, ok, so you'd be using the "top" endpoint? I just wanna take a minute then to make sure the first version of the API would meet your needs
[18:48:21] we would have to batch process and index it, so it would be kind of a pain
[18:48:53] so our top endpoint right now looks like this: /top/{project}/{access}/{year}/{month}/{day}
[18:49:36] so you could say, give me the top pages on en.wikipedia, seen from mobile devices, in 2015
[18:49:46] or in August 2015, or August 12th 2015
[18:49:58] but that's about it for the first version
[18:50:21] milimetric: could we specify a range or number of days in the past?
[18:50:46] not right now, because this data is pre-aggregated so it's hard to pre-aggregate arbitrary ranges
[18:51:16] so the next version of the API would support that, but we're not sure when we're doing that
[18:51:50] milimetric: ah, that's really good to know. thanks!
[18:52:13] kaldari: so instead of using what's on dumps and setting up your own processing infrastructure, you could use Hadoop and process there
[18:52:47] have you poked around Hive?
[18:52:54] no, what's that?
[18:53:06] we have wmf.pageview_hourly which is hourly aggregate data in a table you can basically query with SQL
[18:53:35] so how many "top viewed pages" type queries would you be issuing and how often?
[18:53:59] are there any higher aggregations like day or month? We're specifically interested in month.
[18:54:28] we don't pre-aggregate that because the hourly aggregate data is already small enough to run most queries on it
[18:54:47] Basically, once a month we would be aggregating the monthly page views for pretty much all the articles on en.wiki
[18:54:47] so you'd go "top pages in month X" and how often would you run that query? Once a month for each project?
[18:54:56] oh, just en.wiki
[18:55:12] milimetric: yeah, for now, since the existing bot is en.wiki only
[18:55:12] kaldari: wait, that's different from "top pages" :)
[18:55:32] you mean you'd aggregate that and then find the top pages separately?
[18:55:36] yes
[18:55:47] we would find the top 100 pages for each WikiProject
[18:56:03] oh, ok. Because the first version of the API will support per-article pageviews for arbitrary time ranges. It'll return daily data but you can aggregate
[18:56:31] milimetric: Like this: https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Feminism/Popular_pages#List
[18:57:08] the existing bot uses the dumps and doesn't have mobile traffic
[18:57:11] so the per-article stats will be available like this: /per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}
[18:57:55] note the {start} and {end} params, and {granularity} which you could set to daily
[18:58:24] So I could do /per-article/enwiki/all-access/all-agent/Taylor_Swift/???/Oct_1/Oct_31?
[18:58:26] so you'd call that for all the pages you need with start->end being the month you need, and sum
[18:58:48] yes, except enwiki would be en.wikipedia.org and ??? would be daily
[18:59:01] cool, that's not bad
[18:59:10] and Oct_1 would be 2015100100
[18:59:14] and that would be available in the first version of the API?
[18:59:28] yes, you said that above
[18:59:37] yes, first version
[18:59:46] and if I had to bet, maybe Oct 9th?
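(Editor's sketch, not part of the channel log.) The workflow described above, calling the per-article endpoint for each page with start and end spanning the month, summing the daily values, then keeping the top 100, could look roughly like the Python below. It is a sketch under stated assumptions: the path template and the `2015100100` timestamp shape come from the conversation, while the function names, the `"views"` key in each daily row, and the `all-agent` value (copied as typed in the chat) are assumptions and may not match the API that eventually ships. No network call is made.

```python
from datetime import date
from urllib.parse import quote


def stamp(day: date) -> str:
    """Format a date like the chat's example 2015100100 (YYYYMMDD plus hour 00)."""
    return day.strftime("%Y%m%d") + "00"


def per_article_path(project, access, agent, article, granularity, start, end):
    """Fill in /per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}."""
    parts = [
        "per-article",
        project,
        access,
        agent,
        quote(article, safe=""),  # percent-encode titles containing '/' or spaces
        granularity,
        stamp(start),
        stamp(end),
    ]
    return "/" + "/".join(parts)


def monthly_total(daily_rows):
    """Sum daily counts. Assumption: each row is a dict with a 'views' key."""
    return sum(row["views"] for row in daily_rows)


def top_pages(totals, n=100):
    """Return the n (title, views) pairs with the most views, highest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


path = per_article_path(
    "en.wikipedia.org", "all-access", "all-agent", "Taylor_Swift",
    "daily", date(2015, 10, 1), date(2015, 10, 31),
)
print(path)
```

A bot would fetch each path, pass the returned daily rows to `monthly_total`, collect the results in a dict keyed by title, and hand that dict to `top_pages`.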
[19:00:13] OK, I think that's enough info for me to decide to wait then :)
[19:00:17] I'd bet like $10 on that if I got decent odds
[19:01:07] thanks for taking the time to explain all that to me!
[19:01:12] np, happy to
[20:47:07] Analytics-Backlog: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#1676472 (mforns) @jcrespo Sorry for the confusion, but I've added some lines to the white-list file. {F2635069} This is the valid version. The other version is now incomplete. If this happens again, I'll...
[20:52:37] Analytics-Kanban: Make Logstash consume from Kafka:eventlogging_EventError - https://phabricator.wikimedia.org/T113627#1676483 (mforns) a:mforns
[20:55:59] Analytics-Cluster, Analytics-Kanban: Python Aggregator: Solve inconsistencies in data ranges when using --all-projects flag {musk} - https://phabricator.wikimedia.org/T106554#1676493 (mforns)
[21:04:36] Analytics-General-or-Unknown, Community-Advocacy, Wikimedia-Extension-setup, Wikipedia-iOS-App-Product-Backlog: enable Piwik on ru.wikimedia.org - https://phabricator.wikimedia.org/T91963#1676538 (Milimetric) Cool, we'll keep mentioning that it's important to real people :) The scalable event sys...
[21:15:18] Analytics-Kanban, RESTBase-API: create RESTBase endpoints [34 pts] {slug} - https://phabricator.wikimedia.org/T107053#1676566 (Milimetric)
[21:43:31] Analytics-Backlog: What is the total number of users to make an edit as of Sept 15 - https://phabricator.wikimedia.org/T113808#1676640 (kevinator) NEW
[21:49:23] Analytics-Backlog: What is the total number of users to make an edit as of Sept 15 - https://phabricator.wikimedia.org/T113808#1676654 (kevinator) It should be straightforward enough to count how many users made at least one edit. However, IP edits are a tricky matter. Many people will have edited using the...
[22:21:50] Analytics-Engineering, Analytics-Wikistats, DevRel-October-2015: Clean the code review queue of analytics/wikistats - https://phabricator.wikimedia.org/T113695#1676786 (Tgr) > There is probably a checkout of that project on stat1002.eqiad.wmnet under /srv/wikistats_git but its creation and update is no... [22:39:02] Analytics, Varnish: Connect Hadoop records of the same request coming via different channels - https://phabricator.wikimedia.org/T113817#1676830 (Tgr) NEW [22:39:07] Analytics, Varnish: Connect Hadoop records of the same request coming via different channels - https://phabricator.wikimedia.org/T113817#1676837 (Tgr) Approach 1: add Varnish request id (set by Varnish in the X-Varnish header) to MediaWiki * can these be made unique? right now they have 9 digits; given th... [22:41:08] Analytics-Backlog, MediaWiki-API, Research-and-Data, Patch-For-Review: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1676839 (Tgr) >>! In T108618#1674708, @Nuria wrote: > No, data is logged on a per request basis, now, if you look at your data in short enough time intervals y... [22:43:17] Analytics-Backlog: English Wikipedia stats for 5 millionth article - https://phabricator.wikimedia.org/T113683#1673115 (kevinator) @EdErhart-WMF if you want pageviews for the last year... you can use a number which was previously calculated and was published in court filings: See Page 27, Paragraph 87 of this... [23:10:13] good night team! nice weekend [23:33:24] Analytics, Varnish: Connect Hadoop records of the same request coming via different channels - https://phabricator.wikimedia.org/T113817#1676915 (Tgr) Also, this will only make sense of joining data from different tables is easy / efficient enough. I'm not familiar with Hive so not sure if that's the case. [23:33:45] Analytics, Varnish: Connect Hadoop records of the same request coming via different channels - https://phabricator.wikimedia.org/T113817#1676916 (Tgr)