[00:12:18] Analytics-Kanban, Research-and-Data: Analyze referrer traffic to determine report format {hawk} [8 pts] - https://phabricator.wikimedia.org/T108886#1555670 (DarTar) I suggest we ping Toby/Jon and maybe Lisa/Sheree to understand what kind of data would be most useful to them. For example, I wonder if people...
[00:43:11] (CR) Milimetric: [C: 2] "I manually applied this latest version of the script to the /a/geowiki/scripts repo on stat1003, and it worked ok, so I'm self-merging. I" [analytics/geowiki] - https://gerrit.wikimedia.org/r/232426 (https://phabricator.wikimedia.org/T106229) (owner: Milimetric)
[00:45:01] Analytics-Kanban, Patch-For-Review: Foundation-only Geowiki stopped updating - https://phabricator.wikimedia.org/T106229#1462437 (Milimetric) Moving this back to ready-to-deploy until I figure out if the repository updates and all that is working. Running the script manually gave me some errors.
[01:22:02] Analytics-Backlog, MediaWiki-API, Wikipedia-iOS-App-Product-Backlog, Wikipedia-Android-App: Add page_id and namespace to X-Analytics header in App / api requests - https://phabricator.wikimedia.org/T92875#1555872 (kevinator)
[01:25:44] Analytics-Backlog, Team-Practices-This-Week: Get regular traffic reports on TPG pages - https://phabricator.wikimedia.org/T99815#1555883 (kevinator) @Jaufrecht This is not done / automated. From your email, I had the impression it was not immediately clear that this data was valuable and needed ongoingly...
[09:16:28] (CR) Joal: [WIP] Generate hourly aggregate statistics about webrequest sequence stats (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/232644 (owner: Ottomata)
[09:30:00] Ironholds: knowing your c++ acquaintance: https://twitter.com/aris_ada/status/634016666742034432
[09:39:26] Analytics-Dashiki: vital-signs doesn't display pageviews graph - https://phabricator.wikimedia.org/T109693#1556539 (Nemo_bis) NEW
[09:39:35] Analytics-Dashiki: vital-signs doesn't display pageviews graph - https://phabricator.wikimedia.org/T109693#1556546 (Nemo_bis)
[13:06:42] Hey ottomata
[13:06:53] Everything went ok yesterday at reinstall ?
[13:07:04] yup! A-ok
[13:07:10] Cool
[13:07:16] i have interview this morning, will do an18 this afternoon
[13:07:22] Great
[13:07:50] Let me know when you start, that way I can look from time to time
[13:08:02] wow, so many X's
[13:08:05] Data recomputed for the 17th
[13:08:13] coool
[13:08:29] Plus some failed jobs yesterday around 16UTC
[13:09:28] Also, I have referred one guy for our open position --> I think he really is THE one we are after :)
[13:10:21] haha
[13:10:22] OH!
[13:10:23] ok :)
[13:10:40] why OH ?
[13:10:44] oh, i want to sync with you about your interview, not to get your opinion yet, but to make sure i don't ask the same things
[13:10:58] sure, batcave (I'll test my new headset:)
[13:11:06] ha k
[13:11:40] joal, be there in 5...
[13:11:48] sure
[13:13:20] ottomata: About the Xs, do you want to rerun load, to check ?
[13:19:34] joal, no need to rerun load, X's are caused by duplicates
[13:19:43] ok ottomata
[13:19:44] i just ran my query (from that WIP patch) on day > 18
[13:19:52] Awesome
[13:20:00] I have commented on your patch
[13:20:04] occasionally minuscule loss
[13:20:08] ja saw, they are good comments
[13:20:34] joal: i have hardly thought about this interview yet, let me collect some things about previous interviews I have done, gimme another 5ish mins
[13:20:37] Thx :)
[13:20:47] No problem at all :)
[13:20:53] ping me when ready
[13:24:48] ok joal in batcave
[14:05:47] joal, it is interesting that an18 and an21 have a higher steady rate of disk reads!
[14:06:04] between those and 1012 and 1022 is precise vs jessie
[14:06:17] i wonder if newer linux kernel does smarter file caching or something :)
[14:07:14] also, fyi, the higher iowait is because an21 does not have hyperthreading on
[14:13:52] Analytics-Kanban, Analytics-Wikistats: {lama} Wikistats 2.0 - https://phabricator.wikimedia.org/T107175#1557085 (Milimetric)
[14:32:23] Analytics-Tech-community-metrics: Handling multiple affiliations in tech community metrics - https://phabricator.wikimedia.org/T95238#1557111 (Aklapper) Do we have an idea about how many affected users we talk here (probably not, so "potentially everybody who we also suppose to work for some company")? Whil...
[14:43:58] Analytics-Kanban, Research-and-Data: Analyze referrer traffic to determine report format {hawk} [8 pts] - https://phabricator.wikimedia.org/T108886#1557133 (Milimetric) Ok, I used two days and downsampled more, this time 0.001 percent. Here are the results: * There were a little over 3.3 million log line...
[14:48:17] Analytics-Kanban, Research-and-Data: Analyze referrer traffic to determine report format {hawk} [8 pts] - https://phabricator.wikimedia.org/T108886#1557146 (Milimetric) Also, super weird data, like one of the top referers is http://www.j0zf.com/includes/plugins/blogs/images/yahoo_favicon.png What the hell...
[14:51:43] Analytics-Kanban, Analytics-Wikistats: {lama} Wikistats 2.0 - https://phabricator.wikimedia.org/T107175#1557150 (ezachte) It seems Pau's interface (T92502) mostly focuses on highly aggregated information for WMF dashboards. There is also a need for continuation of (some) of the current much more detailed...
[15:08:00] ottomata: Very interesting :)
[15:32:06] ottomata: standup ?
[15:32:19] no, in interview, sent email
[15:33:03] sorry
[16:00:28] joal, may be 5 minutes late to this meeting
[16:00:35] the NSA stuff is inundating my inbox
[16:00:40] ok
[16:00:43] :)
[16:00:54] Good luck with inundations
[16:05:15] Analytics-Kanban, Research-and-Data: Legal request, data point 1 - https://phabricator.wikimedia.org/T109626#1557486 (Milimetric) a:Milimetric
[16:06:45] joal, I am now back
[16:06:50] you are not and I have a great view of your office
[16:21:47] joal: what would you expect select sum(if(is_pageview,1,0)) from wmf.webrequest TABLESAMPLE(0.01 PERCENT) s where year=2015 and month=8 and (day=1 or day=2); to be?
[16:23:40] milimetric: 2 days of data, in weekends, divided by 100 --> about 4M I think
[16:23:49] it's 0
[16:24:35] issue somewhere
[16:24:39] changing the sample to 0.1, I get 306708
[16:24:46] which makes a little more sense. Hm....
[16:24:51] it doesn't make a lot of sense though
[16:24:59] hm, not really sense no
[16:25:08] well, *more* than 0 :D
[16:25:13] true
[16:25:36] oh ! 0.01 percent ~ divide 10000 !
[16:25:38] not 100
[16:26:55] So 0.1% value makes sense yeah
[16:27:05] no, I don't think so
[16:27:11] I did count(*) for the 0.1% sample
[16:27:17] 32176904
[16:27:32] so 306708 / 32176904 means like .9 % of our requests are pageviews
[16:27:36] that... can't be right
[16:27:48] That's the kind of things we have though
[16:28:05] you mean 0.9% of our requests are pageviews?
[16:28:13] like, this is a steady number over time?
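(Editor's note: the arithmetic in the exchange above can be checked with a minimal Python sketch. The helper names `sample_fraction` and `pageview_share` are hypothetical, and treating TABLESAMPLE(n PERCENT) as a uniform row fraction is an assumption made only for illustration; the counts are the ones quoted in the chat.)

```python
def sample_fraction(percent: float) -> float:
    """Convert a sampling percentage into a fraction (assumes uniform row sampling)."""
    return percent / 100.0


def pageview_share(pageviews: int, total: int) -> float:
    """Fraction of sampled requests that were flagged as pageviews."""
    return pageviews / total


# 0.01 PERCENT keeps roughly 1 row in 10,000, not 1 in 100.
rows_per_sampled_row = 1 / sample_fraction(0.01)

# Counts quoted in the chat for the 0.1 percent sample.
share = pageview_share(306708, 32176904)

print(f"1 sampled row per {rows_per_sampled_row:,.0f} rows")
print(f"pageview share in the sample: {share:.2%}")
```

(Under these assumptions the share comes out just under 1%, in line with the ".9 %" quoted in the chat.)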
[16:31:07] We don't compute that rate regularly
[16:31:14] It would be a good metric to have though
[16:31:32] Just as a check: SELECT SUM(case when is_pageview then 1 else 0 end) pv, count(1) from wmf.webrequest where webrequest_source in ('mobile', 'text') and year = 2015 and month = 8 and day = 19 and hour = 19;
[16:31:38] pv _c1
[16:31:38] 28848429 240167744
[16:31:53] Analytics-Kanban, Analytics-Wikistats: {lama} Wikistats 2.0 - https://phabricator.wikimedia.org/T107175#1557576 (Milimetric) Pau's original task was definitely more restricted than "all of the data" :) @violetto is working with us now to extend Pau's designs to the broader scope. Thanks for the wiki pag...
[16:32:02] More about 10% when taking mobile and text partitions only
[16:32:08] milimetric: --^
[16:32:14] joal: retro
[16:32:19] yup, arriving
[16:49:35] Does that work a-team ?
[16:49:45] i got pinged!
[16:49:47] yeah, I got the ping
[17:05:36] same
[17:05:52] Can someone call the team ?
[17:05:59] for my test
[17:06:03] pliz
[17:09:52] Analytics-Kanban, Analytics-Wikistats: {lama} Wikistats 2.0 - https://phabricator.wikimedia.org/T107175#1557820 (Milimetric)
[17:12:14] Analytics-Backlog: Add better regexp to agent_type bot filtering {hawk} - https://phabricator.wikimedia.org/T108343#1557824 (JAllemandou)
[17:13:18] Analytics-Backlog: Add a 'Guard' job for pageviews {hawk} - https://phabricator.wikimedia.org/T109739#1557827 (JAllemandou) NEW
[17:13:29] Analytics-Backlog: Delete all data from EventLogging:PersonalBar schema {tick} - https://phabricator.wikimedia.org/T105065#1557836 (mforns) Open>declined
[17:13:30] Analytics-EventLogging, Analytics-Kanban: {tick} Schema Audit - https://phabricator.wikimedia.org/T102224#1557837 (mforns)
[17:14:19] Analytics-Backlog: Add a 'Guard' job for pageviews {hawk} - https://phabricator.wikimedia.org/T109739#1557842 (kevinator) p:Triage>High
[17:16:10] Analytics-Backlog, RESTBase: Create a metric for overall RESTBase request rates from Varnish logs - https://phabricator.wikimedia.org/T109547#1557857 (kevinator) p:Triage>High
[17:20:06] Analytics-Backlog, Reading-Admin, Wikipedia-Android-App: Update definition of page view and implementation for mobile apps - https://phabricator.wikimedia.org/T109383#1557874 (kevinator) p:Triage>High
[17:20:47] Analytics-Backlog, Reading-Admin, Wikipedia-Android-App: Update definition of page view and implementation for mobile apps {hawk} - https://phabricator.wikimedia.org/T109383#1557876 (kevinator)
[17:25:18] Analytics-Cluster, Analytics-Kanban: Add hourly aggregate sequence stats creation to webrequest load job - https://phabricator.wikimedia.org/T109136#1557900 (JAllemandou)
[17:28:16] Analytics-Backlog: Identify possible user identity reconstruction using location and user_agent_map pageview aggregated fields to try to link to IPs in webrequest - https://phabricator.wikimedia.org/T108843#1557919 (kevinator) p:Triage>Normal
[17:29:00] Analytics-Backlog: Delete obsolete schemas {tick} - https://phabricator.wikimedia.org/T108857#1557928 (kevinator) p:Triage>High
[17:29:06] Analytics-Backlog: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#1557929 (kevinator) p:Triage>High
[17:29:13] Analytics-Backlog: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#1557939 (kevinator) p:Triage>High
[17:32:04] Analytics-Tech-community-metrics: Tech metrics missing IRC channels - https://phabricator.wikimedia.org/T56230#1557951 (Dicortazar) Thanks for this update! I'll check that list and update those channels in Korma. @Dzahn and @Aklapper, I'm still working on the feature to automate the list of repositories to...
[17:34:19] joal: I added you as an optional guest to the pageview api meeting tomorrow
[17:34:37] i'm not sure it's going to happen, gabriel proposed it, we'll see if the others are ok with it
[17:34:57] thx milimetric
[17:35:44] I can make it, so if it's confirmed I'll be there
[17:35:59] hmf, nobody helped me test my warn
[17:36:06] a-team, pliz, can you call ?
[17:36:17] ?
[17:36:23] call?
[17:36:25] warn?
[17:36:29] a-team
[17:36:30] you've just confused the whole team!
[17:36:31] :)
[17:36:34] :D
[17:36:44] Thx mforns, you got it !
[17:36:46] and it rang
[17:36:57] Sorry, I won't do that again
[17:37:12] with great pinging comes great responsibility
[17:37:31] True, I'll be more diligent with my pinging
[17:55:25] Analytics-Kanban, Research-and-Data: Analyze referrer traffic to determine report format {hawk} [8 pts] - https://phabricator.wikimedia.org/T108886#1558013 (Milimetric) This next one is sampled 0.5 % but filtered to only look at pageviews. * 8,162,386 lines analyzed * 10348 unique referer hosts * 715027 u...
[18:00:54] Analytics-Kanban: {pika} Proactive Pageview Definition - https://phabricator.wikimedia.org/T109745#1558031 (Milimetric) NEW
[19:03:35] Analytics-Tech-community-metrics: Tech metrics missing IRC channels - https://phabricator.wikimedia.org/T56230#1558174 (Qgil) (let's keep the discussion of this task focused on IRC channels, not mixing it with code repositories)
[19:51:02] (PS2) Mforns: Refactor queries into reportupdater and dashiki [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/232645 (https://phabricator.wikimedia.org/T107504)
[19:53:04] (CR) Mforns: "I removed the -1, I think this is ready to deploy." [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/232645 (https://phabricator.wikimedia.org/T107504) (owner: Mforns)
[19:53:48] Guys, I'm finished for today
[19:54:01] See you tomorrow a-team :)
[19:54:13] ciao
[19:54:14] hey :] good night, see ya!
[20:07:13] (CR) Milimetric: [C: 2] "Marcel, I prefer the solution you went with compared to a wikis.txt file, because this way is easier to understand / change by the people " [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/232645 (https://phabricator.wikimedia.org/T107504) (owner: Mforns)
[20:08:03] milimetric, thx
[20:08:51] mforns: I'll leave it up to you to deploy / figure out how the files work, I'm starting to have pre-vacation pile up :)
[20:09:07] milimetric, sure!
[20:09:15] have a nice long weekend!
[20:16:32] (PS1) Ori.livneh: Add statsvr, a reverse statsv [analytics/statsv] - https://gerrit.wikimedia.org/r/232836 (https://phabricator.wikimedia.org/T109753)
[20:18:55] vrrrr
[20:18:56] :)
[20:19:04] (CR) Krinkle: Add statsvr, a reverse statsv (1 comment) [analytics/statsv] - https://gerrit.wikimedia.org/r/232836 (https://phabricator.wikimedia.org/T109753) (owner: Ori.livneh)
[20:20:16] (CR) Ori.livneh: Add statsvr, a reverse statsv (1 comment) [analytics/statsv] - https://gerrit.wikimedia.org/r/232836 (https://phabricator.wikimedia.org/T109753) (owner: Ori.livneh)
[20:20:21] (PS2) Ori.livneh: Add statsvr, a reverse statsv [analytics/statsv] - https://gerrit.wikimedia.org/r/232836 (https://phabricator.wikimedia.org/T109753)
[20:20:56] (PS3) Ori.livneh: Add statsvr, a reverse statsv [analytics/statsv] - https://gerrit.wikimedia.org/r/232836 (https://phabricator.wikimedia.org/T109753)
[20:29:24] Analytics-Backlog, Reading-Admin, Wikipedia-Android-App: Update definition of page view and implementation for mobile apps {hawk} - https://phabricator.wikimedia.org/T109383#1558515 (kevinator) adding the thread on the analytics list for reference: https://lists.wikimedia.org/pipermail/analytics/2015-Au...
[20:36:34] Analytics-Backlog: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#1558541 (mforns) @Springle - Do you think this is possible? - Does it make sense to do it like this? - Can we write the SQL script and send it to you for CR? Thanks!
[20:40:24] Analytics-Backlog: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#1558553 (mforns) @Springle - Do you see any difficulty here with the white-lists? - IIRC the current purging attempt was implemented with black-lists. What is better from a DBA perspective, white- or black-...
[21:34:51] hello milimetric. I was wondering how is your schedule today. Do you think we should look into one of the tickets? (just asking :)
[21:35:18] hi leila: not today, sorry, secret reasons, will pm
[21:35:28] np. thanks for letting me know.
[22:11:52] Analytics-Backlog, Research consulting, Research-and-Data: Scope effort needed to validate token-based Unique Client data - https://phabricator.wikimedia.org/T107239#1559047 (leila) @DarTar, is this done given that Kevin and Michelle came to the RG meeting couple of weeks back, or you would like to kee...
[22:21:19] Analytics-Backlog, Discovery: Present Google Referrer Traffic analysis at Metrics meeting - https://phabricator.wikimedia.org/T109773#1559112 (kevinator)
[22:21:28] Analytics-Backlog, Discovery: Present Discovery Metrics at Monthly Metrics meeting - https://phabricator.wikimedia.org/T109775#1559113 (kevinator)
[22:36:15] Analytics-Backlog, Discovery: Present Google Referrer Traffic analysis at Metrics meeting - https://phabricator.wikimedia.org/T109773#1559172 (kevinator) Here are some other notes off the top of my head. Feel free to ask more questions * keep it short and high-level. You and @deskana have 10 minutes to...
[22:36:51] Analytics-Backlog, Discovery: Present Discovery Metrics at Monthly Metrics meeting - https://phabricator.wikimedia.org/T109775#1559173 (kevinator) Here are some other notes off the top of my head. Feel free to ask more questions * keep it short and high-level. You and @IronHolds have 10 minutes total t...
[23:05:55] a-team, see you tomorrow! :]
[23:23:41] Analytics-Backlog, Research consulting, Research-and-Data: Scope effort needed to validate token-based Unique Client data - https://phabricator.wikimedia.org/T107239#1559282 (DarTar) I'm closing this, we can resurrect the thread whenever the discussion resumes.
[23:40:27] Analytics-Backlog, Fundraising research, Research-and-Data: What's our projected ability to fundraise in the coming years - https://phabricator.wikimedia.org/T107606#1559376 (DarTar) @ggellerman we agreed that tasks in staged should never go back to backlog but move elsewhere or be marked as completed...