[03:52:30] Analytics-Backlog, Release-Engineering, operations, Varnish: Verify traffic to static resources from past branches does indeed drain - https://phabricator.wikimedia.org/T102991#1473921 (greg) Sorted differently (branch order): ``` 1.26wmf1 3 1.26wmf4 8 1.26wmf5 1 1.26wmf7 5 1.26wmf8 1 1.26wmf10 2 1...
[05:45:37] Analytics-Backlog, Release-Engineering, operations, Varnish: Verify traffic to static resources from past branches does indeed drain - https://phabricator.wikimedia.org/T102991#1473986 (mmodell) I think it's clear that we need to abandon the practice of branching & changing URL prefixes each week.
[05:47:00] Analytics-Backlog, Release-Engineering, operations, Varnish: Verify traffic to static resources from past branches does indeed drain - https://phabricator.wikimedia.org/T102991#1473987 (mmodell) One naive solution would be to replace old branches with symlinks to a current branch. This would mostly s...
[09:07:58] Analytics-Backlog, Wikimania-Hackathon-2015: Dockerize Hadoop Cluster, Druid, and Samza + Load Test - https://phabricator.wikimedia.org/T102980#1474262 (Aklapper) What is the status of this task, now that Wikimania 2015 is over? Did this hacking project take place and was successfully finished? If yes: Pl...
[09:36:28] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0]
[09:38:28] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[13:01:14] (PS1) Sbisson: Include all contributing users in 'unique-users' query [analytics/limn-flow-data] - https://gerrit.wikimedia.org/r/226524 (https://phabricator.wikimedia.org/T106564)
[13:16:29] (CR) Matthias Mullie: [C: 2] Include all contributing users in 'unique-users' query [analytics/limn-flow-data] - https://gerrit.wikimedia.org/r/226524 (https://phabricator.wikimedia.org/T106564) (owner: Sbisson)
[13:30:11] Analytics-Backlog: Clean up mobile-reportcard dashboards {frog} [13 pts] - https://phabricator.wikimedia.org/T104379#1474903 (Aklapper) Any news here? This has been [[ https://www.mediawiki.org/wiki/Phabricator/Project_management#Setting_task_priorities | "Unbreak now" priority ]] for three weeks, has no assi...
[13:45:00] Analytics, MediaWiki-Stakeholders-Group, Wikimania-Hackathon-2015: Provide summary of MediaWiki downloads - https://phabricator.wikimedia.org/T104010#1474934 (MarkAHershberger) Open>Resolved Aklapper writes: > What is the status of this task, now that Wikimania 2015 is over? This is done. Tha...
[13:48:38] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0]
[13:50:38] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[14:04:31] woot, we up to 26 worker nodes atm!
[14:08:07] ottomata: WOOT indeed :)
[14:08:32] ottomata: What was the issue with the new machines finally ?
[14:08:45] some weird idrac setting i think
[14:08:52] cmjohnson had to change something
[14:09:22] k
[14:09:34] awesome it's working :)
[14:25:54] Analytics, Analytics-Kanban, Research-and-Data: Too few page views for June/July 2015 - https://phabricator.wikimedia.org/T106034#1475046 (Roxette5) i'm completely ignorant regarding computer stats -- however i'd like to point out something regarding the suggestion of fewer crawler bots. If you look at...
[14:37:21] (CR) Mforns: "I think it works as it is, I responded to your comments :]" (2 comments) [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/224077 (https://phabricator.wikimedia.org/T103385) (owner: Mforns)
[14:37:35] (PS2) Mforns: Move pid and history files to project folder [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/224077 (https://phabricator.wikimedia.org/T103385)
[14:45:29] (CR) Milimetric: [C: 2 V: 2] "I see about the config change. It makes sense but it would've been fine to change generate.py as well, maybe that would've been easier to" [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/224077 (https://phabricator.wikimedia.org/T103385) (owner: Mforns)
[16:34:47] lunchtime, back laters
[16:40:18] milimetric, thanks for the review
[18:05:16] (CR) Milimetric: [C: 2 V: 2] Include all contributing users in 'unique-users' query [analytics/limn-flow-data] - https://gerrit.wikimedia.org/r/226524 (https://phabricator.wikimedia.org/T106564) (owner: Sbisson)
[18:12:01] Analytics-EventLogging, Analytics-Kanban: Can Search up sampling to 5%? {oryx} - https://phabricator.wikimedia.org/T103186#1476047 (ksmith) @Milimetric: Dan has approved 2% across the board. Will that work for you?
[18:35:36] milimetric, yt?
[18:35:44] hey mforns
[18:35:56] hey! are you working today?
[18:36:39] mmm, I'm taking a sick day but trying to go through email and stuff
[18:36:41] what's up
[18:37:04] wanted to ask you
[18:37:17] so, edit info in wiki database is public right?
[18:37:37] what about upload info? is this also public domain?
[18:38:02] Analytics-EventLogging, Analytics-Kanban: Can Search up sampling to 5%? {oryx} - https://phabricator.wikimedia.org/T103186#1476143 (Milimetric) @ksmith, yes, by my math that should be ok. Just in case though, could I be cc-ed on the patch that makes the change? That way I can undo it in case we have any...
[18:38:14] upload...
[18:38:25] like on commons?
[18:38:52] i thought that was just an edit like anything else, but i'm not familiar at all
[18:39:00] yes, the ultimate question would be if we should consider EL events storing upload information as sensitive or not?
[18:39:04] ok
[18:40:36] will discuss it with adam, thanks anyway!
[18:41:54] mforns: I think the spirit of the "sensitive" label revolves around whether or not the data is made public anyway via dumps / labs
[18:42:03] aha
[18:42:11] so edit information along with IP address for anons, etc,
[18:42:41] ok, thx
[18:42:47] but if it's feature usage information that wouldn't otherwise be available publicly, and can be combined with IP address or username or whatever, I'd be careful about it
[18:43:00] like "user clicks edit button"
[18:43:18] that's probably stuff we want to get rid of in general unless someone has a need to keep it around
[18:43:38] a need that we would want to question to make sure it's not just "don't have the time to run analysis now" or something
[18:51:56] milimetric, makes sense
[19:52:06] Analytics-EventLogging: Kafka Client for Mediawiki - https://phabricator.wikimedia.org/T106256#1476381 (EBernhardson) I took a quick look over the kafka-php codebase, along with the patch that removed zk from the producers dependencies in kafka[1]. As far as i can tell, each partition of a topic within kafk...
[19:57:27] Analytics-EventLogging: Kafka Client for Mediawiki - https://phabricator.wikimedia.org/T106256#1476389 (Ottomata) > each partition of a topic within kafka has a single leader and all write requests for that partition go to that leader This is true, but it can change at any time. The reason for using Zookee...
[19:59:42] Analytics-EventLogging: Kafka Client for Mediawiki - https://phabricator.wikimedia.org/T106256#1476400 (Ottomata) I think this would get us what we need: https://github.com/nmred/kafka-php/blob/master/src/Kafka/Client.php#L109
[20:04:10] Analytics-EventLogging: Kafka Client for Mediawiki - https://phabricator.wikimedia.org/T106256#1476413 (Ottomata) Hang on, that's not quite right...
[20:06:37] ottomata: hi!
[20:08:33] hiay
[20:09:33] ottomata: so... do you think i should pause the balancedconsumer stuff until kafka upgrade?
[20:12:53] madhuvishy: i think you should keep working on it, since you can test
[20:13:05] i'm working on kafka upgrade, and actually getting farther than I thought i would
[20:13:14] ottomata: aah cool, okay
[20:13:32] i have the offset commit stuff to figure out, so will do that
[20:14:45] Talk tomorrow folks !
[20:54:38] Analytics-EventLogging, Analytics-Kanban: Can Search up sampling to 5%? {oryx} - https://phabricator.wikimedia.org/T103186#1384061 (ksmith) @milimetric. Will do, and thanks! As far as I'm concerned, this issue can be closed/resolved.
[21:11:06] Analytics-Wikistats: Dump stats: further analyze drop in editor counts on English Wikipedia - https://phabricator.wikimedia.org/T48199#1476672 (Aklapper) This task has not seen updates for 16 months. Is this still high priority?
[21:11:09] Analytics-Wikistats: Enable parallel processing of stub dump and full archive dump for same wiki. - https://phabricator.wikimedia.org/T62826#1476674 (Aklapper) This task has not seen updates for 16 months. Is this still high priority?
[21:11:12] Analytics-Wikistats: stats for Wikidata exports - https://phabricator.wikimedia.org/T64874#1476676 (Aklapper) This task has not seen updates for 16 months. Is this still high priority?
[21:11:13] Analytics-Wikistats: Provide monthly editor counts in SQL databaseLoad - https://phabricator.wikimedia.org/T65071#1476678 (Aklapper) This task has not seen updates for 16 months. Is this still high priority?
[21:48:00] Analytics, MediaWiki-extensions-General-or-Unknown, Reading-Infrastructure-Team, Patch-For-Review: Track hook usage counts - https://phabricator.wikimedia.org/T106450#1476907 (Tgr)
[21:48:41] Analytics, MediaWiki-extensions-General-or-Unknown, Reading-Infrastructure-Team, Patch-For-Review: Track hook usage counts - https://phabricator.wikimedia.org/T106450#1476914 (Tgr) a:Tgr
[21:54:30] Analytics-EventLogging: Kafka Client for Mediawiki - https://phabricator.wikimedia.org/T106256#1476954 (EBernhardson) I was also thinking about stripping everything but the production out of kafka-php, it would make something we can deploy without having random broken code sitting around waiting to confuse...
[22:02:50] anyone happen to know the name of a topic in Kafka? doesn't matter which just something that exists
[22:12:04] Analytics-EventLogging: Kafka Client for Mediawiki - https://phabricator.wikimedia.org/T106256#1477011 (EBernhardson) Not sure how i missed it before, but in the examples/ directory there is a script called `MetaData.php` Running that in prod while pointed to analytics1012.eqiad.wmnet results in: ``` array(...