[00:01:07] halfak: I'm not much with SQL or the mediawiki schema, but I'm trying to figure out how many eligible voters there are in the current stewards election. In #wikimedia-stewards-elections, huh has done some relevant queries, e.g. http://quarry.wmflabs.org/query/1939 (timed out) and http://quarry.wmflabs.org/query/1942 and even http://quarry.wmflabs.org/query/1950 [00:01:45] and see other recent queries by 'nealmcb' or 'PiRSquared17' (huh) [00:03:41] so far I haven't seen any SQL to verify even a single userid having more than 50 edits in the last 6 months, though I guess I only tried a query with a big user. and of course I know little of databases.... [00:04:25] so if you start from scratch you might avoid the pitfalls we've faced so far.... [00:04:31] Either way, that query will work if you change revision to revision_userindex. [00:06:54] http://quarry.wmflabs.org/query/2006 [00:08:16] nealmcb, ^ This demonstrates a working edit counter. [00:08:22] halfak: aha - that's helpful! [00:08:38] Ooh. looks like I'm eligible. [00:08:40] :) [00:08:49] I had misunderstood your suggestion and was trying something a bit different... [00:09:20] It looks like we have 54k enwiki users with more than 600 edits. [00:10:21] That's a lot of IDs to join against revision, but it looks like I should only need the index -- not the whole table. [00:10:47] right. so is there hope for checking each of them vs the revision_userindex table, as one of the other recent suggestions was trying to do? [00:11:21] as a backup, there's always the option of sampling the data, but that gets tricky.... [00:11:47] nuria: do we know why jenkins is -1ing EL changes? [00:12:19] ori: no idea, if they pass flake8 they should be good to go [00:13:00] nealmcb, I'm working on it now. If I figure out an efficient strategy, I'll ping. [00:20:04] nuria: figured it out (with legoktm's help). are we good to merge https://gerrit.wikimedia.org/r/#/c/189764/ then? 
[00:21:05] ori: we can merge it sure, but we will not be deploying it anytime soon, makes sense? [00:21:11] nuria: why not? [00:21:26] what's the plan? [00:22:06] ori: because the code on master is broken and for some reason, as you saw yesterday, produces selects of 70,000 items w/o explanation plus times out the consumer a bunch [00:22:21] ori: note that -since our revert from yesterday- it hasn't happened [00:22:57] ori: so until backfilling is done (wip) we need stable code ingesting events [00:23:20] ok. i still don't see any reason to believe that it has anything to do with the code changes. [00:23:43] let's at least merge it and get it running on beta. there won't be any automatic deployments to vanadium, and i can promise you not to deploy it myself until we coordinate. [00:23:52] nealmcb, well, I can tell you that there are about 53,650 enwiki users that qualify before filtering out bots. [00:23:57] Working on that now. [00:24:16] Woops. I take that back. I missed a filter. [00:24:18] :S [00:24:18] ori: after having looked at the code many times i have to agree but the evidence is on the revert [00:24:51] ori: ya we can merge it and test it on beta labs [00:24:58] nuria: regardless, i think you are right to want the backfilling to be complete before we change anything [00:25:49] ori: ok, I am about to start it on vanadium now, there is one bug for the handling of duplicates for which i will be submitting a patch in 2 mins [00:26:10] nod [00:44:19] Analytics-Tech-community-metrics, MediaWiki-Developer-Summit-2015: Achievements, lessons learned, and data related with the MediaWiki Developer Summit 2015 - https://phabricator.wikimedia.org/T87514#1033166 (Rfarrand) [00:48:31] nealmcb, OK. I've got 12,043 users. I'll dump a datafile shortly. [00:48:49] I cheated and used our stats DBs which allow me to store temp tables. [00:49:21] halfak: thanks again.
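[Editor's note: the eligibility rule being queried above, more than 50 edits in the 6 months before the election, can be sketched in plain Python. The 182-day window, the strict > 50 threshold, and the `is_eligible` helper are assumptions drawn from this discussion, not the official election rules or the actual Quarry SQL.]

```python
from datetime import datetime, timedelta

# Hypothetical sketch of the eligibility rule discussed above: a user
# qualifies if they made more than 50 edits in the 6 months (approximated
# here as 182 days) before the cutoff date. Names and values are assumptions.
EDIT_THRESHOLD = 50
WINDOW = timedelta(days=182)

def is_eligible(edit_timestamps, cutoff):
    """True if strictly more than EDIT_THRESHOLD edits fall in the
    half-open window (cutoff - WINDOW, cutoff]."""
    start = cutoff - WINDOW
    recent = sum(1 for ts in edit_timestamps if start < ts <= cutoff)
    return recent > EDIT_THRESHOLD

cutoff = datetime(2015, 2, 11)
# 60 edits spread over the last two months -> eligible
busy = [cutoff - timedelta(days=i) for i in range(60)]
# 60 edits, but all more than a year old -> not eligible
stale = [cutoff - timedelta(days=400 + i) for i in range(60)]
print(is_eligible(busy, cutoff))   # True
print(is_eligible(stale, cutoff))  # False
```

In the real queries this per-user check runs against `revision_userindex` rather than `revision`, since only the user-indexed view avoids the timeouts seen in the earlier Quarry attempts.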
One of the recent queries that I should have explicitly linked to is http://quarry.wmflabs.org/query/1942 which takes out some bots etc. but still doesn't check for the >50 recent edits [00:49:59] Yeah. Recent edits is hard. I hope you'll forgive me if I include global bots. [00:50:02] halfak: surprising. I assume that is en-only? [00:50:26] Yeah. Enwiki bots are easy to filter. [00:50:27] I'm taking off soon. neal@bcn.boulder.co.us .... [00:50:36] I'll respond on wiki :) [00:50:37] back tomorrow I hope [00:50:40] cool! [00:50:42] o/ [00:50:47] thanks again [00:50:53] hth [00:53:26] for other language wikis, sampling may be the complicated but expedient approach [02:10:15] operations, Analytics-Kanban, Analytics-Cluster: Increase and monitor Hadoop NameNode heapsize - https://phabricator.wikimedia.org/T89245#1033403 (kevinator) p:Triage>High [02:10:22] Analytics-Engineering, Analytics-Kanban: Backfilling EL events from 06/02 to 10/02 - https://phabricator.wikimedia.org/T89269#1033404 (kevinator) p:Triage>High [02:11:00] Analytics-Kanban: Script adds indices to the Edit schema on analytics-store {bear} - https://phabricator.wikimedia.org/T89256#1033405 (kevinator) p:Triage>Normal [02:11:13] Analytics-Kanban: New host for Visual Editor visualizations {bear} - https://phabricator.wikimedia.org/T89255#1033408 (kevinator) p:Triage>Normal [02:11:27] Analytics-Kanban: Controls help you navigate between the Visual Editor sunburst visualizer and timeseries visualizer {bear} - https://phabricator.wikimedia.org/T89254#1033410 (kevinator) p:Triage>Normal [02:12:09] Analytics-Kanban: Reliable scheduler collects Visual Editor deployments {bear} - https://phabricator.wikimedia.org/T89253#1033413 (kevinator) p:Triage>High [02:12:30] Analytics-Kanban: Reliable scheduler computes Visual Editor metrics {bear} - https://phabricator.wikimedia.org/T89251#1033415 (kevinator) p:Triage>Normal [02:12:41] Analytics-Kanban: Reliable scheduler collects Visual Editor deployments {bear} - 
https://phabricator.wikimedia.org/T89253#1031289 (kevinator) p:High>Normal [02:13:07] Analytics-Kanban, Analytics-EventLogging: Eventlogging should log timestamps for all error messages {oryx} - https://phabricator.wikimedia.org/T89162#1033419 (kevinator) p:Triage>Normal [02:15:14] Analytics-Kanban, Analytics-EventLogging: Add ops-reportcard dashboard with analysis that shows the http to https slowdown on russian wikipedia - https://phabricator.wikimedia.org/T87604#1033425 (kevinator) p:High>Normal [02:16:03] Analytics-Kanban, Analytics-EventLogging: Estimate maximum throughput of Schema:Search {oryx} - https://phabricator.wikimedia.org/T89019#1033427 (kevinator) p:High>Normal [02:16:55] Analytics-Kanban, Analytics-Cluster: Mobile Apps PM has monthly report from oozie about apps uniques - https://phabricator.wikimedia.org/T88308#1033430 (kevinator) p:High>Normal [02:17:09] Analytics-Kanban, Analytics-Cluster: Mobile PM sees reports on browsers (Weekly or Daily) - https://phabricator.wikimedia.org/T88504#1033432 (kevinator) p:High>Normal [03:02:48] Analytics: stats.grok.se about a week behind - https://phabricator.wikimedia.org/T89326#1033441 (Ijon) NEW [03:13:55] Analytics: stats.grok.se about a week behind - https://phabricator.wikimedia.org/T89326#1033461 (Ijon) Oh, see also https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#http:.2F.2Fstats.grok.se_partial_data [03:18:24] (PS2) Milimetric: [WIP] Add timeseries graph of key metrics [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/190113 [06:15:39] Analytics-Tech-community-metrics: "Contributors new and gone" in korma is stalled - https://phabricator.wikimedia.org/T88278#1033550 (Acs) >>! In T88278#1032325, @Qgil wrote: > Thanks! The new contributrs table is refreshed, but I'm still seeing very old results in "Who seems to be on the way out or gone?" F... 
[11:28:19] Analytics-Tech-community-metrics: "Contributors new and gone" in korma is stalled - https://phabricator.wikimedia.org/T88278#1033874 (Qgil) The most recent "last date" I see is this one: > Mushroom Unknown 2014-01-05 This is a year ago, not 6 months. [11:43:20] Analytics-Tech-community-metrics, MediaWiki-Developer-Summit-2015: Achievements, lessons learned, and data related with the MediaWiki Developer Summit 2015 - https://phabricator.wikimedia.org/T87514#1033890 (Qgil) [13:26:20] (PS3) Milimetric: [WIP] Add timeseries graph of key metrics [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/190113 [14:30:34] hi halfak, I'm in the batcave [14:30:57] Hey milimetric [14:31:01] cool. Heading there [14:38:23] halfak: thanks again! I fleshed out the talk page discussion, and boldly summarized it, with reference to the talk page (which seems slightly less improper than doing so on an Article page). But having some more permanent page describing the eligibility stats would be great - where might that go? [14:38:31] https://en.wikipedia.org/wiki/Wikipedia:Wikipedians [15:00:29] nealmcb, I'll start a research page on meta. We do a lot of analytics OR there and cite it from elsewhere. [15:01:07] ottomata: standup [15:01:18] nealmcb, do you think that we should focus on eligibility for this election or should we dig into "counting Wikipedians" in general? [15:01:38] ack! [15:07:49] At https://www.mediawiki.org/wiki/Special:Statistics it says an "active" user is one who performed "an action" in the last 30 days. Is just loading a page while logged in considered an "action"? [15:16:59] nealmcb, good Q. I wouldn't trust anything that Special:Statistics says. That's certainly not the "official" definition of "active editor" [15:17:34] But this should come with the caveat that I have not seen Special:Statistics' code.
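[Editor's note: the "active user" definition questioned above amounts to counting distinct users with at least one logged "action" in the trailing 30 days; which events count as an "action" is exactly the open question in the discussion. A minimal sketch with a hypothetical events list follows.]

```python
from datetime import datetime, timedelta

def active_users(events, as_of, window_days=30):
    """Distinct users with at least one action in the last `window_days`.
    `events` is an iterable of (user, timestamp) pairs; which actions
    belong in it is the open question from the discussion above."""
    start = as_of - timedelta(days=window_days)
    return {user for user, ts in events if start < ts <= as_of}

now = datetime(2015, 2, 12)
events = [
    ("alice", now - timedelta(days=3)),    # recent action
    ("alice", now - timedelta(days=100)),  # old action; alice counted once
    ("bob", now - timedelta(days=45)),     # too old, bob not counted
]
print(sorted(active_users(events, now)))  # ['alice']
```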
[15:23:16] Analytics-Engineering, Analytics-Kanban: Backfilling EL events from 02/06/15 to 02/10/15 - https://phabricator.wikimedia.org/T89269#1034295 (ggellerman) a:Nuria [15:23:31] nealmcb, https://meta.wikimedia.org/wiki/Research:Counting_Wikipedians [15:23:56] I'll add notes about the work I did last night here. [15:24:20] Analytics-Engineering, Analytics-Kanban: Backfilling EL events from 20150205 to 20150210 - https://phabricator.wikimedia.org/T89269#1031859 (ggellerman) [15:26:57] Analytics-Cluster: Getting Ananth started - https://phabricator.wikimedia.org/T77196#1034314 (kevinator) [15:26:59] Analytics-Kanban, Analytics-Cluster: Hive User calls UDF to pull real requestor IP out of X-Forwarded-For header - https://phabricator.wikimedia.org/T78812#1034312 (kevinator) Open>Resolved @ottomata merged the code yesterday. [15:28:04] Analytics-Kanban: Reliable scheduler computes Visual Editor metrics {bear} - https://phabricator.wikimedia.org/T89251#1034316 (ggellerman) a:Milimetric>mforns [15:29:10] Analytics-Kanban, Analytics-EventLogging: Sanity check changes to timestamp fields and remove autoincrement id from tables & deploy to Prod [8 pts] - https://phabricator.wikimedia.org/T88297#1034318 (kevinator) p:Normal>High [15:34:18] Analytics-Kanban, Analytics-Cluster: Build component for Oozie jobs to sends e-mails - https://phabricator.wikimedia.org/T88433#1034339 (kevinator) [15:39:02] Analytics-Tech-community-metrics: "Contributors new and gone" in korma is stalled - https://phabricator.wikimedia.org/T88278#1034347 (Acs) >>! In T88278#1033874, @Qgil wrote: > The most recent "last date" I see is this one: > >> Mushroom Unknown 2014-01-05 > > This is a year ago, not 6 months. Ups, I think... [15:45:04] ottomata: question, you have 2 mins? [15:47:03] halfak: Thanks! and note QueenOfFrance> nealmcb: an action definitively does not include viewing a page. 
[but the jury is out on whether logging in is an action] [15:47:22] nuria: gimme 5 mins [15:47:25] then yes [15:47:28] ottomata: k [15:49:56] Analytics-Kanban: Reliable scheduler computes Visual Editor metrics {bear} - https://phabricator.wikimedia.org/T89251#1034362 (kevinator) p:Normal>High [15:51:22] halfak: And if you can get any stats from CentralNotice on how many people see the "you can vote" banner, that would also be great. it seems that logging in is not an action. [15:53:49] nuria: wassup [15:54:24] ottomata: this query [15:54:27] https://www.irccloud.com/pastebin/07OZui36 [15:54:54] ottomata: is running into this error: [15:54:59] https://www.irccloud.com/pastebin/TNrzBqOJ [15:55:17] bits doesn't have any data right now [15:55:27] ottomata: ahhhhhhhhh [15:55:39] ottomata: ok, that explain it [15:55:47] *explains it [15:56:58] ottomata: and you were going to turn it on again right? [16:00:11] yes [16:00:20] just haven't yet because no one has needed it :) [16:00:29] you need it? [16:00:53] ottomata: it will be great to have it yes, to look at js enabled/disabled browsers [16:01:36] ok let's do it :) [16:02:00] Hi ottomata :) [16:02:21] Do we take some time to talk before research meeting ? [16:02:38] hey joal, ottomata [16:03:03] since you both here, will you be able to file the tickets to get Joseph access? [16:03:36] joal, yeahhHH! [16:03:40] sure tnegrin [16:03:42] ottomata: ok, thank you, let me know when there is data [16:03:56] tnegrin: I was reading on how to get this done :) [16:03:58] excellent — just ping me so I can approve quickly [16:04:13] cool [16:04:54] ottomata: hangout ? [16:05:17] joal, gimme just a few mins [16:05:39] np [16:05:48] i'm setting up a hadoop cluster in labs to practice an upgrade :) just want to get it happy before I forget about it [16:05:56] shall we say at half past ? [16:06:19] Sounds cool :) [16:06:26] Some new CDH features ? 
[16:06:27] ja or sooner [16:06:41] hive 0.13 and a newer spark, are what we are looking forward to I think [16:06:53] clearly ! [16:07:10] I go for some reading, will be back at half-past ! [16:07:12] ttl [16:07:23] ok cool [16:07:25] s/ttl/ttyl [16:07:54] Mhmm ... bits :-) [16:08:26] there it goes! [16:08:52] !log re-enabling bits varnishkafka instances [16:10:13] halfak: As I recall the Spark big data researchers at Berkeley say they can do some wikipedia full-text queries in under a second using Spark. Is anyone you know of using it actively for Wikipedia? [16:10:24] Analytics-Kanban, Analytics-Cluster: Estimate roughly of how many users might not have javascript capable/enable browsers - https://phabricator.wikimedia.org/T88560#1034407 (Nuria) On hold a bit as bits is not reporting any data to the cluster, otto to re-start that job again [16:10:49] Analytics-Visualization, Analytics-Kanban: Build high level timeseries view of key metrics [8pts] {bear} - https://phabricator.wikimedia.org/T88367#1034410 (kevinator) [16:11:29] nealmcb, I tend to use hadoop for that sort of thing. [16:11:59] halfak: spark is in-memory hadoop [16:12:02] I don't think you could do a full text scan of enwiki in under a second, but you could certainly build an index and work with it that fast. [16:12:07] spark is not just in-memory [16:12:29] what else does it do? [16:12:58] Right, but the "Resilient Distributed Datasets" (RDDs) can be cached very effectively in memory [16:13:14] * halfak remembers reading a performance comparison between hadoop and spark on disk. [16:13:19] Analytics-Kanban: New host for Visual Editor visualizations {bear} - https://phabricator.wikimedia.org/T89255#1034414 (Nuria) The puppet module that counts requests is logster, in labs it should be able to publish automatically to grapha/graphite [16:13:50] I hear that much of the industry (e.g. 
Mahout) is moving from mapreduce to spark [16:13:53] Analytics-Kanban: New host/lab environment for Visual Editor visualizations in labs that can report usage metrics {bear} - https://phabricator.wikimedia.org/T89255#1034418 (Nuria) [16:14:17] Analytics-Kanban: Reliable scheduler computes Visual Editor metrics [21 pts] {bear} - https://phabricator.wikimedia.org/T89251#1034419 (kevinator) [16:14:35] nealmcb, indeed. Spark is one of the things that we looked at. Right now, we're processing data that would be too large in-memory for our budget. [16:14:46] 100TB or so for a couple historic dumps of enwiki. [16:14:52] Analytics-Kanban, Analytics-EventLogging: Reliable scheduler computes Visual Editor metrics [21 pts] {bear} - https://phabricator.wikimedia.org/T89251#1031255 (kevinator) [16:15:02] and I love how RDDs provide both speed and fault-tolerance. so it seems like an architecture that can do just mapreduce stuff well, but also do much more complex analysis much faster [16:16:05] nealmcb: ja, we don't have much experience with spark yet, I hope we can change that soon [16:16:08] nealmcb, ottomata was digging into spark for our analytics cluster last I heard. [16:16:12] i think it will be useful for us in a lot of cases [16:16:18] just a hunch though [16:16:20] as i haven't really used it [16:16:25] halfak: interesting. of course it depends on the problem. but do you see spark being slower or harder to use than mapreduce? What is the downside? [16:16:32] i've been waiting to do this CDH upgrade, as the later version has slightly different setup instructions for spark [16:17:35] ottomata: cool. what version are you on now, and planning to move to? [16:18:08] nealmcb, space in RAM is a problem [16:18:13] I need a lot of it. [16:18:23] we are on cdh 5.0.2, moving to 5.3.1 [16:18:25] spark, not sure [16:19:08] nealmcb, biggest hurdle is that I haven't used it before and I have working hadoop code. 
[16:19:10] halfak: but you don't need to cache stuff - so what space are you talking of? is it less efficient than mapreduce if you just do mapreduce like stuff without caching? [16:19:57] nealmcb, why the push? There's millions of cool technologies that might improve things that we haven't tested yet. [16:20:23] That's really the real response. "We're trying new stuff while getting our work done. Spark is on the list." [16:22:00] halfak: well said. I'm definitely in the "spark is cool" camp, and have little experience. I'm simply gathering insights from the real world to compare with what the spark crowd is saying [16:22:44] nealmcb, you've got a spark cluster to play with? [16:23:44] I've got it on my laptop, and have taken a half-day course, and am trying to make it easier to set up with docker (e.g. from sequenceiq) but the hadoop config is tricky [16:24:16] but there is an active community in Boulder of folks doing spark stuff, so I'm looking for a good combo of tool + problem. [16:25:36] I have a friend with good connections, and I have some also. perhaps databricks would be willing to sponsor some cluster time for wikipedia analysis. Anyway, I'm just poking around :) [16:26:28] Analytics-Kanban: New host/lab environment for Visual Editor visualizations in labs that can report usage metrics [13 pts] {bear} - https://phabricator.wikimedia.org/T89255#1034443 (kevinator) [16:26:42] Analytics-Kanban: New host/lab environment for Visual Editor visualizations in labs that can report usage metrics [13 pts] {bear} - https://phabricator.wikimedia.org/T89255#1031328 (kevinator) - use limn-1 instance? if so add puppet code to pull logster module and new virt host to serve static files can make... [16:27:20] halfak: does alticale support spark? 
[16:27:24] altiscale [16:27:48] Yes [16:27:59] Looks like Spark 1.1 [16:27:59] http://documentation.altiscale.com/spark-1-1-with-altiscale [16:28:04] that might be a good place to play with it — if it’s useful we can install it on our cluster [16:29:36] Good point. It looks like they have decent docs. I'm suspecting that our pyspark code will be 2.6. I'd rather just hop over to Java/Scala land. :) [16:31:10] ok joal, i'm here [16:31:10] https://plus.google.com/hangouts/_/wikimedia.org/otto-joal [16:31:17] Where is a good up-to-date description of wikipedia analytics infrastructure - clusters, mapreduce code, common jobs, etc? [16:31:31] nealmcb, https://wikitech.wikimedia.org/wiki/Analytics/Cluster [16:31:37] and perhaps especially - current pain points that spark might help with? [16:31:52] :) [16:33:05] nealmcb: if you get a wikitech labs account, you are welcome to set up a cluster there [16:33:26] i don't have spark stuff automated at all yet, but you could do that manually [16:33:41] if you give me a week, i could probably get you a mini hadoop cluster (after this migration) with the newer cdh version [16:33:56] +1 to that. milimetric and I have been experimenting with setting up mini-clusters on labs. [16:34:26] you could do it yourself too, it should just be clicking some buttons in the labs interface...:) [16:34:52] Analytics-Kanban: Script adds indices to the Edit schema on analytics-store [5 pts] {bear} - https://phabricator.wikimedia.org/T89256#1034462 (kevinator) [16:35:27] ottomata, cluster buttons! [16:35:30] Analytics-Kanban: Script adds indices to the Edit schema on analytics-store [5 pts] {bear} - https://phabricator.wikimedia.org/T89256#1031351 (kevinator) * staging.milimetric_edit has the indices that seem to work * on data-warehouse repo in a specific folder. (analytics-store/log/Edit_12345_indices.sql) [16:35:31] <3 [16:35:58] * halfak imagines that capacity demands will increase a bit.
[16:39:55] (PS2) QChris: Use bits when producing legacy tsvs [analytics/refinery] - https://gerrit.wikimedia.org/r/186970 [16:39:57] (PS2) QChris: Refine bits webrequests [analytics/refinery] - https://gerrit.wikimedia.org/r/186969 [16:41:35] halfak: labs cluster buttons have always been there: http://cl.ly/image/3E2d1h21451m [16:42:18] https://wikitech.wikimedia.org/wiki/User:QChris/TestClusterSetup [16:42:24] ^ might help getting a cluster started. [16:42:29] (in labs) [16:43:02] Analytics-Kanban, Analytics-Cluster: WMF has technical documentation on UC by last visited date [5 pts] {bear} - https://phabricator.wikimedia.org/T88812#1034563 (kevinator) I created a wikitech page https://wikitech.wikimedia.org/wiki/Analytics/Unique_clients/Last_visit_solution [16:49:56] thanks all! Time to do some reading on all this! :) [16:59:26] Analytics-Kanban, Analytics-Cluster: WMF has technical documentation on UC by last visited date [5 pts] {bear} - https://phabricator.wikimedia.org/T88812#1034581 (kevinator) We need to capture issues with counting across all projects and inability to use a cookie across all domains. We can aggregate the uniqu... [17:10:28] ottomata: at some point I had a "refinery-trainer" module that checked whether or not the urls from the sampled-1000 stream can still be parsed by the MediaFileUrlParser. [17:10:44] We said that "refinery-trainer" is a sucky name. [17:11:00] What name should I use for the new variant? [17:11:09] "refinery-fuse" ? [17:11:20] As it is kind of a fuse against UDF turning stale. [17:14:57] :) I'm gonna miss a lot [17:15:03] but you guys naming stuff, that I'm going to miss the most [17:15:13] -fuse!! brilliant [17:15:23] Oh. We can meet just to name random things :-) [17:20:32] ottomata: Wanna say ok to "refinery-trainer" -> "refinery-fuse"? 
^ [17:21:31] +kevinator hey [17:32:03] (PS1) QChris: Switch refining's refinery-hive jar to versioned variant [analytics/refinery] - https://gerrit.wikimedia.org/r/190237 [17:32:05] (PS1) QChris: Drop unused, outdated refinery jars [analytics/refinery] - https://gerrit.wikimedia.org/r/190238 [17:39:35] qchris: yt? [17:42:10] nuria: yup. [17:42:34] Smells like pancakes are ready soon ... :-) [17:42:50] qchris: for dinner? decadent... [17:42:58] qchris: short question [17:43:01] Hahaha :-) [17:43:11] Sure. Shoot. [17:43:43] qchris: when you parallelized the backfilling, did you use parallelize or just a for loop and a reasonable number of files that are catted and igested [17:43:46] *ingested [17:44:16] I used for loops. [17:44:26] Like if I had to backfill for 128 files, then [17:44:48] The first process was a for loop backfilling file 1-32. [17:44:57] The second process was a for loop backfilling file 33-64. [17:45:07] You get the picture :-) [17:45:09] qchris: Ok, will do same then [17:45:25] qchris: that's it, thank you [17:45:31] Cool. [17:49:56] * qchris hails decadence. Pancake time! :-D [17:59:33] qchris_away: sorry, was talking to joseph, uhhhh fuse! Hm, nawwwww, don't like it! [17:59:41] but, let's talk about it :) [18:07:04] tnegrin: hewooo [18:47:13] ottomata: Mhmm. So what should we call this thing then? [18:47:33] It should basically check that refinery code does not get stale. [18:48:22] refinery-watcher? [18:48:25] refinery-watchdog? [18:48:46] refinery-up-to-date-checker? [18:48:53] refinery-stale-guard? [18:50:49] ottomata: ^ [18:51:43] lets hangout about it, buut uhhh, after researcher meeting [18:51:48] will you still be around? [18:52:03] That is in 1 hour? [18:52:28] I'll be around then.
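[Editor's note: qchris's backfilling approach described above, 128 files split across several for-loop processes of 32 files each, amounts to chunking the file list into contiguous ranges. A small sketch with hypothetical file names; the actual backfill command (cat piped into an ingester) is only indicated in a comment.]

```python
def chunk(files, size):
    """Split `files` into contiguous chunks of at most `size` items,
    one chunk per backfilling process."""
    return [files[i:i + size] for i in range(0, len(files), size)]

files = ["file%03d" % n for n in range(1, 129)]  # 128 files, as in the example
chunks = chunk(files, 32)
print(len(chunks))                  # 4 processes
print(chunks[0][0], chunks[0][-1])  # file001 file032
print(chunks[1][0], chunks[1][-1])  # file033 file064
# Each chunk would then be handled by its own process, e.g. (hypothetical):
# for f in chunk: cat f | ingest ...
```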
[18:59:16] Analytics-Tech-community-metrics, MediaWiki-Developer-Summit-2015: Achievements, lessons learned, and data related with the MediaWiki Developer Summit 2015 - https://phabricator.wikimedia.org/T87514#1034814 (Rfarrand) As I read through feedback from the WMDS I will put some thoughts on Logistics here: 1) Ded... [19:06:56] milimetric: are we meeting now? [19:07:15] Ellery and I are sitting in the DevOps checkpoint and we're not sure if folks are joining [19:07:41] leila: yes, in this meeting: [19:07:45] https://plus.google.com/hangouts/_/wikimedia.org/analytics?authuser=0 [19:07:54] a bit of a snafu on the rooms - fixed now for future [19:09:27] thanks, milimetric [19:13:58] * DarTar waves at joal [19:25:02] Analytics-Kanban: Script adds indices to the Edit schema on analytics-store [5 pts] {bear} - https://phabricator.wikimedia.org/T89256#1034888 (mforns) a:Milimetric>mforns [19:41:24] Hi! can anyone there do a quick grep into some logs for me perhaps? :) Thanks in advance [19:44:48] Analytics-Wikistats: Provide total active editors for December 2014 - https://phabricator.wikimedia.org/T88403#1034921 (ezachte) So here is approximation based on available data for December (including top 15 Wikipedias) {F40445} [20:03:53] Hey DarTar :) [20:04:08] * joal|night waves back ! [20:07:40] (PS3) QChris: Use bits when producing legacy tsvs [analytics/refinery] - https://gerrit.wikimedia.org/r/186970 [20:07:42] (PS2) QChris: Drop unused, outdated refinery jars [analytics/refinery] - https://gerrit.wikimedia.org/r/190238 [20:07:44] (PS2) QChris: Switch refining's refinery-hive jar to versioned variant [analytics/refinery] - https://gerrit.wikimedia.org/r/190237 [20:16:34] Analytics-Wikistats: Discrepancies in historical total active editor numbers - https://phabricator.wikimedia.org/T87738#1035025 (ezachte) I can investigate but not before quarterly report this week. 
[20:28:27] Analytics: Report Signups in 2014 Oct-Dec - https://phabricator.wikimedia.org/T89276#1035049 (Tbayer) Just to make the point about historical data explicit, per earlier conversations: We also need the corresponding number for July-Sept 2014 and Oct-Dec 2013 (as implicit in the "xx% from Q1 / xx% y-o-y" in t... [20:37:35] Analytics-Wikistats: Discrepancies in historical total active editor numbers - https://phabricator.wikimedia.org/T87738#1035077 (Tbayer) OK - if you want us to mark these numbers as "preliminary estimate" or such, I can add something like a footnote in the report's scorecard. [20:40:00] marcel and I are having a pow-wow in the batcave over success / failure rates, feel free to jump in if interested [20:41:14] Analytics: Report Edits for 2014 Oct-Dec - https://phabricator.wikimedia.org/T89284#1035100 (Tbayer) To clarify for onlookers, this refers to http://reportcard.wmflabs.org/graphs/edits or the corresponding Wikistats page. I am happy to work from the data that's already available there, except that (like for T... [20:49:57] (PS1) Amire80: Remove localized "User" namespace prefixes [analytics/wikistats] - https://gerrit.wikimedia.org/r/190344 [20:53:34] bllrrrggg! fatal error! fundraising! (as suggested by awight) [20:53:44] baahaha [20:53:55] who can get me some yummmy delicious unfiltered log grep results, arrrrrg? [20:54:03] please? [20:54:23] Seriously. We need help getting log info, CentralNotice has been causing a lot of hits to a deprecated endpoint, and we want to debug what is causing it.
[20:56:08] Analytics-Wikistats: stats.wikimedia.org - the "User" namespace for Nynorsk is wrong - https://phabricator.wikimedia.org/T89387#1035131 (Amire80) NEW [20:56:40] (PS2) Amire80: Remove localized "User" namespace prefixes [analytics/wikistats] - https://gerrit.wikimedia.org/r/190344 (https://phabricator.wikimedia.org/T89387) [20:59:15] (CR) Ottomata: [C: 2 V: 2] Refine bits webrequests [analytics/refinery] - https://gerrit.wikimedia.org/r/186969 (owner: QChris) [20:59:28] Analytics-Wikistats: stats.wikimedia.org - the "User" namespace for Nynorsk is wrong - https://phabricator.wikimedia.org/T89387#1035145 (Amire80) [21:01:16] (CR) Ottomata: [C: 2 V: 2] Use bits when producing legacy tsvs [analytics/refinery] - https://gerrit.wikimedia.org/r/186970 (owner: QChris) [21:02:00] Analytics-Tech-community-metrics, MediaWiki-Developer-Summit-2015: Achievements, lessons learned, and data related with the MediaWiki Developer Summit 2015 - https://phabricator.wikimedia.org/T87514#1035153 (Qgil) @rfarrand, you might want to add them in the editable description instead of piling them up in n... [21:05:10] Analytics: Report New editors per month in 2014 Oct-Dec - https://phabricator.wikimedia.org/T89277#1035155 (ezachte) We don't have Wikimedia wide deduplicated counts for new editors, like we have for TAE. [21:05:42] Analytics: Report New articles for 2014 Oct-Dec - https://phabricator.wikimedia.org/T89283#1035156 (Tbayer) To clarify for onlookers, this refers to http://reportcard.wmflabs.org/graphs/articles or the corresponding Wikistats page. I would be happy to work from the data that's already available there, except... [21:10:33] Analytics: Report New articles for 2014 Oct-Dec - https://phabricator.wikimedia.org/T89283#1035167 (Dzahn) yes, http://wikistats.wmflabs.org/display.php?t=wp has a "grand total" section at the bottom. it fetches the numbers from the api of each wikipedia and then simply adds them up. note how there is a di... 
[21:12:05] (PS3) QChris: Drop unused, outdated refinery jars [analytics/refinery] - https://gerrit.wikimedia.org/r/190238 [21:12:07] (PS3) QChris: Switch refining's refinery-hive jar to versioned variant [analytics/refinery] - https://gerrit.wikimedia.org/r/190237 [21:16:23] (CR) Ottomata: [C: 2 V: 2] Switch refining's refinery-hive jar to versioned variant [analytics/refinery] - https://gerrit.wikimedia.org/r/190237 (owner: QChris) [21:16:35] (CR) Ottomata: [C: 2 V: 2] Drop unused, outdated refinery jars [analytics/refinery] - https://gerrit.wikimedia.org/r/190238 (owner: QChris) [21:16:58] Analytics-Tech-community-metrics: "Contributors new and gone" in korma is stalled - https://phabricator.wikimedia.org/T88278#1035173 (Qgil) Cache problems indeed. I have tried with another browser in incognito mode, and the results are correct now: > All1 Unknown 2014-08-12 Well resolved, then. Thank you! [21:17:02] Thanks! [21:19:13] Analytics: Report New articles for 2014 Oct-Dec - https://phabricator.wikimedia.org/T89283#1032289 (Tbayer) (Adding Dario in case he wants to weigh in on the reliability of the two different data sources. For now, I'm OK with using Daniel's data via the version history of the Meta page.) [21:23:00] AndyRussG, awight: It seems you're having a hard time getting heard :-/ [21:23:07] Maybe I can help out while I still got access? [21:23:35] What are you looking for? [21:23:57] qchris: thanks :) [21:24:14] We're trying to debug the source of any Special:BannerRandom impressions since... maybe Jan 1 [21:24:48] Accesses to that endpoint should have been phased out beginning Nov 24, so 30 days of long tail is reasonable [21:24:58] You're looking for the referer of Special:BannerRandom requests? [21:24:59] but as of today, we were still seeing dozens of requests per second [21:25:07] yeah, referrer and IP [21:25:08] Or just the number of Special:BannerRandom requests. 
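[Editor's note: a rough sketch of the per-day and per-referer counting being requested here. The log line layout used below (whitespace-separated timestamp, URL, referer) is a made-up simplification for illustration, not the real webrequest log format, and `banner_random_counts` is a hypothetical helper.]

```python
from collections import Counter

def banner_random_counts(lines):
    """Count Special:BannerRandom requests per day and per referer.
    Assumes whitespace-separated fields: timestamp url referer
    (a hypothetical layout, not the real webrequest format)."""
    per_day = Counter()
    per_referer = Counter()
    for line in lines:
        ts, url, referer = line.split()
        if "Special:BannerRandom" in url:
            per_day[ts[:10]] += 1  # YYYY-MM-DD prefix of the timestamp
            per_referer[referer] += 1
    return per_day, per_referer

log = [
    "2015-02-10T23:30:00 /wiki/Special:BannerRandom?x=1 https://en.wikipedia.org/wiki/Foo",
    "2015-02-11T00:50:00 /wiki/Special:BannerRandom?x=2 https://en.wikipedia.org/wiki/Bar",
    "2015-02-11T01:00:00 /wiki/Main_Page -",
]
per_day, per_referer = banner_random_counts(log)
print(per_day["2015-02-11"])  # 1
```

Plotting `per_day` over time would show directly whether the count is decaying since the Nov 24 phase-out and whether it jumped after the Feb 10 deploy.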
[21:25:17] qchris: it's really weird, yeah that would also be a start [21:25:19] number per day would be great, as well [21:25:38] We definitely want to confirm the count is decaying. [21:25:58] qchris: especially it'd be good to know if they increased significantly after a deploy on Tuesday [21:26:25] k [21:26:35] qchris: thanks a ton! :) [21:29:04] omg we do have page_id in x_analytics! [21:29:06] and ns! [21:29:16] i wasn't sure if it was actually deployed, but it is! thanks ori! [21:34:10] Analytics-Engineering, Analytics-Cluster: Automate sqooping of page table into Hive - https://phabricator.wikimedia.org/T89394#1035249 (Ottomata) NEW [21:34:46] awight, AndyRussG I see they decreased sharply on 2014-11-2{6,7} [21:34:56] We're currently around 30 requests/day. [21:35:12] Sorry. 30K requests/day. [21:35:23] really... that's another part of the mystery. If you check hhvm.log, we caused many hundreds of fatal errors just today. [21:35:28] Analytics-Engineering, Analytics-Cluster: Refine page_id, page_name, and namespace using x_analytics fields and page tables - https://phabricator.wikimedia.org/T89396#1035264 (Ottomata) NEW [21:35:38] And nobody can say how that code is being executed! [21:35:43] qchris: hmmm.... 30k/day [21:35:47] Analytics-Engineering, Analytics-Cluster: Automate sqooping of page table into Hive - https://phabricator.wikimedia.org/T89394#1035272 (Ottomata) [21:35:48] oh SORRY [21:35:48] Analytics-Engineering, Analytics-Cluster: Refine page_id, page_name, and namespace using x_analytics fields and page tables - https://phabricator.wikimedia.org/T89396#1035264 (Ottomata) [21:35:50] right [21:36:17] are there obvious patterns in the IP or user agent? [21:36:32] * awight tries not to say "IE" [21:36:40] Analytics-Engineering, Analytics-Cluster: Investigate getting redirect_page_id as an x_analytics field using the X analytics extension.
- https://phabricator.wikimedia.org/T89397#1035278 (Ottomata) NEW [21:36:48] I for now just looked at the numbers. As that is way quicker to query. [21:37:10] Gonna do the IP/ip query next. [21:37:11] I think it seemed there were lots of UAs, from what we saw from the sampled [21:37:39] 30k/day is about 20 per minute [21:38:13] But in November it was 573M/day [21:38:22] So that's a huge decrease. [21:38:41] qchris: yes, that's exactly expected [21:39:05] well, we turned it off on Nov 26, I think. But I would have liked if it was steadily decaying since then. [21:39:15] qchris awight: at ~20/min ^ sounds like less than Chad was getting [21:39:37] maybe it comes in bursts? [21:39:50] Analytics-Kanban, Analytics-Cluster: Implement Last Visited cookie [34 pts] {bear} - https://phabricator.wikimedia.org/T88813#1035306 (bd808) I think this idea needs wider discussion. Adding a durable tracking token to all visitors is a major shift in stance for the Foundation. In [[http://www.cookie-laws.com... [21:40:35] nuria: when you restarted wikimetrics did you use my kill -9 approach for all the queues and everything? [21:40:35] AndyRussG: if you want to look, it's fluorine:/a/mw-log/hhvm.log [21:40:47] milimetric: yes [21:44:34] Analytics-Engineering, Analytics-Cluster: Refine page_id, page_name, and namespace using x_analytics fields and page tables - https://phabricator.wikimedia.org/T89396#1035341 (Ottomata) [21:44:44] Analytics-Engineering, Analytics-Cluster: Automate sqooping of page table into Hive - https://phabricator.wikimedia.org/T89394#1035344 (Ottomata) [21:45:24] Analytics-Engineering, Analytics-Cluster: Make geocoded data and chosen client_ip available as fields in refined webrequest data - https://phabricator.wikimedia.org/T89401#1035346 (Ottomata) NEW [21:45:56] awight: they certainly do seem to come in bursts, and much more than 20/minute [21:50:34] qchris: what about any changes on Tuesday?
Specifically our deploy was after 23:00 UTC on Feb 10th [21:51:18] In the sampled-1000 logs, I could not find any change. But that does not mean too much, as the volume of 30K/day is quite low. [21:51:30] In the unsampled data, the queries are still running. [21:52:28] What's the expected return code? I see some 503s. I guess those are the ones crashing? [21:53:21] qchris: it's code that isn't even supposed to be run anymore, and was kept around just as a backup for a while, though a change that went out recently broke it without really intending to [21:53:30] That definitely went out on Tuesday [21:54:27] qchris: I was thinking maaaaybe the sampled logs also pass thru some regex that tries to filter out these calls, so in fact we get more than that... is that possible? [21:54:53] No sampling on urls in the sampled-1000 stream. [21:55:04] huh? [21:55:08] Let me rephrase that. [21:55:31] The sampled-1000 throws away 999 of 1000 requests, but does not care about the urls in those requests. [21:55:54] qchris: OK, that's what I thought u meant :) [22:00:06] I guess you saw the "mw.centralNotice.insertBanner( false /* due to internal exception */ );" that gets sent as response to the 503s [22:00:48] Besides that, the kind of responses changed somewhere around/between 2015-02-10T23:25:49 and 2015-02-11T00:43:13 [22:00:56] This seems to match the deployment you mentioned. [22:01:16] Before, response sizes were typically 0 or 59. [22:01:34] Now they are 0 or ~2.7K [22:01:49] Analytics-Tech-community-metrics, MediaWiki-Developer-Summit-2015: Achievements, lessons learned, and data related with the MediaWiki Developer Summit 2015 - https://phabricator.wikimedia.org/T87514#1035409 (Rfarrand) Currently they are just thoughts and not agreed upon improvements! I will edit the descripti... [22:03:19] qchris: the responses changing to 503s following our deploy is fine [22:03:24] In preliminary results, IE does not stick out.
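The sampled-1000 semantics qchris describes (keep 1 request in 1000, regardless of URL) also explain why 30K/day is hard to see in that stream: only about 30 sampled lines survive per day. A rough sketch of the scaling and its noise; the `estimate_from_sampled` helper and the Poisson-style error model are assumptions for illustration, not an existing tool:

```python
import math

def estimate_from_sampled(sampled_count, rate=1000):
    """Scale a sampled-1000 hit count up to an estimated total, with a
    rough Poisson-style standard error (~sqrt(n) on n sampled hits).
    The error model is an assumption for illustration."""
    estimate = sampled_count * rate
    stderr = math.sqrt(sampled_count) * rate
    return estimate, stderr

# ~30 sampled lines/day correspond to ~30K real requests/day, but the
# uncertainty on that estimate is large, so a change is hard to spot.
est, err = estimate_from_sampled(30)
print(est, round(err))  # 30000 5477
```

With a standard error around 5K on a 30K estimate, a deploy-related jump would need to be quite large before the sampled stream could show it, which matches qchris's caveat above.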
[22:03:40] I also see Firefox and Chrome Mobile and some such [22:03:57] What about the number of requests? Any change in the number of requests at that time too? [22:04:16] Not in the sampled-1000 stream. The query for unsampled data is still running. [22:04:58] qchris: OK! I guess that means that there probably wasn't a significant jump at that time! [22:05:14] hey ottomata, I need some packages installed on stat2/3 in order to compile some C libraries in python. Do I file an RT ticket or do something with phab? [22:05:26] RT is dead, long live Phab! [22:05:34] Was around 30K/day before and is about 30K/day afterwards. [22:05:43] ottomata, how do I phab? [22:05:51] So I do not think that the deployment changed volume. [22:06:51] awight: ^ [22:08:21] qchris: OK cool... Independently of whether this is or isn't the cause of all the hhvm errors, I wonder where it's all coming from! This is JS that was last live sometime in November [22:09:03] create ticket, operations project, CC me? [22:09:31] AndyRussG: Seems they are plain views of wiki pages. Nothing fancy. So for example: [22:09:39] http://en.wikipedia.org/wiki/Radio_Caroline [22:09:44] ^ is one of the referers [22:09:52] Others look alike. [22:09:54] Nothing special. [22:10:29] (Abandoned) Ottomata: Add IpUtil, Geocode, and GeocodeCountryUDF classes [analytics/refinery/source] (otto-geo) - https://gerrit.wikimedia.org/r/164264 (owner: Ottomata) [22:10:47] qchris: have u seen the hhvm logs? [22:10:55] Mhmmm. The fraction of mobile looks strangely high. [22:11:16] AndyRussG: The one you pasted before? Yup. I took a peek. [22:11:25] fluorine:/a/mw-log/hhvm.log [22:11:30] But they don't tell me much. [22:11:38] I have no clue about CentralNotice. [22:12:16] qchris: Just trailing that, it seems a lot more than the 30K/day you're talking about [22:12:37] AndyRussG: Full ACK. [22:12:50] qchris: ?
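For the rate comparison in this exchange: 30,000 requests/day averages out to roughly 21 per minute, and a bursty stream can far exceed that in individual minutes while still matching the daily total. A small sketch; the `per_minute_counts` helper and the sample timestamps are illustrative, not actual tooling or data:

```python
from collections import Counter

# 30,000 requests/day over 1,440 minutes is ~21/min on average.
avg_per_min = 30_000 / (24 * 60)
print(round(avg_per_min, 1))  # 20.8

def per_minute_counts(timestamps):
    """Bucket ISO8601 timestamps by minute; bursts show up as a few
    large buckets rather than a flat ~21 everywhere."""
    return Counter(ts[:16] for ts in timestamps)  # 'YYYY-MM-DDTHH:MM'

sample = [
    "2015-02-12T07:13:05", "2015-02-12T07:13:06",
    "2015-02-12T07:13:06", "2015-02-12T07:59:01",
]
print(per_minute_counts(sample).most_common(1))  # [('2015-02-12T07:13', 3)]
```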
[22:13:13] qchris: we don't know of any reason for that code to be run other than a request to Special:BannerRandom [22:13:21] I fully agree that the hhvm logs show more exceptions than we see requests from the varnish logs. [22:15:03] I wonder if somehow a single request could trigger several log errors? Maybe varnish retries a bunch of times if it gets a 503? [22:15:29] Not sure about the retrying. Ops would know better. [22:15:56] nuria: what's the timeline for backfilling? [22:18:37] ori: will take today's and tomorrow's at least [22:22:33] ori: likely a bit longer, [22:22:51] ori: You can see volumes of queries here: https://tendril.wikimedia.org/host/view/db1020.eqiad.wmnet/3306 [22:23:26] AndyRussG: Any chance to get backtraces? [22:24:35] qchris: we got one: https://phabricator.wikimedia.org/P289 [22:24:43] Granted, it's only one [22:24:53] 1 > 0 :-D [22:25:14] Indeed it looks like it's coming from the old JS call, it has all the params on it that it would in that case [22:25:26] But there's just too many of 'em [22:27:16] Mhmm.. Maybe I am reading that backtrace wrong, but doesn't [22:27:30] Exception from line 15 of /srv/mediawiki/php-1.25wmf16/extensions/CentralNotice/special/SpecialBannerRandom.php:SpecialBannerRandom::execute [22:27:39] Mean that it happened on line 15 of that file? [22:27:59] (Because hhvm.log is reporting line 27) [22:28:04] AndyRussG: ^ [22:28:51] nod [22:31:01] qchris: that was just a "hotfix" intentional error that Chad put in to force a backtrace, to see how on earth that code was getting run [22:31:54] SpecialBannerRandom->execute should never be run unless there's a Web request to Special:BannerRandom [22:32:58] I guess this just means that I do not fully grok the relation between the backtrace and the hhvm.log message. [22:33:30] Anyways .... it's not about the execute method, is it? [22:33:46] It's about chooseBanner. Right?
(line 27) [22:44:00] AndyRussG: I would not worry about the volumes not matching between the varnish logs and the hhvm.log. I grepped around a bit in puppet and it seems there are retries in place for 503s on text. [22:44:29] So I guess one request can cause multiple entries in hhvm.log. [22:44:45] Also the entries in hhvm.log are /really/ bursty. [22:44:55] So that would align. [22:45:01] qchris: OK that's great to hear! awight: ^ [22:45:17] oooof [22:48:11] Analytics-Visualization, Analytics-Kanban: Build high level timeseries view of key metrics [8pts] {bear} - https://phabricator.wikimedia.org/T88367#1035495 (Milimetric) [22:52:22] qchris: thanks so much for your help with this! [22:52:34] Not sure I am much help :-/ [22:53:38] qchris: on the contrary! [22:53:59] Let's first dig in further :-) [22:56:47] qchris: sure! If you can find out for sure the retry on 503 thing, I think that'd be an important piece of the puzzle [22:57:23] I am onto that right now :-) [22:57:28] Thanks! [22:58:44] qchris: please feel free to drop anything in the Phabricator task https://phabricator.wikimedia.org/T89258 [23:01:54] qchris: the other missing piece is how it is that even 30k requests/day keep coming in for that page, but I guess the theory that it's old cached JS that never dies is plausible enough [23:05:37] AndyRussG: Can you help me sanity-check something? [23:05:53] On fluorine, in /home/qchris/BannerRandom [23:06:03] You find timestamps and http_status codes. [23:06:10] sorted by timestamp. [23:06:20] Each line corresponds to a request we receive. [23:07:12] Argh. Sorry. I can explain away my theory myself. [23:07:17] Sorry for the noise. [23:07:48] Anyways ... What this file allows you to see is that each logged 503 causes 4 lines in hhvm.log. [23:08:09] There is one retry. [23:08:15] So two tries total. [23:08:27] And each try causes 1x Notice, and 1x Fatal. [23:08:33] So 4 lines total. [23:09:07] But sometimes ....
There are maaaaaaany more lines in the log than there are log lines in varnish. [23:10:42] qchris: Hmmm [23:11:01] Really interesting! So... Something else is also going on? [23:11:56] So maybe to fix this we also just should program Varnish to emit a 503, without bothering PHP or anyone else, no? [23:12:27] * qchris has no clues about varnish. So he cannot suggest things there. [23:14:03] varnish is not easy to grep. It's like reading assembly [23:14:12] grep or grok or understand, I mean [23:16:09] AndyRussG: Now I finally think I've got an example that exposes this. Could you help me double-check? [23:16:20] On fluorine, /home/qchris/2015-02-12T07_04-f3-7.tsv is again [23:16:37] a file with timestamp + http_status code from the text+mobile caches. [23:17:54] From looking at that file, it seems we should not expect too many entries in the hhvm.log around 2015-02-12T07:13:06 [23:18:01] qchris: K just lemme try to scp that to my laptop I guess [23:18:55] But when looking at the hhvm logs, there are many entries for 07:13:05+07:13:06 [23:19:47] (The fact that the second is off by one is not much of a concern. This can happen as "taking the timestamp" is not synchronized across hhvm and our logs) [23:21:20] So that leads me to think that either: [23:21:40] i) Some requests die at strange times/delay, or [23:22:22] ii) something else is doing requests to the app-servers (or whatever is serving them) directly (bypassing varnish. So it's not outside users), or [23:22:39] iii) I am completely misunderstanding things. [23:23:36] AndyRussG: Makes sense? [23:23:57] Oh. There is also: [23:24:12] qchris: this file, /home/qchris/2015-02-12T07_04-f3-7.tsv, is sampled logs, grepped for BannerRandom, stripped down to timestamp and response code? [23:24:12] iv) We're not getting all logs into Hive. [23:24:31] qchris: Oh! This is from Hive, then? [23:24:33] the 2015-02-12T07_04-f3-7.tsv is unsampled. [23:24:37] Yes. [23:26:40] I double-checked sequence stats for that hour.
And sequence numbers are continuous without holes. That makes option iv) pretty unlikely. [23:27:00] qchris: K so this shoots holes in our varnish retry theory... hmmm... [23:27:37] Well ... double check with Ops. They know better. [23:27:49] The 4 lines per 503 are expected from my point of view. [23:28:02] We see one retry in the hhvm logs, and also the cache setup has [23:28:13] 'retry503' => 1, [23:28:18] for the text caches. [23:29:04] qchris: I think it would be fantastically helpful if you could post the relevant snippets in the Phabricator task [23:29:24] Not sure that shoots holes in the retry theory. The retry theory explains part of the relation mismatch between varnish logs and hhvm logs. [23:30:11] That section of the tsv file, that section of the hhvm log, and a link to that part of puppet varnish code on gitblit [23:30:14] Sure, I'll add that to phab. [23:30:20] qchris: fantastic, thanks so much! [23:30:40] Like you said, ops are the ones who have the deep knowledge of all this [23:31:01] Wasn't there at some point a script that rendered banners? [23:31:29] Did that use the app-servers directly, or does that go through the varnishes? [23:31:33] Just on the surface, it looks to me like it's a lot more hhvm errors than Hive log entries x 4 [23:32:13] Yup. There are more hhvm errors than 4xHive log entries. [23:32:34] qchris: everything goes through Varnish, I think. There is server code + client code involved in banner display [23:32:36] At least that's how I interpret the timestamp above. [23:32:45] Yup!
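The 4-lines-per-503 accounting above ('retry503' => 1, so two tries, each logging one Notice and one Fatal) reduces to simple arithmetic. A sketch; the helper name and its defaults are assumed for illustration:

```python
def expected_hhvm_lines(varnish_503s, retries=1, lines_per_try=2):
    """Expected hhvm.log lines for a given number of logged 503s,
    assuming one retry (retry503 => 1, so two tries total) and one
    Notice plus one Fatal per try, as in the discussion above."""
    tries = 1 + retries
    return varnish_503s * tries * lines_per_try

print(expected_hhvm_lines(1))    # 4
print(expected_hhvm_lines(100))  # 400
```

When the observed hhvm.log count is much higher than this expectation for the same window (as AndyRussG and qchris both see), the retry multiplier alone cannot explain the mismatch.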
:) [23:33:13] In November we changed a bit the way that works, and that's when we shifted from Special:BannerRandom to Special:BannerLoader as the server call required to retrieve the banner [23:33:35] Special:BannerRandom is officially no longer used anywhere on WMF production since then [23:34:12] That's why we weren't so worried about checking that it wouldn't throw errors :) [23:34:39] :-) [23:35:21] awight has prepared a patch to finish ripping the rest of it out, but I just want to be sure doing so won't cause us to silently throw away banner impressions that these errors may be pointing to [23:35:41] So I'll add the above information to the phab ticket, but I am not sure if I can do much more. I guess you need help from someone that has access to the app-servers, or a deeper understanding of the wmf setup, or Ops. [23:39:44] (PS4) Milimetric: [WIP] Add timeseries graph of key metrics [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/190113 [23:43:29] qchris: or all of the above :) [23:43:37] :-D [23:43:57] thanks once again :D [23:45:59] thanks ggellerman______
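The sequence-number double-check qchris mentions earlier (ruling out option iv because the numbers for that hour are continuous without holes) could be sketched roughly like this; the `find_holes` helper is hypothetical, not the actual sequence-stats tooling:

```python
def find_holes(seqs):
    """Return (missing_count, gaps) for one host's sequence numbers.
    If the range max-min+1 equals the number of distinct values seen,
    nothing was lost on the way into Hive for that window."""
    s = sorted(set(seqs))
    if not s:
        return 0, []
    gaps = [(a, b) for a, b in zip(s, s[1:]) if b - a > 1]
    missing = (s[-1] - s[0] + 1) - len(s)
    return missing, gaps

print(find_holes([7, 8, 9, 10]))   # (0, [])
print(find_holes([7, 8, 11, 12]))  # (2, [(8, 11)])
```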