[08:12:01] hey hashar
[08:12:32] hashar: I know last time we talked you were not very enthusiastic about HD console screencasts :)
[08:12:40] hashar: some people have made something just for you http://showterm.io/
[08:12:50] hashar: http://asciinema.org/
[08:13:12] hashar: enjoy
[08:13:29] yeah ori is a huge fan of them :D
[08:13:41] I can understand why :)
[08:13:42] There must be something strange in your data
[08:14:01] it's an ingenious idea
[08:14:12] and quite useful in some situations
[14:01:33] hey qchris
[14:01:55] hi average
[14:39:17] (PS4) Stefan.petrea: [DO NOT SUBMIT] kraken-hive stub [analytics/kraken] - https://gerrit.wikimedia.org/r/96738 (owner: QChris)
[14:45:49] (PS5) QChris: [DO NOT SUBMIT] kraken-hive stub [analytics/kraken] - https://gerrit.wikimedia.org/r/96738
[18:48:49] DarTar: Did you have time to file that RT ticket related to scratch space in s[2-7] for me on Friday?
[18:49:05] hey
[18:49:07] I did
[18:49:15] I thought I cc'ed you, didn't I?
[18:49:19] double checking
[18:49:54] yes, you're copied and the ticket number is https://rt.wikimedia.org/Ticket/Display.html?id=6383
[18:54:30] ottomata1: Do you happen to know where https termination occurs? It's long before the traffic hits the varnishes. Right?
[18:55:02] qchris, I believe so
[18:55:06] nginx does it
[18:55:09] ottomata: Ok thanks!
[18:55:25] nginx might be running on some varnish host boxes though
[18:55:26] not sure
[18:55:34] i think mark was working on something like that once
[18:55:51] Ok. Thanks.
[19:03:51] Thanks DarTar: I'm in the process of getting access to RT (I think).
[19:03:59] hey ottomata
[19:04:18] let me know when you try out the proxy :D
[19:04:27] halfak: good
[19:06:14] (PS1) Milimetric: December meeting update [analytics/reportcard/data] - https://gerrit.wikimedia.org/r/97542
[19:06:28] (CR) Milimetric: [C: 2 V: 2] December meeting update [analytics/reportcard/data] - https://gerrit.wikimedia.org/r/97542 (owner: Milimetric)
[20:01:48] halfak: I can look into the page creation SQL in about an hour
[20:02:00] OK. Sounds good. Thanks!
[20:02:05] I also saw the ModuleStorage results and had a quick chat with ori, that's very exciting
[20:02:15] (particularly what we could do next ;) )
[20:05:37] New analyses or new uses of module storage?
[20:05:54] is faster better?
[20:06:21] Yes.
[20:06:37] can you prove it for our editors/readers?
[20:06:50] Hmm... How general do I have to make my case?
[20:08:54] I spoke to Ori/Dario briefly on this. Ori's focus is more site reliability at this point, but I would be interested in understanding if preliminary experiments suggest that putting resources into site performance would help user/editor numbers
[20:09:17] which isn't really an answer
[20:09:36] Oh yeah. That sounds like an interesting idea. There are robust results across the tubes that suggest high-performance sites gather more users.
[20:09:52] But what kind of filter are we opening up?
[20:10:40] filter?
[20:11:49] sure -- the google theory is that users have a set amount of time to surf. faster pages == more pvs
[20:11:50] I like to imagine filters when I think about the users we could be missing due to some issue
[20:12:26] Presumably, we're getting the same total unique views, but users who find Wikipedia to be fast enough will make more use of it.
[20:12:44] In other words, these users pass through the filter, while others bounce.
[20:13:08] I see
[20:13:18] but if you are looking for an answer, it might not matter
[20:13:43] Which thing might not matter?
[20:13:56] page load latency
[20:14:36] Yes. There are other filters in place and (to continue to push the metaphor), there's a large amount of pressure pushing users back to Wikipedia.
[20:14:46] (thanks google)
[20:16:55] do we even know how our users browse the site?
[20:27:59] milimetric: ottomata ping?
[20:28:09] pong
[20:28:17] YuviPanda: ^
[20:29:00] milimetric: I see gingle.wmflabs.org and metrics.wmflabs.org in the instance I set up manually
[20:29:20] milimetric: can you move it to the newer system and just add them via https://wikitech.wikimedia.org/wiki/Special:NovaProxy
[20:29:21] ?
[20:29:34] new system?
[20:29:34] and give me a heads-up right before you do that so I can remove their current entries?
[20:29:54] (there needs to be no downtime for the switchover)
[20:29:54] i don't think i'm familiar
[20:30:15] milimetric: oh, right. I'll just wait for ottomata then, he did respond on the thread in labs-l announcing it
[20:30:34] oops, i'm not on labs-l
[20:31:07] k, ottomata: I'm available if you need my help
[20:31:13] milimetric: http://lists.wikimedia.org/pipermail/labs-l/
[20:31:16] YuviPanda: yoyo
[20:31:42] hey ottomata
[20:31:58] waaatsup?
[20:31:58] I guess andrewbogott pinged you elsewhere to take care of the proxy stuff so I'll leave it to him :)
[20:32:02] oh ok
[20:32:06] ja I am in labs now
[20:32:12] labs room
[20:32:37] you're in Tampa?
[20:32:38] :)
[20:33:29] you should visit Busch Gardens down there, that place is so fun.
[20:33:47] and also - let me know if you need anything from me re: wikimetrics / gingle
[20:36:12] haha
[20:36:34] psssh, busch gardens WILLIAMSBURG has got to be the best one
[20:36:38] not that i've been to any others
[20:36:40] but it must be!
[20:41:40] Williamsburg <<<<<< Tampa
[20:41:47] like... yes, that many "less than" signs
[20:42:07] Tampa's a zoo + rides + water park all in one
[20:42:12] Williamsburg has... wolves
[20:42:37] wburg has dragons?
[20:42:53] and i went to college in wburg so got discounts and went a lot?
[20:45:06] here you go ottomata: http://www.retailmenot.com/view/buschgardens.com
[20:45:07] :)
[20:46:45] ha
[21:10:32] milimetric: hey! I want to up the post size now, think you'll have time to help me verify?
[21:10:57] sure, let's do it
[21:11:05] YuviPanda: ^
[21:11:09] sweet
[21:11:11] * YuviPanda logs in
[21:11:51] milimetric: current default seems to be 1M
[21:11:57] is that what you see too?
[21:12:30] one moment
[21:13:08] yeah, I think so YuviPanda (I'm just looking at files that worked and files that failed)
[21:13:20] okk
[21:13:27] milimetric: what do you want me to raise it to?
[21:13:38] I was thinking 2M should be plenty for my purpose
[21:13:41] is this per-verb?
[21:13:49] no just general
[21:13:50] because you can restrict it to POST as far as I'm concerned
[21:13:51] k
[21:14:09] I'll make it 16M just to be sure
[21:14:14] moment, live-patching
[21:14:25] heh, yeah, that'll be bigger than any cohort I have to upload by quite a bit
[21:15:16] milimetric: try now?
[21:15:28] milimetric: also can you ping metrics.wmflabs.org and tell me what IP you see?
[21:16:21] 208.80.153.214
[21:16:59] okay as expected
[21:17:02] milimetric: try uploading?
[21:17:10] cool YuviPanda, it worked with a 1.3M file
[21:17:12] so size is fine now
[21:17:13] thanks!
[21:17:20] milimetric: :D
[21:17:36] milimetric: okay I'll make a patch now. This was monkey-patched, will disappear on next puppet run.
[21:17:43] will ping you when it gets merged properly
[21:17:46] cool, no problem
[21:17:56] as far as I know, I'm the only crazy person to try uploading 120k users
[21:17:59] hehe
[21:19:04] milimetric: heh, default for production sites is set at 100m
[21:19:11] I'll just set it at 128
[21:19:29] that's really not necessary for me
[21:19:53] anything over 2M and I'll try to implement dynamic cohorts where I just run a query to get the members every time people run reports
[21:19:54] milimetric: sure, but this is a general proxy for all of labs now...
[21:19:58] yeah, i know
[21:20:06] that'd be good to be consistent with prod
[21:21:53] halfak: looks like the archive table should be available to labs soon, just saw a patch get merged :)
[21:22:03] hey that's awesome
[21:22:12] :D
[21:22:34] halfak: with the archive table, we can get a more intelligent revert rate metric from labsdb?
[21:22:37] Woot! That means I might be able to drop my janky anti-temp-table scripts for the other analytics slaves.
[21:23:13] Theoretically yet. That can edit, bytes added, etc.
[21:23:16] *yes
[21:23:21] *and
[21:23:22] yikes!
[21:23:52] cool beans, i'll have to pick your brain when that happens and fix wikimetrics appropriately
[21:24:23] Sounds great. I'd love to talk more about that and getting this stuff imported into map-reduce land.
[21:24:43] I did some thinking after I talked to you about combining revision & archive for all wikis in one place.
[21:25:26] oh?
[21:25:30] halfak: the mysqlds on a bunch of replica servers (I think?) just got restarted, you might want to check it out to see if it is already there
[21:25:41] Mostly pretending that already existed and imagining how my work would go.
[21:26:06] Right now, I'm writing scripts to query multiple languages for some page deletion data.
[21:26:15] It's pretty painful.
[21:26:43] milimetric: okay, merged. You should be good to go :)
[21:26:46] so you just write scripts like - getConnectionTo('enwiki') then runQuery and loop over all the languages?
[21:26:49] If only there was one revision table and "wiki" and "archive" were columns.
[21:26:52] sweet, thx YuviPanda
[21:26:58] yw
[21:26:58] milimetric: let me know if anything else pops up :)
[21:27:09] halfak: yeah, that's how it would work in hive
[21:27:26] wiki would be a partition so it would seem just like a column
[21:27:28] Milimetric, I wish. Some of this stuff is near intractable if I don't do it on the DB server, so I have Makefiles from hell.
[21:27:31] but what do you mean by archive being a column?
[21:28:09] Well, the revisions still represent saved versions of pages whether they are in the archive table or not.
[21:28:11] Makefiles from hell? This sounds bad
[21:28:27] * milimetric drops everything and gets his MakefileFromHell samurai sword out
[21:28:31] I imagine unioning those guys together and adding a boolean column.
[21:29:09] LOL Well, you pump in an SQL query to get a TSV enough times and you write a Makefile to help remember which SQL goes with which dataset.
[21:29:24] As you might imagine, I have some fun uses for dependencies now too.
[21:29:56] e.g. build this table and these data files before running this script over the result and then reimport it to mysql.
[21:29:58] oh I gotcha, so basically select if(a.any_column is not null, 1, 0) as archive, rev.* from revision rev left join archive a on ...
[21:30:44] is this because different projects have different schemas?
[21:31:25] I'd prefer "SELECT *, False as archived FROM rev UNION SELECT *, True as archived FROM archive" (with column name matching)
[21:31:58] oh, the stuff in archive is no longer in revision, got it
[21:32:02] Yeah.
[21:32:11] so it's put there when it's deleted
[21:32:18] Archive contains all revisions to deleted pages.
[21:32:23] is that the only case or other cases too?
[21:33:06] Hmm... I don't think there's another *common* way to move revisions from "revision" to "archive", but this is MediaWiki, so I don't doubt that there are.
[21:33:12] k
[21:33:34] so what do you mean above, re: building tables and data files before running scripts?
[21:33:38] Individual rev deletes are captured in rev_delete and ar_delete fields.
[21:33:53] Oh yeah, I was just rambling about my Makefiles.
[21:34:05] do you have one publicly somewhere so I can take a look?
[21:34:20] They encode a lot of my workflow and allow building my datasets by asking make.
[21:34:24] Sure. Let me get a link
[21:35:14] https://bitbucket.org/halfak/echo/src/8e19edd40cef5aab25e63490bb1df9669f38629d/Makefile?at=default
[21:35:24] also, I wonder if you'd like Drake: http://www.youtube.com/watch?v=BUgxmvpuKAs
[21:36:15] I've been curious about replacements, but make does well for most of what I need it to do and most everyone else knows enough make to figure out what's going on.
[21:37:38] drake is like make for data
[21:37:54] it basically lets you define data flows in roughly the same way you seem to be doing
[21:38:06] but a little nicer, you might like it
[21:38:43] so I kind of get it, but all this would go under the "runSQL" part
[21:39:07] Are you talking about the Makefile I linked?
[21:39:17] yea
[21:39:44] The last rule dumps some SQL output into a python script to generate new data.
[21:39:46] See datasets/exp_1_user_stats.tsv
[21:39:48] the "get connections to multiple projects" part is probably what hadoop would solve
[21:40:00] But yeah, a lot of this is done in SQL and the last bits are done in R.
[21:40:40] Yes. Allowing me to think of different Wikis as simply different constraints on one large revision table would be awesome.
[21:41:05] I've got one right now where I need to write a rule per language.
[21:41:14] Luckily, I'm only looking at 10 languages.
[21:41:24] Scratch that -- two rules per language :\
[21:41:45] It adds up to ~80 lines of Makefile silliness.
[21:42:20] that's what i'm interested in
[21:42:25] so what's different per language?
[21:42:27] the schema?
[21:43:38] I guess not much -- at least as far as the revision and archive tables are concerned.
[21:44:08] The server # and db name are the most pain.
[21:44:23] Also, no inter-wiki joins, but that's asking a lot.
[21:44:35] It wouldn't be if all of the wikis lived in the same store.
[21:45:01] Inter-wiki joins are useful for tracking global user accounts.
[21:45:24] right
[21:45:38] so this sql from this example file you made has enwiki hardcoded in it
[21:45:46] and you just change this to like frwiki if you want to run it there?
[21:45:49] Yes.
[21:46:00] Usually I don't. That's part of the problem.
[21:46:26] what do you do instead?
[21:46:29] If I wanted to do an analysis that contained both wikis, I would create separate rules.
[21:46:42] One for enwiki and then a duplicate for frwiki.
[21:48:18] i guess i'm still having a hard time understanding what rules are
[21:48:41] Oh! A rule is the basic *thing* in a Makefile.
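The exchange above sketches two ideas: unioning revision and archive with a boolean archived column, and treating the wiki itself as just another column so the per-language rules go away. The following is a minimal SQL sketch of the first idea, not halfak's actual query; the column list is illustrative and assumes the stock MediaWiki revision/archive field names, so it would need checking against the real schema before use.

    -- Hypothetical sketch: live and deleted revisions in one result set,
    -- with a boolean archived flag, per the union idea discussed above.
    -- Column names assume the standard MediaWiki schema; verify before relying on them.
    SELECT rev_id, rev_page AS page_id, rev_timestamp, rev_len, FALSE AS archived
      FROM revision
    UNION ALL
    SELECT ar_rev_id AS rev_id, ar_page_id AS page_id, ar_timestamp, ar_len, TRUE AS archived
      FROM archive;

If a wiki column were layered on top of this (or used as a Hive partition, as milimetric suggests), the per-language Makefile rules described above would collapse into a single query constrained by WHERE wiki = '...'.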
[21:48:50] A rule tells how to make a target
[21:48:57] * halfak goes to check his terminology
[21:48:57] oh gotcha, rule in make
[21:49:14] nono, i'm a make noob so I'm sure it's reasonable to assume I knew that
[21:50:05] but then wouldn't that make rule have to point at a "frwiki" version of the same sql you've got "enwiki" in?
[21:51:03] Luckily the same SQL usually works.
[21:56:50] oh, duh, because you'd just specify the host/server on the mysql command line and not in the file
[21:57:04] ok, I can now confirm that this is worse than I thought :/
[21:57:25] we've gotta finish up this sprint and then we have thanksgiving
[21:57:56] but i'm going to argue that we prioritize this even higher
[21:58:16] and I'll fiddle with it a bit tomorrow too
[22:03:18] milimetric: Sounds good. Please let me know if you think I can be helpful. :) I'm always down to spend some hours now so that I can spend less later.
[22:03:46] cool, will do
[22:12:17] ottomata: you got a moment to look at my hadoop cluster in labs and why it won't work?
[22:12:18] The third-party documentation is wrong
[22:12:19] hive (default)> show tables;
[22:12:19] FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
[22:12:19] FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
[22:13:36] sure!
[22:32:42] any luck figuring out the source of that error ottomata? We can look at it tomorrow too
[22:33:57] ah sorry
[22:34:03] was chatting
[22:34:04] let's see
[22:35:04] milimetric: hadoop-test1, right?
[22:35:05] trying to log in
[22:35:20] i was doing that on test-2 just now
[22:36:03] cool
[22:36:27] oh!
[22:36:31] yeah uh
[22:36:34] we need to set up hive-server
[22:37:02] in instance configuration
[22:37:04] in the labs console
[22:37:05] include
[22:37:09] role::analytics::hive::server http://doc.wikimedia.org/puppet/classes/__site__/role/analytics/hive/server.html
[22:37:10] somewhere
[22:37:14] probably on your hadoop-test1
[22:37:26] k
[22:37:57] not client?
[22:39:33] oh i see, that's a parent to server
[22:39:38] sure
[22:39:38] yeah
[22:40:16] Failed to parse template : Is a directory - /etc/puppet/templates/ at /etc/puppet/modules/cdh4/manifests/hadoop.pp:164 on node i-00000973.pmtpa.wmflabs
[22:40:42] err: Could not retrieve catalog; skipping run
[22:41:07] ?
[22:41:12] that's on hadoop-test1?
[22:42:42] yes
[22:46:22] hmm, ok this is a new thing, I made a change last week to support rack awareness in the cdh4 module
[22:46:30] looks like it isn't working hmm
[22:46:40] interesting
[22:47:05] fixing...
[23:02:51] halfak: continuing here, sorry
[23:03:04] No worries. Back in a couple minutes.
[23:03:27] ok milimetric, think i fixed it. running puppet now
[23:03:32] anyways, if it's really a matter of weeks, that's fine, i'll simply not put it on the deployment calendar at all
[23:07:09] OK back
[23:07:53] So, I can put together the foundation of a solid writeup by Friday. That would mean we'd have a clear set of results and a description of our goals and methods.
[23:08:18] I imagine that there will be feedback and potentially follow-up work to do based on that feedback.
[23:11:06] ori-l: ^
[23:12:21] Unless we see something strange in the UA breakdown or by limiting our analysis to people who make it to 10 page loads, I think we'll have a strong case by the end of the week to enable ModuleStorage across the board.
[23:13:30] halfak: that'd be awesome
[23:13:40] (sorry for the delay in replying, got roped into syncing a bugfix to prod)
[23:13:47] No worries. :)
[23:17:58] halfak: it also wouldn't be a big deal to back out if doubt crept in by friday due to some previously unnoticed issue with the data
[23:18:19] so by scheduling it i'm not committing you to a positive result
[23:19:46] Sounds fine to me then.
[23:19:55] ori-l: ^
[23:21:13] halfak: cool. thank you so, so much
[23:22:35] gah, milimetric, something is wrong
[23:22:36] not sure what
[23:22:38] i have to run
[23:24:20] will check it tomorrow
[23:24:22] ping me to remind me
[23:24:23] byeyeyey
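For context on where the hadoop-test1/hive setup above is headed: the sketch below shows the kind of cross-wiki query halfak and milimetric discuss around 21:27 and 21:44, assuming a hypothetical Hive table revision_all partitioned by wiki and carrying the archived flag from the union sketch earlier. The table and column names are made up for illustration; nothing like this exists yet on the cluster being debugged above.

    -- Hypothetical: every wiki's revisions in one Hive table, wiki as a partition column.
    -- One query like this replaces the "two rules per language" Makefile pattern.
    SELECT wiki,
           COUNT(*) AS total_revisions,
           SUM(CASE WHEN archived THEN 1 ELSE 0 END) AS archived_revisions
    FROM revision_all
    WHERE wiki IN ('enwiki', 'frwiki')
    GROUP BY wiki;

With all wikis in one store, the inter-wiki joins mentioned at 21:44-21:45 (e.g. for tracking global user accounts) would also become ordinary joins rather than separate per-database rules.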