[15:20:38] o/
[15:21:36] morning halfak
[15:21:47] Hey schana!
[15:27:47] o/ sabya
[15:28:00] o/ halfak
[15:29:48] \o_
[15:29:58] :D
[15:31:33] how was Dosa with Subbu ? :-D halfak
[15:32:01] Was great! :) My first time having dosa.
[15:32:21] sabya, do you know subbu?
[15:33:10] nope. I read you were writing to DarTar.
[15:33:29] Oh! Gotcha :)
[15:34:39] * subbu looks up from his non-dosa breakfast and goes back to it ...
[15:35:04] halfak, dosa is commonly eaten for breakfast in south india
[15:36:08] halfak : I am getting familiar with Wiki editing tools since last couple of days. It will help me put things in perspective. Read the paper you sent me
[15:36:34] Great! That's a good paper. It's one of my favorites.
[15:37:15] I just happened to be inspired to pick up XGBoost this morning.
[15:37:31] http://xgboost.readthedocs.org/en/latest/model.html
[15:38:07] I'm experimenting using it to predict damaging edits to see how it compares to our RandomForest and GradientBoosting models.
[15:38:21] the paper is pretty exhaustive. I wanted to try Huggle on enwiki as well. but not having rollback permission. So tried it on some other wiki.
[15:43:21] sabya, yeah. Fun story -- I've never used Huggle on enwiki either :)
[15:43:39] * halfak should get his rollbacker rights
[15:43:53] halfak, how long are you online here? I'll be back in 45 mins.
[15:43:58] sabya, any luck experimenting with the labs cluster?
[15:44:01] I'll be here :)
[15:44:23] great. I'll try to setup the cluster today.
[15:44:29] sabya started collaborating with me when I was traveling. Little does he know that I rarely leave this channel otherwise ;)
[15:46:22] * halfak maxes his CPU on some experiments with XGBoosting models.
[15:46:38] Ooooh. And it looks like the preliminary tests are pretty good.
[15:47:19] Even simple trees are getting nice scores on enwiki's "damaging" predictions.
[15:50:11] BTW Tilman pointed me to http://tuprints.ulb.tu-darmstadt.de/5225/ earlier today; it might be interesting to others here.
[15:50:38] "The Writing Process in Online Mass Collaboration: NLP-Supported Approaches to Analyzing Collaborative Revision and User Interaction"
[15:51:10] "Second, we investigate activity-based roles of users in Wikipedia and how they relate to the collaborative writing process. We automatically classify all revisions in a representative sample of Wikipedia articles and cluster users in this sample into seven intuitive roles."
[15:51:19] halfak: ^
[15:54:20] guillom, yeah. Been working from Daxenberger's notes. Some big methodological holes there though.
[15:54:46] E.g. training and testing the edit type classifier on edits to a biased sample of articles.
[15:55:00] I haven't looked at it yet; Tilman sent it to me because he knew I was working on contributor roles.
[15:55:13] But I think the overall methodological vision is solid and will work great with iteration.
[15:56:27] * guillom looks at his quarterly goals for the next quarter, tries not to fall victim to the [[w:planning fallacy]] this time.
[16:10:35] * halfak quotes from [[:en:planning fallacy]]
[16:10:37] "The planning fallacy, first proposed by Daniel Kahneman and Amos Tversky in 1979, is a phenomenon in which predictions about how much time will be needed to complete a future task display an optimism bias (underestimate the time needed)"
[16:11:54] always
[16:12:25] o/ apergos
[16:12:32] heya
[16:12:42] Good morning! (or other timezone apt. greeting)
[16:12:43] so I have a law for (a part of) that
[16:12:55] "Nothing ever takes 5 minutes."
[16:13:06] Except for editing a Wikipedia article.
[16:13:12] And it's often 7 minutes.
[16:13:17] But 5 is pretty darn common ;)
[16:13:34] well: "I just want to push this patch out. It will only take a minute"
[16:13:38] = famous last words
[16:13:45] You must already be in an "activity phase" though
[16:14:04] "If you wait till the last minute, it only takes a minute."
[16:14:18] Nonsense I'm basing my assertions on: http://arxiv.org/abs/1411.2878
[16:14:22] "if you wait til the last minute, you'll be doing it again tomorrow"
[16:15:24] you can paraphrase this as "fast or done, choose one" :-P
[16:17:01] (I'm looking at the paper now)
[16:21:00] apergos, that's one work that I'm hoping to re-write for a few different audiences. It seems like we stumbled onto a strange regularity in human behavior. I've been talking to neuroscientists about physiological cycles. I expect that there is some neuromechanical explanation for these attention patterns.
[16:21:36] E.g. the symbols that form an action get ~7 minutes of memory to work with and we'll optimize our /activities/ to be constructed of 1-7 minute actions.
[16:22:00] Regretfully, most neuroscientists I have run into are looking at cycles on the nano-second scale.
[16:22:34] When I say I'm looking at patterns that are *minutes* long, I usually get laughed at :D
[16:22:44] hey halfak, I'll be forcing students to use Quarry this afternoon, approx. 4 Eastern; will also show IRC channel but might be the only one actually online to mediate questions
[16:23:23] o/ wiggins
[16:23:30] Cool. I'll be around.
[16:23:34] yuvipanda, ^
[16:24:10] awesome, thanks--hopefully it'll go down easy since at least 3 students know SQL, but schema opacity could trip us all up
[16:24:59] wiggins, I think that is likely. In a way, discovering the MediaWiki database (and probably any sufficiently complicated other DB) is a lot like discovering the norms of some community.
[16:25:20] so what about people who are spending time drafting the article, reading sources...? that never gets counted but it is a part of "time editors spend contributing" i.e. "wiki-work"
[16:25:26] Some of it is documented, but you don't really know how to operate within them until you either (1) observe someone else or (2) practice them yourself.
[16:25:49] well, that'll make my key point for the day: metadata matters. A lot.
[16:25:53] apergos, that's right. I guess I suspect it to be somewhat uncommon, but it is excluded from any labor hours measurement.
[16:26:31] wiggins, would love to see some notes on what metadata would have been most useful and how your students would *like* to be able to browse it.
[16:26:41] I wonder what a survey would reveal about that
[16:26:42] E.g. search queries and click on table names to get schema
[16:28:10] halfak, for sure, if I can record/summarize, will do. Tasks are: 1) find "interesting" recent query & attempt to interpret; 2) find series of queries by same user & attempt to guess what they're up to; 3) pick a query, try to modify & run it
[16:29:25] wiggins, sounds like that will be fun and super informative :)
[16:29:28] added to activity slide: feedback for Wikimedia, what would have made this easier?
[16:29:34] * halfak wishes he had prof. wiggins as instructor
[16:29:42] Awesome!
[16:31:06] it might be part of their portfolio assignment for the day, then they can elaborate a little more
[16:32:01] really curious to see how far they can get with it, given the vastly divergent backgrounds in the class
[16:41:22] gonna try setting up my lab cluster
[16:42:56] sabya, great. Note that there are puppet roles for "ores". These set up the redis node, load balancer, web node, and worker nodes.
[16:43:13] These may come in handy if you want to replicate prod, but for testing you might not want to use them.
[16:43:55] ok, any getting started doc?
[16:44:18] sabya, regretfully not.
[16:44:30] But this seems like a good time to make one.
[16:44:38] Maybe we can do that as I answer questions.
[16:45:15] sure. I'll create one.
[16:46:32] sabya, maybe you could make it an etherpad or wiki page so that I can copy-paste bits in there for you.
[16:47:35] ok. let me try etherpad
[17:02:08] hey wiggins!
[17:02:10] nice to see you here :D
[17:06:17] halfak: I should add 'search' to Quarry
[17:06:38] Yeah. Would be really cool to filter queries by table used as well.
[17:06:55] we can easily do that, yeah
[17:06:56] yuvipanda, is it possible to get an AST for SQL queries easily?
[17:07:05] there are enough parsers for it yeah
[17:07:19] Even a messy one would probably be OK.
[17:07:35] * yuvipanda nods
[17:07:44] and it'll also give me experience with elasticsearch
[17:07:49] * halfak sets up an experimental ORES node for sabya to play with
[17:10:34] halfak: do you have some time to check the PRs for i18n?
[17:10:55] Yeah. Will look as soon as I have sabya started.
[17:11:04] Amir1, check out https://etherpad.wikimedia.org/p/setting_up_ORES_lab_cluster in the meantime.
[17:11:35] yuvipanda: hey, please check this patch: https://gerrit.wikimedia.org/r/274912
[17:11:41] Amir1: halfak btw, ensure_packages vs require_packages is a red herring
[17:11:42] awesome
[17:11:46] thanks halfak
[17:12:05] yuvipanda, bummer. Any thoughts on the underlying issue?
[17:12:09] sabya, http://ores-experimental.wmflabs.org/
[17:12:16] That's the public-facing address.
[17:12:30] yes
[17:12:35] ok. thanks halfak
[17:12:40] sabya, you should be able to SSH to ores-experimental-01.revscoring.eqiad.wmflabs
[17:12:47] If you have your ssh set up right for labs.
[17:13:06] Turns out you can drop the "revscoring" and just SSH to ores-experimental-01.eqiad.wmflabs
[17:13:07] had uploaded the keys. let me check.
[17:13:13] :(
[17:13:19] sabya, will need to set up proxy through the bastion too.
[17:13:20] hi schana.
[17:13:22] * halfak gets link.
[17:13:28] hey leila
[17:14:02] sabya, see https://wikitech.wikimedia.org/wiki/SSH_access
[17:14:04] yuvipanda: okay. let me check the whole thing again
[17:14:09] probably tomorrow
[17:14:12] I'm reviewing the tasks, question: in T113384 can you add an example of a working url as a comment in the task?
[17:14:16] schana, ^
[17:14:21] but your insight would be valuable
[17:14:24] sure, leila
[17:14:34] thanks, schana.
[17:14:43] o/ leila :)
[17:14:46] * halfak needs to run away for lunch
[17:14:48] back in a bit
[17:14:50] hey Amir1. :-)
[17:15:39] sup leila?
[17:16:03] not much, Amir1. How is it going on your end?
[17:16:27] not bad, waiting for ops to move ores to prod
[17:16:45] then we deploy the extension in fa.wp and then wikidata
[17:17:08] yeah, that's what I'm hearing. :-)
[17:18:15] I hope this happens soon
[17:18:36] I saw article recommender rocks
[17:19:22] schana is working on it these days, Amir1. We're aiming for some campaigns and tests to start soonish.
[17:19:48] awesome
[17:19:50] :)
[17:20:22] If I can help, I would be thrilled
[17:21:49] I'll let you know, Amir1. You already have a lot on your plate, but if you'd like to help, I'm sure we will bug you soon. D:
[17:21:50] :D
[17:22:18] Noruz is coming so I would lots of free time
[17:22:28] and I'm looking into possible way to fill it
[17:23:05] *I would've
[17:32:57] SSH worked!
[17:33:02] sabya is thrilled
[17:39:33] Amir1|afk: halAFK https://gerrit.wikimedia.org/r/#/c/275560/ should probably fix one
[18:07:38] sabya, \o/
[18:07:43] yuvipanda, looking
[18:08:25] any other steps? halfak
[18:09:19] sabya, yeah. See the fabfile in ores-wikimedia-config.
[18:09:20] * halfak gets
[18:09:31] https://github.com/wiki-ai/ores-wikimedia-config/blob/master/fabfile.py
[18:09:47] fabric essentially lets us execute commands remotely on a set of machines.
[18:09:57] We use it to set up a new node and to deploy changes to nodes.
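The idea described above (run the same command on a set of machines, going through the bastion) can be sketched with only the Python standard library. This is a rough illustration, not what the fabfile does: the bastion name and the example command are placeholders invented here, and only the ores-experimental-01.eqiad.wmflabs host name comes from the log. It assumes an OpenSSH client that supports the -J (ProxyJump) option, and it defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess

# Placeholder values for illustration only; the bastion name is hypothetical.
BASTION = "bastion.example.org"
HOSTS = ["ores-experimental-01.eqiad.wmflabs"]


def build_ssh_command(host, remote_command, bastion=BASTION):
    """Return the argv list for running one command on one host via a jump host."""
    return ["ssh", "-J", bastion, host, remote_command]


def run_everywhere(hosts, remote_command, dry_run=True):
    """Run the same command on each host, or only describe it in a dry run."""
    results = []
    for host in hosts:
        argv = build_ssh_command(host, remote_command)
        if dry_run:
            # Nothing is executed: return the shell-quoted command line instead.
            results.append(shlex.join(argv))
        else:
            completed = subprocess.run(argv, capture_output=True, text=True)
            results.append(completed.stdout)
    return results


if __name__ == "__main__":
    for line in run_everywhere(HOSTS, "uname -a"):
        print(line)
```

Keeping the dry run as the default makes it possible to review the exact commands before anything touches a real machine, which is close to the "review the fabfile and run the commands manually" option mentioned in the channel.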
[18:10:16] See "initialize_staging_server": https://github.com/wiki-ai/ores-wikimedia-config/blob/master/fabfile.py#L65
[18:11:22] I'm not sure if there is a good way to purposefully run the fabfile against this machine, but there should be.
[18:11:36] Alternatively, you could just review the fabfile and run the commands manually.
[18:12:20] ok.
[18:16:24] halfak: just a fyi, I'm going to merge that in about 15 minutes and see how that goes
[18:16:45] Sounds good. How do you want to test it?
[18:23:10] halfak: I don't :D
[18:23:13] (j/k)
[18:23:24] I'm just going to merge it and make sure nothing breaks, and see how that goes
[18:23:25] * halfak rages for 0.01 seconds.
[18:23:52] We could stand up 3rd web node for a short period of time and see what happens.
[18:24:28] nah
[18:24:36] this affects all of uwsgi
[18:24:39] including prod services
[18:24:45] so we'll find out wherever
[18:24:49] so in cases like this
[18:24:51] I'll disable puppet
[18:24:54] in all these places
[18:24:56] and enable it in one
[18:24:56] heh. Let the others test it.
[18:30:47] * sabya is sleepy. going to bed. will get back on setting up the labs instance tomorrow.
[18:31:06] o/ sabya
[18:31:12] Thanks for hacking today :)
[18:31:42] thanks for helping me out. :-)
[18:31:52] o/ schana
[18:32:01] hey halfak
[18:32:15] Interested in digging into ORES stuff today?
[18:33:11] what did you have in mind?
[18:34:52] halfak: can you make the requirements.txt in ores-wikimedia-config to be the output of a pip freeze?
[18:35:25] yuvipanda, yes
[18:35:31] halfak: ok, thanks :D
[18:35:35] schana, still working on that packaging stuff.
[18:35:39] halfak: I'll create the repo and stuff
[18:35:42] We only did the first bit before.
[18:35:57] schana, would love to have you get an overview of the system so that you can help get it into prod.
[18:36:19] halfak: I could probably do an hour today, but could do a lot more tomorrow
[18:36:33] kk. So, there were the two tasks I directed you to last time.
[18:36:43] Seems like yuvipanda's ask is one we should add.
[18:36:44] yeah, I'm looking at them now
[18:36:45] * halfak gets links.
[18:39:39] halfak: I'm heading out to lunch, but should be back in about an hour to 1.5 hours from now
[18:39:49] if you wanted to work on it then
[18:40:22] schana, OK if I start assigning a few things to you?
[18:40:30] We can always re-shuffle later
[18:40:54] halfak: sure
[19:11:18] schana_lunch, when you get back, check out https://phabricator.wikimedia.org/T129109
[19:11:31] I spent some time adding links and clarifying what should be done.
[19:11:44] Time for me to go to aspen.
[19:21:20] schana_lunch: for whenever you're back: do you think we should have 2 shorter backlog grooming meetings every week, or 1 longer one every week. I think at the moment, a lot of changes are happening, so maybe 2 shorter ones every week are more useful. What do you think?
[19:25:05] leila, ^ is this the whole team backlog grooming?
[19:25:13] Or another, separate one?
[19:25:49] Woops. I'm not supposed to be here
[19:25:52] * halfak|Aspen runs back to aspen
[20:09:25] halfak|Aspen: no. it's just the content creation work.
[20:09:29] hi halfak|Aspen.
[20:43:54] o/ leila
[20:43:57] Just got back from aspen
[20:44:07] hey halfak.
[20:44:16] "content curation work"?
[20:44:28] *creation
[20:44:31] aspen the channel or aspen the place IRL?
[20:44:34] schana: do you think we should have 2 shorter backlog grooming meetings every week, or 1 longer one every week. I think at the moment, a lot of changes are happening, so maybe 2 shorter ones every week are more useful. What do you think?
[20:44:38] aspen the channel :)
[20:44:41] :-)
[20:44:44] halfak: article rec. :-)
[20:44:47] * schana thinks
[20:44:52] * halfak doesn't use his teleporter during work.
[20:45:10] leila, gotcha.
[20:45:11] increasing content coverage to be exact, halfak. :D
[20:45:11] can I borrow it then?
[20:45:34] apergos, only one downside, you need to die and let your clone take over every time you use it.
[20:45:48] you got one of the cheap off-brand ones
[20:45:54] :D
[20:47:31] * halfak didn't know it was cool to invite schana to backlog grooming events.
[20:48:04] Revscoring team does our backlog grooming on Saturday mornings to accommodate volunteers
[20:48:12] leila: I think one meeting / week should be enough to cover the changes happening. If we find ourselves needing more we can always increase the frequency *shrugs shoulders*
[20:48:15] But we can move it
[20:48:55] halfak: I'm a software engineer; I love making decisions and arguing about them :D
[20:49:32] schana: sounds good. any preference for the day of the week?
[20:49:44] schana: I'm leaning towards early in the week so we can plan that week
[20:50:03] that would work, leila
[20:50:16] ooki, sending out the invites, schana.
[20:50:22] just one last thing: 30-min, schana?
[20:50:34] should be enough
[20:50:42] agreed.
[21:02:15] Hey folks. I just got an email asking about past work in Wikipedia.
[21:02:26] Anyone know if there has been a survey about mental health issues of Wikipedia editors?
[21:03:26] HaeB, guillom, EGalvez ^
[21:13:34] don't remember anything offhand
[21:13:50] halfak, going to start off w/ your thanks/month on enwiki. any date range limits on that?
[21:14:01] wiggins, wat?
[21:14:33] on quarry, one of your saved queries is thanks/month on enwiki. nice & simple; preview results only show through 2014
[21:14:47] but no metadata on query so can't tell if that's b/c you last ran it in 2014
[21:20:31] Oh!
[21:20:38] Yeah. Probably because I ran it in 2014.
[21:20:45] It would help a lot if you pasted a link ;)
[21:20:47] wiggins, ^
[21:20:59] right, that was my guess: http://quarry.wmflabs.org/query/216
[21:22:30] Ahh yes. Nice and simple. Looks like I must have run that in Aug. 2014.
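For readers who have not opened the Quarry page, a thanks-per-month count can be sketched roughly as below. The text of query 216 is not reproduced in this log, so this is a guess at its shape rather than a copy. It assumes thanks are recorded as rows in the MediaWiki logging table with log_type = 'thanks' and a 14-digit log_timestamp (YYYYMMDDHHMMSS), and it runs against a tiny in-memory SQLite table with made-up rows instead of the real enwiki replica.

```python
import sqlite3

# Toy stand-in for the MediaWiki "logging" table; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logging (log_type TEXT, log_timestamp TEXT)")
conn.executemany(
    "INSERT INTO logging VALUES (?, ?)",
    [
        ("thanks", "20140801120000"),
        ("thanks", "20140815093000"),
        ("thanks", "20140902180000"),
        ("delete", "20140903000000"),  # not a thanks row, so it is ignored
    ],
)


def thanks_per_period(conn, prefix_length):
    """Count thanks grouped by a timestamp prefix: 6 chars = month, 8 = day."""
    query = """
        SELECT SUBSTR(log_timestamp, 1, ?) AS period, COUNT(*) AS thanks
        FROM logging
        WHERE log_type = 'thanks'
        GROUP BY period
        ORDER BY period
    """
    return conn.execute(query, (prefix_length,)).fetchall()


if __name__ == "__main__":
    print(thanks_per_period(conn, 6))  # grouped by month
    print(thanks_per_period(conn, 8))  # grouped by day
```

Changing the prefix length from 6 to 8 is the same kind of small modification as going from thanks/month to thanks/day.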
[21:22:40] I'd guess it to be about the middle of the month :)
[21:22:42] Yes, I just ran a fork & it's current
[21:26:26] halfak, what's the namespace encoding? 0 = article?
[21:26:54] wiggins, yeah. 0 = article. Will get a more complete list.
[21:27:06] https://en.wikipedia.org/wiki/Wikipedia:Namespace
[21:27:28] thx!
[21:31:21] halfak: nothing offhand either
[21:31:44] Thanks HaeB & guillom
[21:32:28] halfak, as examples we modified thanks/month to thanks/day and namespace starts w/ OMG! to WTF
[21:38:03] wiggins: perhaps also of interest: https://meta.wikimedia.org/wiki/User:Faebot/thanks
[21:40:26] wiggins, if you ever want to see a bunch of curses, sort page titles alphabetically. ;)
[21:40:53] ^ also don't do that if you don't want to have a bunch of curses on your screen ;)
[21:46:21] halfak: success! user gshahane modified query successfully. other group picked on an lvwiki query & couldn't tweak it usefully but pored through what was there
[21:47:17] Cool :)
[21:48:08] will be asking them to give feedback on querying system as portfolio entry for the week
[21:48:30] wiggins, OK to share these notes with us?
[21:49:50] halfak, absolutely! aggregated anonymous notes. I put in my syllabus that I'll do that so they know the feedback goes to you.
[21:50:08] great :)
[21:52:55] wiggins: halfak \o/ awesome
[21:53:29] yuvipanda halfak thanks for the support! feedback to come. now prepping for guest lecture tonight... O_o
[21:53:56] godspeed wiggins
[21:55:55] :D
[23:34:41] J-Mo: ping on email :D
[23:36:28] I don't see it.
[23:36:37] J-Mo: oh
[23:47:58] yuvipanda: what if I wanted to run a PAWS instance overnight.
[23:48:06] E.g. I want to parse an XML dump and that's going to take a while.
[23:48:12] halfak: right now you can just let it run
[23:48:15] and it'll continue running
[23:48:17] Cool :)
[23:48:33] * halfak rages at "Untitled folder"
[23:48:36] heh
[23:48:53] :D
[23:49:55] yuvipanda, how do I access XML dumps from PAWS?
[23:51:49] halfak: should be in /public/dumps?
[23:52:17] Awesome!
[23:52:24] halfak: same as tools basically
[23:52:26] * halfak tried /mnt and /srv before asking
[23:52:42] * halfak needs to work on his knowledge of tools
[23:55:11] :D
[23:55:51] halfak: I just showed madhuvishy how to run a webservice with interactive development on notebooks
[23:55:55] it's pretty cool
[23:56:05] we don't have it in PAWS yet, need to stabilize other things first
[23:56:11] but I've demonstrated it's totally doable :D
[23:56:25] yuvipanda, wooot! That's going to be pretty cool.
[23:56:32] yeah
[23:56:35] The stuff that's in our .wsgi files.
[23:56:36] it's just flask
[23:56:45] Just put it in the notebook :)
[23:56:48] * yuvipanda nods
[23:56:53] there's no reason
[23:56:56] .ipynb files can't contain code
[23:56:59] same way .py files do
[23:57:05] and be used for all the same things
[23:57:06] Then, when we can import from notebooks, you put the routes there too.
[23:57:08] yeah.
[23:57:21] notebooks: The Revenge of SmallTalk
[23:57:25] lol
[23:57:56] I also need to add PHP support to this at some point
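On the earlier question about parsing an XML dump from a notebook: one way to do it without loading the whole file into memory is to stream it with the standard library, roughly as sketched below. This is an illustration under assumptions, not the tooling anyone in the channel said they use. The sample document is invented and only mimics the page/title/ns/revision layout of a MediaWiki export, and no real file name under /public/dumps is assumed.

```python
import io
import xml.etree.ElementTree as ET

# Tiny invented sample shaped like a MediaWiki XML export.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Dosa</title><ns>0</ns><id>1</id>
    <revision><id>10</id><text>A dosa is ...</text></revision>
  </page>
  <page><title>Talk:Dosa</title><ns>1</ns><id>2</id>
    <revision><id>11</id><text>Discussion</text></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, namespace, revision_count) for each page, one at a time."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        fields = {local_name(child.tag): child.text for child in elem}
        revisions = sum(1 for child in elem if local_name(child.tag) == "revision")
        yield fields["title"], int(fields["ns"]), revisions
        # Drop the page's children so finished pages do not pile up in memory.
        elem.clear()


def article_titles(stream):
    """Titles of pages in namespace 0 (articles)."""
    return [title for title, ns, _revs in iter_pages(stream) if ns == 0]


if __name__ == "__main__":
    print(article_titles(io.BytesIO(SAMPLE)))
```

For a compressed dump, the same functions should accept the file object returned by `bz2.open(path, "rb")`, where `path` is whatever dump file is chosen under /public/dumps.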