[14:05:17] hi there
[14:05:53] Hey FFloeck!
[14:06:07] It looks like the west coast isn't up yet, so we don't have Pine.
[14:06:15] BUT we could discuss the IEG briefly.
[14:07:00] sure
[14:07:49] so we are doing these network visualizations anyway
[14:08:03] and it just fits with the description as far as i can see
[14:08:25] just not sure how we go about "integrating" that into the project proposal
[14:08:53] So, one of the things we are hoping to do is gather a set of datasets with a common format.
[14:09:18] This way, people will be able to perform analyses that, for example, take into account both talk page interactions and article editing interactions.
[14:09:39] So, the contribution of a dataset would be most welcome.
[14:09:54] We'll need to talk about standard formats to make processing this data easier.
[14:10:11] ok, so what we have is an adapted version of the wikiwho algorithm that actually extracts all the interactions between editors
[14:10:30] Sure. Makes sense.
[14:10:38] I mean I could just generate a dataset out of that
[14:10:44] Can you write up a description of the different interaction types?
[14:10:46] +1
[14:10:55] sure
[14:11:22] A post on the talk page of the IEG would be most appreciated too.
[14:11:25] ok
[14:11:35] The IEG committee uses that type of feedback/interest to help make decisions about grants.
[14:12:08] but I'm not sure I understand the part with the datasets: I could produce some sample output for a couple of articles, is that what you mean?
[14:13:20] Oh. I was imagining producing complete historical information for all articles.
[14:13:25] And then hosting that somewhere.
[14:13:34] ok, that is a LOT of data
[14:13:36] I'd also like to make the dataset relatively easy to update.
[14:13:38] Yup :)
[14:14:08] but if for example our API was fast enough, would it not make sense to let people query for these?
[14:14:14] So it would be nice if we could build it by appending to text files so that using a big, shared NFS disk is performant.
[14:14:35] hm i see
[14:14:39] FFloeck, maybe if you provide the right indexes. I'm not in the game of trying to foresee other people's indexing strategies.
[14:14:46] sure
[14:15:02] I'd rather just give people the data and let them decide how to process it.
[14:15:16] (providing an API is a huge plus though)
[14:16:42] halfak: Good morning :-)
[14:16:51] Hey Helder!
[14:17:02] You're a freaking celebrity in ptwiki :P
[14:17:08] hahahah
[14:19:21] sorry, was afk. so what data format/structure would you like then? and only the relations between editors or more info, like the authorship stuff etc?
[14:19:23] halfak: I was trying to run the example you provided on your "Revision-Scoring" repo, and then I noticed I was missing a few dependencies: https://github.com/halfak/Revision-Scoring/pull/1
[14:19:39] this might help others too
[14:20:32] FFloeck, we haven't gotten to proposing one, but I was imagining
[14:20:51] Thanks Helder. Will scope that out right away.
[14:21:02] * halfak <3's pull requests
[14:21:51] >-)
[14:21:53] :-)
[14:22:02] ok, I'll have to run, but I will send you guys an example data set with a proposed structure and you can tell me if you like it
[14:22:15] Ahh! We can add those to the requirements.txt or setup.py. That's the standard way to express requirements in python land. I'll use your pull request to set one up.
[14:22:37] Thanks FFloeck. :)
[14:22:39] * halfak is stoked
[14:22:42] np :)
[14:23:02] and for the wikiwho API thing I'll let you know when it runs as it should
[14:23:07] cu
[14:23:20] great. I'll have some users for you right away. :)
[14:23:25] o/
[14:25:27] aaagh
[14:25:28] * Ironholds headdesks
[14:25:38] the stat1002 upgrade removed /all of my python and R modules/
[14:26:18] heh. That'
[14:26:21] ll happen
[14:26:40] hokay, that's weird
[14:26:42] * Ironholds headscratches
[14:26:51] the module EXISTS, it's just bro-oh. flip.
[14:35:18] halfak: so, I found this:
[14:35:19] https://github.com/halfak/Mediawiki-Utilities/blob/7de219333ed51f026a6f2ab84b0164fb86275954/setup.py#L20
[14:35:30] does it mean this module also requires "nose"?
[14:35:35] Yup
[14:35:46] And it is missing from the requirements.
[14:35:51] that!
[14:35:53] Though, it might be good that it is.
[14:35:56] * halfak debates.
[14:36:01] nose is the testing framework.
[14:36:13] You don't need the testing framework to install and run.
[14:36:42] shouldn't that happen for Revision-Scoring too?
[14:37:11] I mean, a user should be able to run the example without the test suite
[14:37:16] Yes. I'm not sure of the Right(TM) thing to do here.
[14:37:31] I think that dropping nose from requirements is the right thing.
[14:38:23] I was getting "ImportError: No module named 'nose'" before installing it
[14:39:09] Interesting. You were getting that in running an example?
[14:39:11] but apparently it was from here: https://github.com/halfak/Deltas/blob/master/deltas/util.py#L1
[14:39:17] yep!
[14:39:33] hah I think I know what it is
[14:39:52] Hmm... It seems that I am including the testing utils with the regular utils. I better drop that.
[14:39:58] I copy-pasted the example in a file named test-revision-scoring.py
[14:40:27] and the framework seems to consider the "test" in the name
[14:40:48] https://pypi.python.org/pypi/nose/1.3.4
[14:40:56] Yes it does
[14:41:21] But I still need to fix the problem of nose appearing in the utils of deltas.
[14:42:16] halfak: do you want a https://github.com/halfak/Deltas/issues/new ?
[14:42:32] yes please
[14:52:59] halfak: this was a typo right? https://github.com/he7d3r/Revision-Scoring/commit/34d20aa1c0911a5516b4a85d6c3c04693cc6a970
[14:54:17] * halfak runs last test in deltas
[14:54:37] oh. yes. That lang file was never really finished.
[14:54:43] You caught me copy-pasting from english :)
[14:55:06] heh ...
[14:56:40] New version pushed to deltas
[14:57:01] halfak: what do you prefer: a pull request for that now, or wait until I finish polishing the list I started on https://gist.github.com/he7d3r/1285f6b52e2782d96b9e
[14:57:07] ? (probably not today)
[14:57:30] Either way is fine.
[14:57:47] ok... I think I'll do it now so I don't forget
[14:58:18] Hokay
[14:58:26] https://github.com/halfak/Revision-Scoring/pull/2
[14:58:50] everyone enthusiastic for the tutorial at midday? :D
[14:58:59] that tutorial?
[14:59:03] *What
[14:59:20] (it is midday here, already)
[15:00:19] Ironholds is going to show off his data processing library for R
[15:01:37] data processing library, psht. It is a WMF swiss army knife.
[15:01:43] it slices! it dices! it geolocates!
[15:03:50] Ironholds: where is that going to happen?
[15:05:29] Helder, research staff meeting. I'd make it more transparent but frankly it's not tremendously useful for people who don't have stat2 access
[15:05:37] most of it is IP geolocation, requestlogs parsing, hive querying, etc.
[15:05:59] ok :-)
[15:06:30] #wikimedia-rcom is not closed yet :(
[15:06:58] Nemo_bis, I've been pinging DarTar. he holds the keys
[15:11:32] o/ Nettrom
[15:11:47] o/ halfak
[15:12:02] * Nettrom is now running Ubuntu 14.04
[15:12:08] Woo!
[15:12:13] hey, at least I can SSH to WMF servers then!
[15:12:47] And SCP directly! -- well, via a proxy, but practically directly
[15:13:17] yeah, transparent to me, which is nice
[15:13:26] halfak: you can always ask the IRC GC :)
[15:13:48] https://meta.wikimedia.org/wiki/IRC/GC
[15:22:10] Nemo_bis, if you wanted to make a proposal, I'd support it.
[15:37:44] halfak: there's no need of a proposal, just ping them on IRC
[15:38:15] I already did so in the past but they want the request to come from a person with a semi-clear "ownership"/attachment to the channel being closed
[15:38:25] Oh.
[15:38:27] Dang.
[15:38:33] * halfak was hoping he wouldn't have to do anything
[15:38:48] Meh. We should wait for DarTar.
[15:39:10] I'm cur that these guys would rather we worked with the channel owner.
[15:39:13] 8sure
[15:39:20] typing is hard this morning
[15:39:24] * halfak drinks more coffee.
[15:46:30] Hey Nettrom, did you get to that flow stuff we talked about yesterday?
[15:46:45] If not, I'm looking to bang it out right away.
[15:47:16] * halfak is not sure he found the right colloquialism there
[15:47:42] Nope. looks like I'm good: http://www.urbandictionary.com/define.php?term=bang%20it%20out&defid=5499936
[15:48:52] yep, correct colloquialism
[15:48:54] very British, I think
[15:52:31] Oh no!
[15:52:42] * halfak is going to lose his 'merca cred
[15:52:59] I need to go shoot a gun from a monster truck. BRB
[15:54:55] hah!
[15:55:02] just finish it in ~30m
[15:55:18] Nettrom, I just noticed something weird about user_edit_stats.tsv
[15:55:30] It has 10x as many rows as it should
[15:55:42] * halfak is looking into it.
[16:00:37] halfak: can you please op me up in that channel so I can forward the other one? I can't forward as this channel is not set +F :)
[16:01:03] Barras, op you in this channel?
[16:01:33] yeah, op me like that ;-)
[16:01:38] that should worl
[16:01:41] work*
[16:01:54] it did :)
[16:01:59] thanks halfak
[16:02:28] Can you add me to the invite list on #research-rcom so that I can talk to the few who are still there?
[16:02:44] Barras, ^
[16:03:35] mh, I'd rather kick off all people in there as the channel is now closed ;)
[16:04:46] Sounds fine
[16:24:22] Barras, I forgot to say thanks.
[16:24:26] Thank you for your help. :)
[16:24:44] np :) That's "my job" ;-)
[16:30:03] halfak: sorry, was out to get coffee... haven't gotten around to the flow stuff yet
[16:30:18] halfak: do you want me to push my files to github?
[16:30:49] Yes please. I just pushed changes from yesterday, so you might need to rebase.
[16:31:04] I'll get right on it
[16:33:14] Thanks!
[16:38:00] halfak: pull request sent
[17:09:39] halfak: is this¹ supposed to be a complete list of Portuguese badwords lists?
[17:09:39] ¹https://github.com/halfak/Revision-Scoring/blob/master/WORK_LOG.rst#sunday-sept-21st-2014
[17:09:50] Should the other lists be added there?
[17:09:59] That list should be replaced.
[17:10:18] Oh whoops. I was confused.
[17:10:25] I was just keeping that as a note for myself.
[17:10:32] hmm
[17:10:40] I like to dump things in the work log to remind myself to look at them later.
[17:25:01] halfak: have you ever submitted changes to the Snowball stemming algorithms?
[17:25:20] I'm trying to understand this: snowball.tartarus.org/algorithms/portuguese/stemmer.html
[17:25:48] but I think there are some errors (things which are not really for Portuguese, but for Spanish)
[17:26:18] and it seems this description is used as a basis for the algorithms which are in nltk (in python)
[17:27:08] so, in case there are errors, should the fixes be sent to nltk directly? or first to snowball and then ask nltk to import a new version (or something like that)
[17:29:15] https://pt.wikipedia.org/w/index.php?title=Wikip%C3%A9dia:Huggle/Config#Previs.C3.A3o how disappointing. It's almost all English. :(
[17:30:26] Emufarmers: it was worse not a long time ago :(
[17:30:41] I suspect we didn't know about that feature
[17:31:45] Emufarmers: if you are looking at that as a source of badlists, there are a few other places where you could find more comprehensive lists
[17:32:15] We still need to sync all those lists... =/
[17:32:30] *badlists -> badwords
[17:33:17] Hmm
[17:33:20] Also, hi
[17:35:05] https://pt.wikibooks.org/wiki/Utilizador_Discuss%C3%A3o:He7d3r Nemo_bis is everywhere
[17:37:07] well, that's easy, when using MassMessage to write to 500 sysops at a time
[17:38:17] Nemo_bis: is that change done already?
[17:38:22] Helder, I'm not clear in the process, sorry
[17:38:22] (didn't check)
[17:38:35] what's hard is being thanked for the messages (which happens!)
[17:38:45] Helder: yep
[17:39:24] Helder: which version of snowball? elasticsearch has some too
[17:40:47] I'm not sure
[17:41:01] it is part of nltk (3.0.0)
[19:03:39] halfak, DarTar: this formWizard is what I've been doing (instead of research) for the past 6 months.
[19:03:57] J-Mo: heh, I realized that
[19:04:15] It looks cool. Does it seem like it's had an impact on the quality or type of grant proposals?
[19:04:32] Seems like it might help in a lot of ways.
[19:04:52] Not sure yet. It definitely *seems* to have contributed to the recent flood of Ideas, tho
[19:05:09] J-Mo: quarry outage fixed, btw
[19:05:27] and it helps structure the input, so we can categorize things based on infobox params much easier.
[19:05:35] thanks, YuviPanda@
[19:06:33] * YuviPanda wonders if he can ask for an IEG
[19:06:34] probably not
[19:07:10] * halfak likes to think that he threw a good sized bucket of that flood.
[19:08:00] YuviPanda, you could find a collaborator to fund with an IEG.
[19:08:44] ooh
[19:08:53] do a systemic bias project with fleabite
[19:08:55] it's perfect.
[19:09:02] heh
[19:09:03] * Ironholds purses fingers.
[19:09:23] * YuviPanda pats Ironholds
[19:09:29] leila, halfak, DarTar: the buggy functions in WMUtils? Now no longer buggy.
[19:09:40] Want me to send around a tar.gz to save some time? You can try it, break things, report the broken things.
[19:09:47] +1 Software engineering creds for Ironholds
[19:09:52] hah
[19:10:01] I even have semantic version numbering!
[19:10:06] incremented 0.0.2 for the fixes.
[19:10:10] nice, Ironholds
[20:04:11] lzia, looks like there is no call for the AB testing methods meeting.
[20:04:16] ewulczyn_, ^
[20:04:20] halfak, there is no room either
[20:04:25] give us few minutes please
[20:04:29] Sure
[20:05:32] halfak: I messed up and did not get a room. I'll add once we have a place. sorry.
[20:05:47] No worries. Happens all the time :S
[20:08:42] halfak, you should have a link for the video
[20:09:04] There are basically no rooms. It's a situation we have here. :-\
[20:09:16] Stupid metrics meeting days
[20:09:31] Are any of the phone rooms on 6th big enough?
[20:12:43] all 6th floor is booked, too
[20:12:52] and phone room is too small for the four of us
[21:03:59] Ironholds: running a few mins late
[21:04:10] still with ewulczyn_
[21:04:14] DarTar, k
[21:18:29] Ironholds: I’m with you in 2
[21:18:38] kk
[21:25:26] * halfak gains a level in _Getting work done during meetings_
[21:25:33] well done, hacl
[21:25:34] err
[21:25:35] halfak:
[21:25:41] plots --> commons
[21:25:44] :D
[21:25:59] Say YuviPanda, did you see my email about the revscoring IEG?
[21:26:06] I did
[21:26:12] was jetlagged, so didn't get to read it
[21:26:31] No worries. When you have time.
[21:26:47] Was hoping to send you to the IEG proposal to +1 it. Also to give a status update.
[21:27:22] ah, right
[21:27:37] I'd have more time once my jet lag clears up, back to India
[21:27:54] You're in India right now?
[21:29:04] halfak: yeah
[21:29:39] halfak: recovering from travels a bit, etc.
[21:29:50] but my biggish refactor of icinga in prod is almost complete!
[21:31:30] Woot. I wish you a speedy jet lag recovery.
[21:55:49] Say YuviPanda, can I borrow your eyeballs quick?
[21:55:53] Compare https://meta.wikimedia.org/wiki/Research:Index
[21:55:59] To https://meta.wikimedia.org/wiki/Research:Index/Sandbox_splash
[21:55:59] * YuviPanda clicks
[21:56:06] Which one is better. (Other thoughts welcome)
[21:56:28] halfak: I like the first one
[21:56:42] halfak: except for some of the colors. headings are a bit hard to read
[21:57:18] You're right. I should fix that before I have the next person look.
[21:57:31] yeah
[21:57:45] halfak: it should also have a title that's not just Research:Index
[21:59:19] Oh? Maybe like Research:Main or something like that?
[21:59:25] Research:Home
[21:59:47] halfak: Research:Home feels better, yeah
[22:05:42] I like the new design. hey halfak, do you want to use the FormWizard for proposal creation on Research:Home?
[22:06:29] J-Mo. This sounds like a good idea.
[22:06:40] I assume this wouldn't be too hard since it is already on meta.
[22:07:04] * halfak needs to obfuscate which design is newer when asking opinions. :S
[22:07:54] yeah, and it's pretty configurable. I can set it up for you on the Sandbox_splash page as a demo, and we can hack on it over the course of a week or so.
[22:08:12] confession: I've been wanting to set this tool up for Research since the very beginning.
[22:08:35] That makes me want it more -- since I know you have been considering this use-case all along.
[22:08:42] yyyep
[22:09:06] I'd like to start testing it out. I was planning to have a separate page for "creation" of these documents.
[22:09:24] Right now, the sandbox'd example links to L2 stuff, but I was thinking something like that.
[22:09:52] E.g. https://meta.wikimedia.org/wiki/Research:Labs2/New_idea
[22:10:26] how about I set up a button on Sandbox_splash that triggers a page-creation workflow. For testing purposes, the pages will be Research:Index/Sandbox_splash/My great research page
[22:10:45] (the pages created through the workflow, I mean)
[22:10:50] That'd be great. I suppose I could move it around afterward.
[22:11:01] yep. it's a simple config change.
[22:11:06] Cool :)
[22:11:28] the one thing I don't know is exactly how it will work with the existing research project infobox. But we can experiment.
[22:11:42] The infobox can adapt too.
[22:12:21] sweet. this'll be fun. Let me see how much I can get done today, and hopefully I'll ping you tomorrow with stuff to test.
[22:12:46] Woot!
[22:12:49] <3
[22:17:27] ewulczyn_: this is perfect, can I send this to Legal straightaway? Asking them to set up a short check-in if they wish to talk this through
[22:18:17] DarTar: yes
[22:18:31] k cool
[23:30:30] halfak: did you get your diffing code done?
[23:30:45] * YuviPanda was considering building a simple thing that takes XML dumps, generates diffs, and makes them available in a nice format
[23:32:13] ewulczyn_: running a few minutes late