[16:36:39] hey folks
[17:17:42] Hi all. I'm trying to recreate halfaker's PAWS notebook with the latest Italian data dumps: https://web.archive.org/web/20160313160743/http://paws-public.wmflabs.org/paws-public/EpochFail/projects/headings/extract_headings.ipynb. I'm also adding the namespace field, however I'm getting this error "AttributeError: 'Page' object has no attribute 'ns'". I opened up the dataset and accessing the ns field
[17:17:42] should be coded similar to accessing page id and page title as they are similarly formatted in the dataset. Here is my notebook: http://paws-public.wmflabs.org/paws-public/45876923/it_wp.ipynb does anyone have insight about how to collect the ns field?
[17:18:54] I've skimmed through mwparserfromhell documentation but haven't found anything that helped me figure this out
[17:19:54] *halfak
[17:48:28] o/ zareen
[17:48:34] Was at lunch. Reading scrollback
[17:50:01] Note that the mwxml documentation says to check out mwtypes.Page. See http://pythonhosted.org/mwxml/iteration.html#mwxml.Page
[17:50:14] This links to https://pythonhosted.org/mwtypes/page.html#mwtypes.Page
[17:50:27] That shows you want "page.namespace"
[18:02:08] o/ schana
[18:02:17] hey halfak
[18:02:29] I was just looking in the Scrum of Scrums pad and it looks like the ORES notes didn't make it. Maybe I'm missing something.
[18:02:33] See https://etherpad.wikimedia.org/p/ResearchStaff
[18:03:04] sorry, I was running late and did the meeting on the phone on the way home from dinner
[18:03:16] I didn't get a chance to look at the etherpad beforehand
[18:03:24] I see.
[18:04:50] would asking in #wikimedia-codereview help?
[18:10:44] OMG showcase is starting soon!
[18:10:50] It isn't on the Research calendar :S
[18:11:03] will you be posting a link to the hangout here?
[18:13:13] Incabell__: the livestream link is here https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#October_2016
[18:13:27] thanks!
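A minimal sketch for the namespace question raised at 17:17:42, assuming the dump stores each page's namespace in an `<ns>` element next to `<id>` and `<title>`, as described there. It reads that element with the Python standard library only; it is not the notebook's code. Per the 17:50:27 reply, mwxml exposes the same value as `page.namespace` rather than `page.ns`. The sample XML and the names `SAMPLE_DUMP`, `local_name`, and `iter_pages` are hypothetical and exist only for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical two-page sample shaped like a MediaWiki XML dump (not real data).
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Roma</title>
    <ns>0</ns>
    <id>1</id>
  </page>
  <page>
    <title>Discussione:Roma</title>
    <ns>1</ns>
    <id>2</id>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip any '{xmlns}' prefix so tags match with or without an XML namespace."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(xml_file):
    """Yield (page_id, namespace, title) for each <page> element in the stream."""
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        # Only direct children of <page> are collected here, so a nested
        # revision <id> would not overwrite the page <id>.
        fields = {local_name(child.tag): child.text for child in elem}
        yield int(fields["id"]), int(fields["ns"]), fields["title"]
        elem.clear()  # release the finished page's children


pages = list(iter_pages(io.BytesIO(SAMPLE_DUMP.encode("utf-8"))))
for page_id, namespace, title in pages:
    print(page_id, namespace, title)
```

With mwxml, the equivalent loop would read `page.id`, `page.namespace`, and `page.title` from each page in the dump, which is the attribute name the linked mwtypes.Page documentation points to.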
[18:13:43] hangout link should be in Sarah’s invite
[18:14:20] halfak: thanks, that seems to work!
[18:15:36] I pinged Sarah to send a reminder to the lists
[18:17:22] DarTar, I'd not talked to anyone about doing IRC duty, but I'm available for it.
[18:20:28] Hi DarTar. May I request that Research showcase announcements get sent out further in advance, perhaps 2 weeks in advance? :)
[18:20:51] Pine, that would be a lot since we do a lot of organizing work.
[18:20:53] halfak: thanks :)
[18:21:02] Pine: what halfak said
[18:21:03] The showcases are regularly on the 3rd weds of the month.
[18:21:13] they require quite a lot of work to coordinate
[18:21:20] We often don't get a second speaker until the week before-hand.
[18:21:21] OK, that is good to know at least as far as scheduling goes
[18:21:41] also, we need finalized abstracts and titles before announcing them
[18:22:06] The showcase will start in 9 minutes
[18:22:16] Stream: https://www.youtube.com/watch?v=cBImUZ_si5s
[18:22:16] btw I can’t attend myself today, due to a conflict
[18:22:19] Or check the topic :)
[18:22:26] sent a reminder via social media
[18:23:06] halfak: you're live on the YT stream
[18:23:20] I just announced that to the SF folks :)
[18:23:25] * halfak picks his nose
[18:23:54] lol
[18:23:56] This is great
[18:24:04] working on the audio ...
[18:24:17] Honestly, I am cool with streaming these parts, but I think we should figure out how to trim them from the final video
[18:24:53] ewulczyn, when brendan is done working out the audio issues, can you ask him if it's possible to restart the stream so we can trim this part out?
[18:25:08] If not, I guess we can just edit the video and re-upload.
[18:25:23] I'll ask him
[18:25:52] yes, the video can be trimmed after the fact, we’ve done this before
[18:26:01] don’t sweat it ;)
[18:26:16] Looks like we're starting right now!
[18:26:21] So a few minutes early.
[18:26:31] Yes we are early
[18:26:45] I would recommend waiting in case other people are coming from other engagements
[18:26:47] -._o_.-
[18:29:18] I just chimed in to suggest we wait until the right start time.
[18:29:34] It seems we still have technical issues anyway.
[18:29:38] So stay tuned, folks.
[18:31:13] ewulczyn, once you're done with the intro, make sure to click the button to hide the avatars/thumbnails at the bottom of the page.
[18:32:49] BTW, Ofer has started presenting.
[18:32:53] Ping me with questions
[18:36:41] Count of IP editors != Individual people
[18:36:51] Not all IP edits come from people who never log in
[18:38:16] My measurements suggest that IP edits are only 21% of overall edits.
[18:38:40] Many IP editors do the kind of quality work that we associate with long-term editors.
[18:40:42] I'm getting bad audio from Ofer
[18:40:44] Anyone else?
[18:40:46] me too
[18:40:58] up
[18:41:05] same here
[18:41:13] ewulczyn, perhaps we could alert Ofer?
[18:41:35] same here
[18:42:22] Tech support time!
[18:42:35] Sorry folks. We'll have this worked out soon
[18:42:38] Thanks for your patience.
[18:42:51] halfak that was a very polite interruption :)
[18:44:22] All fixed! Here we go!
[18:46:43] Q: Why not use a quality measure as "maturity level"?
[18:51:22] Note that an edit type taxonomy built by Wikipedians is very different from that designed by Daxenberger et al. https://en.wikipedia.org/wiki/Wikipedia:Labels/Edit_types/Taxonomy
[18:51:44] * halfak adds to his own question list
[18:56:58] Getting to the end of this talk. Any questions?
[19:00:17] Q: Only 1000 articles? Is inflow/outflow actually inflow or outflow?
[19:01:04] ewulczyn, time check
[19:04:18] ewulczyn, we're over time. save questions for after Charlie's presentation?
[19:04:29] yeah, I agree
[19:19:00] Any questions for Charlie?
[19:19:06] halfak: I have a question for Ofer. Does his research provide any insight about (1) how users get the information they need to become productive from the very first edit, and (2) how to increase editor retention?
[19:19:16] Got it Pine
[19:22:24] Sorry for my verbal stumble
[19:22:40] For Ofer: I may have missed something, but is there any indication that classes have changed over time? I.e., using your dataset, when you look at the data over time, would it be possible to look into whether new roles have emerged over time (or have you looked into this/are you aware of studies of that)?
[19:22:44] halfak: To Charlie: Were you looking into how to handle potential edit conflicts?
[19:22:55] Thanks Pajz and Ainali
[19:24:28] * halfak is going to try his best to pronounce "pajz" and "Ainali" :)
[19:25:40] Yes thanks
[19:26:10] halfak: It's like "finally" without the "f"
[19:26:17] Nice thanks
[19:26:36] Ah. Thanks!
[19:28:58] I'm gonna talk about this: https://figshare.com/articles/Monthly_Wikipedia_article_quality_predictions/3859800
[19:29:32] typical, halfak. insert yourself into everyone's research project ;)
[19:30:27] lol
[19:30:31] halfak: Oh, yes I meant to ask you. How can one help in getting the quality score for ORES to other language versions. But that's not a question for this showcase :)
[19:34:17] Just a supplementary note, as this question didn't come up again: the edit taxonomy used in the research that Ofer presented is different from the one described in Daxenberger and Gurevych (2012) - though still different from https://en.wikipedia.org/wiki/Wikipedia:Labels/Edit_types/Taxonomy
[19:48:30] Thanks johannes__
[19:48:49] What kind of changes were made in the research that Ofer presented?
[19:53:42] The taxonomy has no direct connection to Daxenberger and Gurevych (2012). It is rather based on previous work from Kriplean et al. (2008) and Arazy et al. (2010)
[19:56:29] We had a larger sample of training data available for this taxonomy
[20:10:50] johannes__, gotcha. Are the classes of the new taxonomy more semantic than the Daxenberger taxonomy?
[20:11:08] The whole "Image-Insertion", "Template-Deletion" bit seemed very... syntactic.
[20:11:48] a bit, yes.
[20:13:25] Move or create new article, Add substantive new content, Delete substantive content, Fix typos and grammatical errors, Rephrase existing text, Hyperlinks (to other Wikipedia pages), References (to external sources), Add or change Wiki markup, Reorganize existing text, Insert vandalism, Remove vandalism, Miscellaneous
[20:24:02] Details are in the paper: http://faculty.poly.edu/~onov/WikipediaEmergentRoles_ISR_Manuscript.pdf
[20:31:27] Damn. Was going to ask if the labeled data was published anywhere.
[20:36:14] J-Mo: by the way, how is the Teahouse research coming? That research strikes me as having similarities to Ofer's in the sense of looking at how co-production communities organize and sustain themselves.
[20:36:50] Pine, it's blocked on halfak-time
[20:36:51] :\
[20:37:02] Halfak is blocked on not having engineering resources for ORES
[20:37:24] Pine true that halfak and I haven't taken the retention study back up.
[20:37:50] but otherwise, it's going well. I submitted a paper to the CHI conference last month. Was focused on Teahouse host behavior over time.
[20:38:15] will update the wiki once it's no longer under review
[20:38:25] (whatever the outcome is)
[20:38:25] OK. I'd be interested to hear what you've found so far, perhaps in a research showcase!
[20:38:44] when the paper is accepted (somewhere), I will definitely do a research showcase!
[20:46:14] Thanks. I'm interested in what can be learned from these peer production studies that would be applicable to LearnWiki, for both onboarding and retention.