[09:57:58] good morning research hackathon folks!
[09:58:30] not sure if/where folks are congregating this AM, but I'm out in the garden
[09:58:36] in case anyone wants to join :)
[10:38:43] Hey ashaw. We're in the same spot as yesterday.
[10:39:00] Sorry, I've been AFK talking to people about what research is :D
[10:42:28] Ran a demo of some research for the Kazak students. We compared the length of pages across Culture and Science.
[10:50:40] whym: hey! no, there isn't, but don't worry about garbage :)
[11:05:55] halfak, sorry ;p
[11:05:57] what's up?
[11:06:12] That julian guy is going to send an email with his request to analytics@
[11:06:20] Which is going to be fun and also good transparency.
[11:06:35] cool!
[11:06:36] yay :)
[11:12:04] I got names wrong. Not Julian.
[11:12:44] yuvipanda: got it. thanks a lot for making this tool, by the way.
[11:12:53] whym: \o/
[11:12:54] I especially like the fact that queries and results are "citable" by URL. Surely it will be useful in various community discussions backed up by data.
[11:13:00] whym: do tell me what else you want on it to make things better
[11:13:40] phuedx: ssh -C -N -L 3309:enwiki.labsdb:3306 tools-login.wmflabs.org
[11:14:09] +1 whym
[11:14:15] Ironholds, look at this thing http://offwiki.org/wiki/Main_Page
[11:14:28] oh god, is that wilm being...wilm?
[11:14:32] Yup
[11:14:40] yep
[11:14:44] gag me with a '''spoon'''
[11:15:00] whym: :D I'll make it possible to cite particular versions of the query / result - so you can see results exactly as they were when you looked
[11:18:20] phuedx, +1 Dark Comedy
[11:20:04] Deskana: can you approve my new mwoauth consumer?
[11:20:27] phuedx: celery -A quarry.web.app.celery worker
[11:21:56] yuvipanda: that would be great
[11:22:11] whym: yeah, am already saving them as individual 'revisions', just need to expose them
[11:23:14] phuedx: https://www.mediawiki.org/wiki/Special:OAuthConsumerRegistration/propose
[12:17:27] yuvipanda, https://bugzilla.wikimedia.org/show_bug.cgi?id=69227
[12:57:28] yuvipanda: might be worth investigating an orm now
[12:59:02] yuvipanda: I don't know, I'm not sure I trust you. ;)
[13:10:14] Deskana: :P got reedy to do it
[13:10:32] yuvipanda: Cool.
[13:10:38] yuvipanda: Where are you hiding?
[13:11:15] Deskana: I'm with Max, in the room downstairs?
[13:11:30] yuvipanda: Aha, Monte and I are sat in the atrium of the... greenhouse.
[13:12:15] Deskana: ah, :D you're in r/outside
[13:37:07] phuedx: any luck picking an ORM?
[13:37:26] yuvipanda: trying out sqlalchemy atm
[13:37:29] will report back
[13:37:40] phuedx: cool!
[13:38:51] hi Sneha!
[13:39:28] or not - guess sneha_nar is gone...ah well
[13:39:43] tnegrin, you around or in meetings n'such?
[13:40:14] interviewing right now but @ the barbican
[13:40:33] how's things?
[13:41:44] they're good; I wanted to pitch something to you when you have a free moment :)
[13:42:11] heh -- where are you?
[13:42:54] sat with halfak and a load of community peeps and researchers at the back of the big enclosed room on 3rd
[13:45:20] kk
[13:58:38] yuvipanda: have users being served by sqlalchemy
[13:58:56] will create a patch
[13:59:05] and then move on to queries
[13:59:08] and query runs
[14:33:17] halfak, is there a check-in soon?
[14:35:23] Yup. 25 minutes.
[14:35:29] Unless I have done my math wrong
[14:37:15] Just got into the call.
[14:37:22] https://plus.google.com/hangouts/_/event/csd7v3nephec9gqq0p863dodgrk
[14:46:53] halfak: do you wanna gather some folks and sit down for 10-15min at some point to talk about what to do with Quarry next?
[14:59:17] phuedx: another option is to do something like https://code.google.com/p/python-sql/
[14:59:31] halfak, can you invite me to the hangout?
[14:59:42] Never mind, just saw the link.
[15:00:02] anyone seen heather ford in the hackathon?
[15:10:02] yuvipanda: so i'll be interested to see what that complicated query looks like with sqlalchemy's orm
[15:10:07] Progress: https://meta.wikimedia.org/wiki/Research:Screening_WikiProject_Medicine_articles_for_quality
[15:10:21] phuedx: right
[15:10:23] let me walk over
[15:25:01] superm401, http://tools.wmflabs.org/ptwikis/Editor_Visual
[15:25:02] :-)
[15:25:43] http://tools.wmflabs.org/ptwikis/Editor_Visual:enwiki
[15:27:37] http://pt.wikiversity.org/wiki/Mapeamento_REA_(Brazil_Program)
[15:27:50] http://pt.wikiversity.org/wiki/Lista_de_reposit%C3%B3rios_de_recursos_educacionais_dispon%C3%ADveis_online
[15:29:47] http://tools.wmflabs.org/ptwikis/Linha_do_tempo
[15:35:05] That timeline page is awesome. Would love to hear more about how it works behind the scenes.
[15:42:46] it was hard to hear it but i think we got that some global collaboration is going to happen :-)
[15:55:04] halfak, you froze
[15:55:28] Yeah. Trying to reconnect
[15:55:30] Arg!
[15:56:42] halfak, lost you.
[15:57:31] yuvipanda: i'm fueling my hacking + sqlalchemy learning with metal
[15:57:42] phuedx: haha :D nice
[15:57:55] Good morning everyone
[15:58:01] Hi milimetric
[15:58:01] Looks like it is network issues.
[15:58:02] when i get something right then i rock out
[15:58:02] Who all is here for the research hackathon?
[15:58:09] o/
[15:58:09] hi Pine!
[15:58:14] halfak, yeah, I think it might be on your side.
[15:58:56] I think I can finish in IRC.
[15:58:56] Current state of the world:
[15:58:56] 1. db.page.move(page_id, new_ns, new_title)
[15:58:58] 2. db.logging.store("move", "move", new_ns, new_title)
[15:58:59] Proposed state of the world:
[15:59:15] 1. page_move = PageMove(page_id, new_ns, new_title)
[15:59:15] 2. db.apply(page_move)
[15:59:15] Make sense?
[15:59:25] Totally on my side.
[15:59:27] Stupid conference WIFI
[15:59:38] halfak, yeah, I get the idea.
[15:59:46] right, I see halfak
[15:59:51] halfak: yes
[16:00:17] sorry if you covered it before, halfak, but I was very interested in WikiCredit
[16:00:57] Oh! :) The thing I'd like to work on in the short term is to get WikiCredit listening to recentchanges with mwevents.
[16:01:40] There's a parallel thread which is to do some background work on measures of importance. I was going to do pageviews, but I think that looking at internal links (pagelinks table) might be a better proxy.
[16:01:59] halfak: I am interested in Wikicredit as well, but first things first
[16:02:08] I suppose there is also another thread for imagining visualizations for wikicredit. I have mocks.
[16:02:15] Someone locked #wikimedia-labs2 instead of redirecting it to research. Can that get fixed?
[16:02:16] What's up Pine?
[16:02:27] It should be redirecting. It isn't?
[16:03:02] halfak: I'm happy to help hack either of those things, feel free to just make me a black box and assign me something
[16:03:29] I'll grab lunch now, but will get to hacking after
[16:03:39] I am interested in Wikicredit as well so this sounds good
[16:04:03] halfak: the message I got from -labs2 was "This channel is invite-only. You must have an invite from an existing member of the channel to join."
[16:04:44] Pine, it should say that, but then redirect.
[16:04:47] Let me test.
[16:04:55] I just re-set the forwarder
[16:05:04] I just reset the forwarder superm401
[16:05:07] Works for me.
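A minimal sketch of the "proposed state of the world" from 15:58-15:59, where a single PageMove event is handed to db.apply() instead of calling db.page.move() and db.logging.store() separately. Only the names PageMove, db.apply and the three arguments come from the log; the Database class, its in-memory storage and the integer namespace are assumptions made up for illustration, not the actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageMove:
    """One page-move event, carrying everything needed to apply it."""
    page_id: int
    new_ns: int
    new_title: str


class Database:
    """Hypothetical in-memory stand-in for the `db` object in the log."""

    def __init__(self):
        self.pages = {}    # page_id -> (namespace, title)
        self.logging = []  # append-only list of log rows

    def apply(self, event):
        # Dispatch on the event type so callers never update the page
        # state and the log in two separate steps.
        if isinstance(event, PageMove):
            self.pages[event.page_id] = (event.new_ns, event.new_title)
            self.logging.append(("move", "move", event.new_ns, event.new_title))
        else:
            raise TypeError(f"unsupported event: {type(event).__name__}")


db = Database()
db.pages[12] = (0, "Old title")

page_move = PageMove(12, 0, "New title")
db.apply(page_move)
```

One likely upside of this shape is that the page update and the log entry cannot drift apart, since both are derived from the same event object.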
[16:05:11] good
[16:05:12] Thanks, halfak
[16:05:21] I'm googling about how to make it stick
[16:05:35] WIFI disagrees
[16:05:50] hm
[16:06:00] do I get that message because I'm already in -research?
[16:07:33] By the way, hi leila, ragesoss, Thehelpfulone and whym
[16:07:59] OK. I think I got it. I'm going to run a test. BRB
[16:08:03] And hi nuria
[16:08:47] hi Pine
[16:08:49] :-)
[16:08:53] Work@!
[16:08:57] *Works!
[16:08:59] Yay
[16:09:04] +1 wikicredit for halfak
[16:09:09] Woot
[16:09:31] OK. So, some people wanted to chat about wikicredit.
[16:09:43] And my WIFI is bad, so we'll need to use IRC.
[16:10:16] Let me see if there's anyone in the hangout
[16:10:26] irc works fine
[16:10:27] I can get hangout to work on my phone but not my computer
[16:12:38] Looks like I'm the only one in the hangout
[16:13:03] Pine, we just had a sync session.
[16:13:09] hi Pine :)
[16:13:14] Luckily I made it around the London people before the connection tanked.
[16:13:44] OK. So milimetric, what part of wikicredit gets you excited?
[16:14:05] There's one part of the system that I haven't considered carefully: the actual WikiCredit subcomponent.
[16:14:17] The website that will present graphs and stats to the user.
[16:14:43] Diagram: https://meta.wikimedia.org/wiki/File:Content_persistence.system_architecture.diagram.svg
[16:14:56] all of it is interesting halfak, so anything that you think will help the idea advance
[16:15:24] Right now, I'm waiting on some primers to get the diffengine part of the system loaded.
[16:15:53] The persistence system is next in the flow of data. It will use text difference to build stats about the content that people add to the wiki.
[16:16:07] e.g. how many revisions their words persist.
[16:17:01] The wikicredit subcomponent will query the persistence system to generate stats for a user's work and store it in a cache to be updated.
[16:17:10] right, i've seen some visualizations of the history of content on individual pages, that's the kind of thing you're going for right?
[16:17:17] So, it will need a work queue.
[16:17:21] except quantifying not visualizing
[16:17:41] +1
[16:17:45] I'
[16:17:57] 'd like to have a visualization, but not on a per-page basis.
[16:18:12] * halfak goes to upload a new mock.
[16:18:39] Hm
[16:18:44] why not on a per-page basis?
[16:19:06] the idea is to visualize a person's total contribution right?
[16:19:27] as modeled by quantity and popularity
[16:20:52] https://commons.wikimedia.org/wiki/File:WikiCredit.value_added_mock.svg
[16:21:39] As the desc says, the bars would show value added per short time unit (day? month?) and the area under the curve would represent total value added.
[16:22:11] The system would need to identify users, present a graph based on cached data and a work queue that a user starts.
[16:22:57] halfak: are you including edits made to all projects and all namespaces in a user's wikicredits?
[16:23:20] I was imagining the main NS first, but I'd like to find a way to incorporate templates
[16:23:49] IMO you should include all namespaces, or have options for showing them and adding them separately
[16:23:50] Pine: I think it's "Credit" as in "attribution of work" not "it takes 20 credits to play this video game" :)
[16:24:12] For example people's contributions to AFD discussions should be credited
[16:25:13] I would credit everything although allowing for users to select which namespaces are calculated
[16:25:35] and preferably include all projects, accounting for the upcoming SUL finalization
[16:26:06] Pine, other namespaces aren't viewed. We would miss out on measures of importance.
[16:26:31] BUT I agree that there is /valuable/ work done outside of main.
[16:26:31] How would we miss out on measures of importance by including all namespaces?
[16:26:53] Well, the measures of importance I am looking at right now are view rates and internal links.
[16:27:07] yeah, I see your point halfak, it makes sense
[16:27:12] With the assumption that articles that are highly linked from other articles are important (needs checking)
[16:27:23] And that articles that are highly viewed are important
[16:27:27] milimetric, cool.
[16:27:45] halfak: well... importance and popularity are not the same
[16:27:52] halfak: hm, PageRank for wiki? :)
[16:28:01] So, I only have 20 minutes before the Wikimania opening stuff starts. I'm hoping that we can figure out something for you guys to hack on before I go.
[16:28:13] yea, that'd be great
[16:28:21] milimetric, I was just looking at internal links, but a PR based approach would be cool to explore if you'd like to :)
[16:28:45] when I said "internal links", I meant a count from the pagelinks table.
[16:28:48] halfak: you mean you were just counting the internal links instead of recursing?
[16:28:52] +1
[16:28:53] gotcha
[16:28:57] hm... tempting!
[16:29:05] But that would be a cool contribution to the work.
[16:29:22] ok, I'll waver between that and visualizations and decide over lunch
[16:29:25] *and* we could use a better understanding of link network structure for other analytics work.
[16:29:30] Cool! <3
[16:29:38] thanks halfak!
[16:29:51] halfak: it's at 7pm, not 6
[16:29:53] the opening
[16:30:54] Yes, registration is what's starting at 6
[16:31:15] "welcome reception with food and drinks"
[16:31:39] Pine, I'd like to entertain other measures of value -- outside of direct article edits.
[16:32:16] I wonder if you could help develop some thoughts about measuring value elsewhere.
[16:35:14] hmm, sure
[16:35:24] milimetric and I can brainstorm about how to measure importance
[16:35:34] it is a difficult problem, I grant that
[16:36:51] I think article readership is a relevant measure, but only one
[16:37:16] Internal links directing to an article, similar to Google, are another
[16:37:45] Also, as the WikiCup does, we can look at how many languages an article exists in to measure importance
[16:38:15] milimetric: also, yurik sent patches to the Limn MW extension to make it much better
[16:38:17] We could also attach value to ITN/DYK/GA/FA
[16:38:34] Same for FL/FT/FP
[16:40:19] oh, if I may brag, a Wikipedia article I started is now the top Google search result on the subject :)
[16:41:44] congrats, Pine :)
[16:42:22] I don't know if we could include Google search ranks, but it would be nice if we could account for that
[16:42:46] Pine: best problems are the difficult ones. Simple problems are boring.
[16:42:49] thanks yuvipanda
[16:42:55] I'm very proud of it
[16:43:10] Pine +1 for lang links.
[16:43:26] One thing I'm concerned about with links is that they don't work very well for under-developed articles.
[16:43:37] I wonder if views would have the same problem.
[16:44:26] Oh yeah! One other thing. I was talking to Technical_13 a few months ago about strategies for measuring the value of template work. He might have some insights. At the time, we didn't get too much farther than counting transclusions.
[16:45:51] halfak: that's why I said links *to* not links *from*
[16:46:08] That also makes it harder to game the system
[16:46:25] +1
[16:46:35] As far as templates go, that's an interesting point about transclusions
[16:47:32] Oh, here's another measure for articles and templates: number of unique editors
[16:48:08] Of course that needs to account for subtracting accounts that are caught socking
[16:48:17] * halfak has to run.
[16:48:23] Will check logs later.
[16:48:24] o/
[16:48:24] Bye!
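A minimal sketch of the simplest importance measure discussed above: a plain count of links *to* each page (not links *from* it, and without the PageRank-style recursion that was floated). It assumes link rows are already available as (source, target) pairs; the function name and the sample rows are hypothetical, and this is not the actual pagelinks table schema.

```python
from collections import Counter


def inlink_counts(pagelinks):
    """Count the distinct source pages that link to each target page.

    `pagelinks` is an iterable of (source, target) pairs. Duplicate
    rows and self-links are ignored so that repeating a link on one
    page does not inflate the target's count.
    """
    unique_links = {(src, dst) for src, dst in pagelinks if src != dst}
    return Counter(dst for _, dst in unique_links)


# Hypothetical sample rows, for illustration only.
sample_links = [
    ("A", "B"),
    ("C", "B"),
    ("A", "B"),  # duplicate row, counted once
    ("B", "A"),
    ("D", "D"),  # self-link, ignored
]

counts = inlink_counts(sample_links)
```

Counting distinct linking pages rather than raw rows is one small way to make the measure harder to game, in the spirit of the 16:46:08 remark.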
[16:49:01] milimetric: what are you working on now?
[16:49:47] Hm, more ideas for measures of value:
[16:50:05] check for warning templates
[16:50:32] Pine: have you seen quarry.wmflabs.org, btw?
[16:50:40] check for number of talk page editors
[16:50:54] yuvipanda: I know of it but I haven't started playing with it
[16:51:14] another article measure: number of page watchers
[16:52:21] What's the topic?
[16:57:29] Qcoder00: we're talking about WikiCredit
[16:57:43] Which is?
[16:58:26] https://meta.wikimedia.org/wiki/Research:Ideas/WikiCredit:_Measuring_value_added_to_Wikipedia
[17:04:18] holaaa
[17:06:06] * yuvipanda waves at nuria
[17:06:12] nuria: everyone's moving off here at wikimania
[17:06:21] i to teh reserach channel?
[17:06:26] *the
[17:06:32] boy is that hipster
[17:08:17] heh
[17:08:39] Yes, the Wikimaniacs are going to party while the rest of us are hard at work!
[17:09:13] I suppose we can make do without them
[17:09:37] However it is hard to have a research hackathon with so few people
[17:11:40] Pine: yeah, but oh well :)
[17:11:48] Pine: maybe take the time to check out Quarry :D