[00:00:43] this morning there was a meeting to start looking into indexing wikidata, https://www.mediawiki.org/wiki/Wikibase/Indexing tracks the current info on that
[00:01:00] looking into distributed graph databases
[00:01:50] you probably have experience / interest in that space as well, so please chime in
[00:18:42] leila: ^
[00:18:49] (indexing wikidata)
[01:59:31] Hi! I continue to experience very slow queries on the tool labs servers -- is there any load monitor or anything? I suspect my queries are actually fast, but I am being queued, but I'd like to verify that assumption
[02:01:17] how are you making your queries?
[02:05:38] well so one example is this: select sum(page_counter) from page where page_namespace=3
[02:05:51] um.
[02:06:01] the page_counter field isn't used on WMF wikis
[02:06:01] both slow via SSH-login => sql enwiki, and via SSH-configured SQL client
[02:06:14] well,
[02:06:26] enwiki has a few million pages.
[02:06:33] I can tell you the result is going to be 0 though.
[02:06:50] page count data is available at https://dumps.wikimedia.org/other/pagecounts-raw/
[02:07:04] http://stats.grok.se/ has a JSON output as well
[02:07:17] that being 0 was actually what I wanted to know, thanks :)
[02:07:32] doesn't change the overall impression that trivial queries are slow
[02:07:40] it's not a trivial query
[02:07:42] is there some kind of queueing going on?
[02:07:59] you're scanning a good chunk of the page table.
[02:10:49] i just cancelled that query after running for 20min. . .
[02:12:27] we've some queries that run for days, I think
[02:12:32] page / revision tables are really... large
[02:13:14] * YuviPanda goes to sleep
[02:17:56] damn :/
[02:18:47] any recommendation how to run those long queries? can temporarily write query results to the tools server and NOHUP the query?
[02:18:57] *can I
[02:24:54] need to leave now, unfortunately -- thanks for your help, @legoktm and @YuviPanda|zzz !
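(Editor's sketch, not something proposed in the channel.) The 02:18:47 question about running long scans went unanswered. One common approach, shown here purely as an illustration, is to split the aggregate into primary-key ranges so that no single statement runs for long and partial totals can be logged or resumed. Everything below is an assumption: an in-memory SQLite table stands in for the real `page` table, and `chunked_sum`, `build_demo_db`, and the batch size are hypothetical names and values chosen for the example.

```python
import sqlite3


def chunked_sum(conn, namespace, batch_size=1000):
    """Sum page_counter for one namespace by scanning page_id ranges.

    Each statement covers at most batch_size ids, so no single query
    runs for long and the running total could be checkpointed.
    """
    cur = conn.cursor()
    cur.execute("SELECT MIN(page_id), MAX(page_id) FROM page")
    low, high = cur.fetchone()
    if low is None:  # empty table
        return 0
    total = 0
    start = low
    while start <= high:
        end = start + batch_size  # half-open id range [start, end)
        cur.execute(
            "SELECT COALESCE(SUM(page_counter), 0) FROM page "
            "WHERE page_id >= ? AND page_id < ? AND page_namespace = ?",
            (start, end, namespace),
        )
        total += cur.fetchone()[0]
        start = end
    return total


def build_demo_db(n_pages=5000):
    """Create a stand-in page table (made-up rows, not enwiki data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page (page_id INTEGER PRIMARY KEY, "
        "page_namespace INTEGER, page_counter INTEGER)"
    )
    # page_counter is 0 everywhere, mirroring the unused field on WMF wikis.
    rows = [(i, i % 4, 0) for i in range(1, n_pages + 1)]
    conn.executemany("INSERT INTO page VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print(chunked_sum(demo, namespace=3))
```

Against a real replica, a script along these lines could be started under `nohup` and write its partial totals to a file, which is roughly what the question was asking about; whether that suits the Tool Labs setup is not something the log confirms.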
[15:31:14] o/ Helder
[15:31:22] o/ ToAruShiroiNeko
[15:31:22] hey halfak ! :-)
[15:31:26] We have feedback https://meta.wikimedia.org/wiki/Grants_talk:IEG/Revision_scoring_as_a_service
[15:31:32] ola! ToAruShiroiNeko !
[15:31:34] It's basically "this is awesome"
[15:31:40] :)
[15:31:58] ola halfak
[15:32:09] Hey Ironholds :)
[15:32:12] moar feedback! :-)
[15:34:33] oh?
[15:36:03] :) Ironholds the IEG's are turning around with scores this morning.
[15:36:11] awesome!
[15:36:15] good luck :D.
[15:36:19] We got good scores for funding the Revision scoring as a service project. :)
[15:36:36] yeeeeees!
[15:36:46] so that's your Next Big Thing? Gonna be fun.
[15:37:07] I'm going to get these datasets released and write "Cultural Variations in Online Behaviour"
[15:37:14] to be submitted to who the hell knows.
[15:38:39] Oooh. I suspect that our temporal rhythms strategy will be an interesting lens to cultural variations.
[15:38:46] at least sessions will :)
[15:39:11] yuuup!
[15:39:26] plus circadian work, plus the weird stuff around mobile-versus-desktop DarTar saw in the pentaho data. Gonna be fun on a bun.
[15:39:46] (CSCW 2016? The submission deadline for that hasn't even opened yet. I have no idea, y'all know this space better than I0
[15:39:47] *)
[15:40:30] halfak: did you see my comment on
[15:40:30] https://meta.wikimedia.org/w/index.php?title=Grants_talk:IEG/Revision_scoring_as_a_service#.22provide_us_with_a_random_sample_of_hand-coded_revisions_.28as_damaging_vs._not-damaging.29.22_-_Gesichtete_Versionen.3F
[15:40:31] ?
[15:44:13] * halfak clicks.
[15:44:50] +1 clear and to the point.
[15:46:00] great!
[15:47:20] halfak: I was wondering if it would be possible to give a random set of revisions to some users (me included), for evaluation as good/bad, even before it can be used for training
[15:47:41] then, when we get to that point, we would have some data available already
[15:47:56] It's certainly possible.
[15:48:04] I agree that doing both in parallel would be desirable.
[15:48:53] We can train models on "will be reverted" or "will be flagged as bad" to work with our feature set too.
[15:51:51] halfak: maybe a stupid question: if we have a "random" set of revisions A, and later we create another "random" set B, will the union (A U B) be random? or there is maybe some "randomness property" which is lost when doing this?
[15:52:19] It will be stratified in a way.
[15:52:32] Unless we limit the bounds of what revisions can be selected in the same way...
[15:52:33] is that a technical term?
[15:52:37] stratified
[15:52:43] Yes. Stratified random sample.
[15:52:51] * Helder checks Wikipedia
[15:52:53] So, let's say we sampled revisions from 2014 this year.
[15:53:00] And then sampled from 2015 next year.
[15:53:20] We'd probably want to train a classifier on each and use a test set from each to see if we learn different parameters.
[15:53:29] If not, I don't see why they couldn't be merged.
[15:53:46] However, I look forward to checking.
[15:53:53] makes sense
[17:08:13] leila, is there a breakfast with Lila this morning?
[17:08:31] (also HI! & good morning)
[17:08:34] :)
[17:08:44] hi halfak. no. if there were one, it should've been in the calendar.
[17:08:49] I /think/ it's every other week
[17:08:56] Ahh. Makes sense.
[17:20:45] good!
[17:21:11] I'm gonna propose we move them
[17:21:23] Move the meeting?
[17:21:34] Or the breakfasts
[17:23:44] the former
[17:23:47] will discuss!
[17:42:47] Hi gang! Was referred here by #wikimedia-labs. I'm working on a project. The goal is to visually compare contributions to wikipedia by university -- let's see which institutions contribute the most to the most prominent ed tech resource around
[17:43:50] Is there an API or other resource that would allow me to download the IP addresses associated with all edits to wikipedia? I'm hoping to compare that list with known IP ranges for universities
[17:44:56] I would like to download *just* IP addresses of the anon edits, not the article content
[17:45:06] Any help very much appreciated :)
[17:45:07] I don't think there are any dumps of just IPs
[17:46:16] https://dumps.wikimedia.org/enwiki/20141008/ 2014-10-10 05:31:45 done First-pass for page XML data dumps
[17:46:18] those maybe?
[17:46:21] "These files contain no page text, only revision metadata."
[17:46:25] and you just want the metadata
[17:51:23] @legoktm investigating
[17:59:33] nicholas, of course, a raw count doesn't tell you if the edits were accepted or any good ;p
[18:01:01] tnegrin, scheduled a meeting right after this to talk datasets - your calendar says you're around
[18:01:11] ok
[18:07:29] @Ironholds true. I'm new to hacking data like this -- participating in an art hackathon @ RISD in Providence, RI, USA this weekend. I presume there's some way to see if the edit was accepted.. will have to figure lots of things out. Open to tips
[18:08:01] Hi Nicholas. I have datasets for you. :)
[18:08:19] http://datahub.io/dataset/english-wikipedia-reverts
[18:08:25] :D
[18:08:39] This lists out all of the revisions that were reverted in English Wikipedia up until Aug 23rd, 2014.
[18:08:55] You can compare these datasets to your queries to filter out rejected edits.
[18:09:11] wonderful !
[18:09:47] If you give me a date range and a set of IPs, I might even be able to simply pull the data for you.
[18:19:53] @halfak I'll get back to you on that. figuring out exactly what i can get from that resource ironholds linked to
[19:04:13] brb researt
[19:14:59] halfak, I need some help
[19:15:09] What's up?
[19:15:32] we'll send you the streaming link, can you share it with wmfall, analytics, wikiresearch-l?
[19:15:44] Yes.
[19:15:49] (setting up AV always surprises us! ;p)
[19:15:51] thanks!
[19:16:05] Also, could you invite me to the call? I'd like to do IRC questions this time :)
[19:17:02] okay, halfak, there you are: https://www.youtube.com/watch?v=-FQ-TtTCdJo
[19:17:14] you can just reply to the older emails on those lists about the same event
[19:27:20] Hey folks. For those of you who came early, the streaming link is tps://www.youtube.com/watch?v=-FQ-TtTCdJo
[19:27:25] woops
[19:27:28] https://www.youtube.com/watch?v=-FQ-TtTCdJo
[19:27:41] We'll be starting shortly.
[19:29:22] halfak, can you ping the staff channel?
[19:29:25] I can't connect to it.
[19:29:30] Yes. Thanks for the reminder.
[19:29:30] :-\
[19:29:39] thank you!
[19:33:29] halfak, you're on IRC, right?
[19:33:35] yup :)
[19:33:47] I'll have to take care of the streaming so I won't follow IRC discussions
[19:33:49] thanks!
[19:34:04] Happy to be on point :)
[19:34:23] Could you guys turn on the mic so that I can hear the plan?
[19:35:03] And we are live. The stream should pick up in just a moment.
[19:36:09] Stream is up
[19:36:11] https://www.youtube.com/watch?v=-FQ-TtTCdJo
[19:36:19] got stream thanks
[19:39:33] That looks log-normal to me
[19:40:19] ditto
[19:41:03] Same basic assertion still stands though.
[19:41:18] today is gonna be a long day >.>
[19:41:22] Lots of people do nothing, a few people do lots of stuff
[19:41:24] :P
[19:41:36] lol
[19:41:55] Atheists vs. Christians in lending.
[19:42:33] Ha!
[19:42:37] and Atheists are winning
[19:42:59] for now ;)
[19:43:06] that’s been the case for a while on Kiva, good to see it’s still the case >:)
[19:46:31] * halfak celebrates ideological battles that result in more philanthropic behavior. :)
[19:49:24] "I'm a kiva lender and I didn't know there were teams" sounds like "I'm a wikipedia editor and I didn't know there were WikiProjects"
[19:50:17] reminds me of how offline translates to online impact
[19:50:48] wonder if they looked at the difference between online/offline engagement
[19:51:02] EGalvez, add to question list?
[19:51:07] sure
[19:51:10] hey y'all
[19:51:14] o/ Ironholds
[19:51:19] so, I didn't sleep well and I can tell from my emails that I'm getting grumpy
[19:51:30] so I've set some things to run automatically and am going to power nap until the apps meeting
[19:51:31] DarTar, ^
[19:51:32] So for anyone else, if you'd like to ask any questions, let me know.
[19:51:32] the selection of the participants is very interesting
[19:51:47] Ironholds: sounds good, get some rest
[19:52:07] Thanks aaron
[20:00:08] I need to leave in a bit, but I have one more question in case it doesn't come up. How well do the teams translate to "impact" ? Is there any kind of support for the team, like how to make lending choices? Is there any research on the retention of lenders?
[20:00:14] *a few more questions.. :)
[20:00:46] Will ask. The video should have the answer for you.
[20:00:49] :)
[20:01:10] Thanks! And when I say "impact" i mean impact related to borrowers
[20:01:31] Gotcha. Will specify
[20:07:53] DarTar: http://www.uaar.it/news/2013/04/24/il-cuore-degli-atei/ ?
[20:08:42] * DarTar wishing we could use Echo notifications for an experiment on Wikiproject recommendations
[20:08:55] where’s J-Mo?
[20:09:37] Good Q. Could be the UW event that Nettrom was talking about.
[20:09:46] nah, that's tomorrow
[20:09:53] surprised he's not around
[20:09:58] Busy J-Mo
[20:10:06] * Nettrom_ is in coffee shop, no good WiFi, will watch video later
[20:10:07] * DarTar shakes fist
[20:11:20] The study Yan is talking about: http://www.tc.umn.edu/~chingren/pdf/papers/Chen%20Ren%20Ridel%20Diversity%20CHI%202010.pdf
[20:11:41] ^ TL;DR: Diversity is important for successful WikiProjects
[20:12:10] wow I had no idea about this expert bot project
[20:12:39] they're working on the bot approval
[20:12:39] Expert ideas bot: https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval#ExpertIdeasBot
[20:12:49] halfak: do you know if they have a research page?
[20:13:01] Hmm... I don't think so.
[20:13:05] ha
[20:13:46] Oh no. Move mouse plz
[20:13:50] on Yan's machine
[20:14:14] \o/ it's gone!
[20:15:16] Bot is approved for 15 day trial.
[20:15:19] FYI
[20:16:07] I think the trial period has concluded, they're working on getting community feedback
[20:17:21] halfak: can you ask rosta to create an R: page on meta? :)
[20:17:39] Yes. I'll use this event as evidence that people want it :)
[20:18:03] running out of battery here, bye bye
[20:18:19] o/
[20:18:40] when is DarTar not running out of battery? ;)
[20:19:10] I’m turning into a meme
[20:25:27] Dario is referring to https://www.mediawiki.org/wiki/Echo_(Notifications)
[20:25:34] As a means to deliver recommendations.
[20:31:12] Time!
[20:31:37] I'm going to have to drop out and go to the next meeting. Thanks guys!
[20:33:34] thanks!
[20:34:10] Thanks, good presentation.
[21:16:01] halfak: ping? I've a friend around super interested in ML and similar things, and has a bunch of free time (potentially). He's asking me if there are things he can contribute to?
[21:16:25] are you in a meeting? :0
[21:16:51] oooh. I'm in the middle of a meeting to get other ML enthusiasts funded to work on wiki. There's always more work. :)
[21:17:04] is there a list? :)
[21:17:14] mailing list?
[21:17:23] work list?
[21:17:26] yeah, work list
[21:17:40] hmm.. not exactly
[21:17:51] I pointed him to wikiclass and revscoring and he seemed pumped
[21:18:08] Great. That's what I'd like to talk to him about.
[21:19:00] sweet. I'll get him here once the weekend is over
[21:19:06] :)
[21:19:08] have fun!
[21:42:18] Ironholds: yt?
[22:49:00] * halfak finally gets to program for the first time all week.
[22:49:02] o/
[22:50:22] * YuviPanda files a 58 hour meeting with halfak
[22:50:36] NOOO ... wait. what kinda meeting?
[22:51:11] one where I'm here, and you're there, and then you can use it as an excuse to take other meetings off
[22:54:26] :)
[22:54:33] Sounds like a hackathon.
[22:54:36] halfak: :D yeah
[22:54:41] best kind of meeting
[22:55:15] halfak: yeah. also there's a nice conf. on data/analytics in India run by friends of mine (5th Elephant), interested in coming over / presenting? :)
[22:56:48] YuviPanda, I might be. When is it?
[22:57:00] halfak: July
[22:57:19] What's it called?
[22:57:31] halfak: https://fifthelephant.in/2014/
[22:57:34] halfak: was last year's
[23:00:25] How do they select presenters? Is there a publication or just a talk?
[23:01:33] Also, I like the website :)
[23:02:05] halfak: community voted + selection panel
[23:02:10] halfak: https://funnel.hasgeek.com/
[23:02:15] halfak: they run a lot of conferences
[23:02:21] Gotcha. So it is like Wikimania.
[23:02:28] halfak: yeah
[23:02:36] halfak: but they're always on the lookout for interesting things
[23:02:42] halfak: I think we do a bunch of interesting things :)
[23:02:49] halfak: also might be a nice place to get new people interested.
[23:03:14] halfak: I got into Wikimedia stuff as a volunteer because Danese Cooper and Erik visited FOSS.in in 2010 and gave a talk about our infrastructure :)
[23:03:29] Yeah. That last one's the angle I think is interesting.
[23:04:05] I think a presentation about our data infrastructure could be interesting.
[23:04:23] Especially if it ends with, "And here's how you can log in and play around yourself."
[23:04:28] yeah :D
[23:04:29] exactly
[23:04:33] and our sanitarium stuff
[23:04:45] sanitarium?
[23:05:50] data scrubbing for toollabs
[23:05:52] err
[23:05:54] for labsdb
[23:09:14] The version that's already in place?
[23:09:24] (sorry lost myself in API docs for a bit there)
[23:09:37] Or is sanitarium something new we're considering?
[23:13:14] halfak: yeah, it's already in place
[23:13:23] but that might make sense for rootconf
[23:14:55] Gotcha. I think I have another venue where it would be cool to show off that (less infrastructure, more what it achieves)
[23:15:06] J-Mo and I are organizing a workshop for CSCW.
[23:15:28] A key theme is going to be about effective data sharing that open organizations like our own can participate in.
[23:15:32] yeah
[23:15:36] oooh, nice
[23:15:46] Quarry (and sanitarium to a lesser extent) is worth showing off.
[23:15:52] I'm still in travell-y phase, so not much time for helping out with researchy stuff.
[23:15:55] quarry is stable tho :)
[23:16:04] will have time in December, where I'm camped in the himalayas all month
[23:16:25] I just confirmed Science guy @ GitHub (Arfon Smith) will be attending.
[23:16:32] woah nice
[23:16:39] Conf. is in March
[23:16:55] * YuviPanda has never been to a 'research' conf
[23:17:03] should do something nice enough for one of those things sometime
[23:17:23] Let's write about Quarry.
[23:17:32] I'm not sure how yet, but that's normal for research projects.
[23:17:53] writing about it?
[23:18:02] I think if I add templating to it as well, then it'll be much more worthy
[23:18:36] Maybe. I think we ought to think carefully about what ought to increase usage and experiment with that.
[23:19:03] There should be some foundation from which to draw hypotheses.
[23:19:07] sure, and if you look at a lot of what's there now there's a lot of 'reports'
[23:19:16] E.g. social dynamical problems in Wikipedia.
[23:19:40] Do we have any bots?
[23:19:56] * halfak imagines a bot that would run on javascript and Quarry queries
[23:20:08] no, but magnus has this thing
[23:20:11] let me find link
[23:20:30] halfak: https://tools.wmflabs.org/toolscript/index.html?pastebin=Zu9ifKUn
[23:20:32] seen that?
[23:20:40] halfak: so that's kind of close but not really
[23:21:04] This is pretty cool.
[23:21:15] yeah
[23:21:31] I can add a little bit of code to make the second param to loadQuarryResult unnecessary
[23:22:37] * halfak thinks
[23:22:50] I don't know if we'll get many users coming to an external site.
[23:22:50] but need to find cool usecases
[23:22:57] yeah
[23:23:11] but if we add templating, all of the db reports on wikis can be replaced with quarry
[23:23:20] OK. I have a dumb idea.
[23:23:42] A quarry module for lua
[23:23:51] aaaaaah
[23:23:55] never gonna get deployed :)
[23:23:58] Yeah.
[23:24:07] But I bet that people would use the shit out of it.
[23:24:16] plus quarry will die :)
[23:24:25] labs probably won't survive that
[23:24:40] Maybe. Unless some strong rules are put in place.
[23:24:46] indeed
[23:24:59] and it might even be ok, since it'll pull in pure JSON
[23:25:24] We should talk to gwicke about quarry as a service for MediaWiki.
[23:25:41] It sounds silly, but I'm not seeing it as that silly the more I think about it.
[23:26:00] :D
[23:26:05] it's kind of arbitrary SQL execution
[23:26:15] It always was :)
[23:26:27] What does putting it in MediaWiki matter?
[23:26:37] there used to be Special:Ask
[23:26:42] that did this :)
[23:26:44] and then doesn't
[23:26:44] Other than you might have a page accidentally run a query once/second.
[23:27:27] yeah but that's ok
[23:28:29] waybackwhen, all admin/checkuser actions were done through Special:Query, or somesuch, which was a little window to the db
[23:28:31] this is not a joke
[23:28:34] ah, memories.
[23:28:47] yeah
[23:28:50] that was Special:Ask
[23:29:10] aha
[23:29:37] Quarry is a separate service though. A sanitized one.
[23:29:47] One cannot take Wikipedia down by taking Quarry down.
[23:29:58] yeah
[23:33:34] tnegrin, I had a brilliant idea that may save our bacon. When's the meeting kicking off?
[23:34:11] now
[23:34:16] it's ok