[03:37:30] harej, still around?
[03:38:06] re. analysis of the history of a page, it depends a lot on the size and number of revisions.
[03:38:30] It'll also matter what kind of analysis you are performing.
[10:21:31] halfak: the analysis would be basic parsing that can be done in Python
[10:22:50] I am interested in doing research on the effectiveness of Reports bot's reports at getting people to edit articles, meaning I will want to parse the past versions of report pages.
[10:23:40] A separate project: I want to assess historically how many times a given website is linked to on Wikipedia so I can say whether citations to it have gone up or down.
[10:28:39] The first project involves revision analysis on a small set of pages and edit history graphing on a set of Wikipedia articles smaller than "all of them"
[10:29:12] The latter project is a lot more computational, but the older stuff only needs to be measured once and should be slightly less computational thereafter.
[12:27:55] o/ harej
[12:28:00] halfak is awake
[12:28:13] Re. looking at Reports bot: that should be pretty trivial
[12:28:43] Re. external links: that'll be a bigger job, but I bet we can essentially get it to run overnight.
[12:29:18] can I use your mw utilities for the reports bot work?
[12:30:05] Yeah. I'd use mwapi. It's a relatively small set of pages, right?
[12:30:27] It's less than 50 pages.
[12:30:42] And then I'd have to scour the edit histories of all the pages all those pages ever linked to.
[12:31:02] I want to see if there is a relationship between appearing on a report and edits to that article during that period
[12:31:05] Still, I think the API/db will make that easy.
[12:31:49] So, I have a script that does something like this that I put on PAWS, but there's no nice notebook viewer up there yet.
[12:33:27] harej, https://gist.github.com/halfak/26314f5481c9067002e3c24bba39ff1a
[12:33:51] That script uses the XML dumps that only have the last revision to gather all of the current headers from English Wikipedia.
[12:34:26] define "headers"
[12:35:30] == level 2 ==
[12:35:38] === level 3 ===
[12:35:44] Ah, HTML headers.
[12:36:16] Is that related to what I'm working on, or is this something you worked on in the past that vaguely resembles my current project?
[12:39:58] It looks like from this I would look for links instead of headers, use mwapi instead of mwxml, and get all revisions instead of just the latest?
[12:40:17] +1
[12:40:34] (to answer your question, this just vaguely resembles your project)
[12:40:48] I'd use mwxml to get links over time.
[12:40:54] ^ external links
[12:41:02] Well, yeah.
[12:41:06] That's another project.
[12:41:37] :)
[12:42:39] It's funny, they call it the MediaWiki parser from hell, but it's the prevailing Python wikitext parser.
[12:42:58] Never underestimate the lifetime of hacks.
[12:50:45] So, good to know my ideas are feasible given current technology.
[12:50:56] I shall hand it off to Fabian.
[12:51:11] Today's project is to create a Python script that interacts with my Django data model and posts lists on Wikipedia.
[12:56:34] Remind me, which database table records transclusions?
[12:56:46] templatelinks
[12:57:11] thank you
[12:57:15] harej, https://github.com/earwig/mwparserfromhell/issues/123
[17:24:14] _o/
[18:02:36] o/
[21:42:53] * halfak --> park w/ dog
[21:43:47] pix
[22:06:04] Emufarmers: have you seen halfak's ferret?
[22:11:23] possibly
[23:15:51] Emufarmers, https://imgur.com/JCsGHdk
[23:16:09] I made you a gif of Luna getting water between throws.
[23:16:36] halfak: :D maybe I should come over to Minnesota in June!
[23:16:51] That would be great. But June is a bad month.
[23:16:57] I think I have something to do every week.
[23:17:11] ah, I see. what would be a better month? July?
[23:17:21] Yeah. July is great.
[23:17:33] Do you play frisbee golf?
[23:17:58] ok, I'll keep it in mind :) I am torn between trying to sit here and do the stressful-but-required adult thing (find housing) vs. doing the more natural, easy thing (just travel around for 4 months)
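
A minimal sketch of the mwapi approach halfak suggests at 12:30 for the Reports bot project: pull the full revision history of each report page through the API, following continuation until the history is exhausted. The page title, user agent string, and choice of revision properties below are illustrative assumptions, not anything specified in the conversation.

    import mwapi  # https://github.com/mediawiki-utilities/python-mwapi

    session = mwapi.Session('https://en.wikipedia.org',
                            user_agent='report-analysis sketch')

    def revisions(title):
        """Yield every revision of `title`, oldest first, following
        API continuation until the history is exhausted."""
        for portion in session.get(action='query', prop='revisions',
                                   titles=title,
                                   rvprop='ids|timestamp|user',
                                   rvdir='newer', rvlimit=500,
                                   formatversion=2, continuation=True):
            for page in portion['query']['pages']:
                yield from page.get('revisions', [])

    # Hypothetical report page -- the real set is the <50 pages
    # mentioned at 12:30.
    for rev in revisions('User:Reports bot/Example report'):
        print(rev['revid'], rev['timestamp'])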
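
The gist linked at 12:33 isn't reproduced here, but in the spirit of what halfak describes -- scanning the latest-revision-only XML dump for level-2 and level-3 wikitext headers -- a sketch might look like the following. The dump filename and the heading regex are assumptions.

    import bz2
    import re

    import mwxml  # https://github.com/mediawiki-utilities/python-mwxml

    # Matches "== level 2 ==" and "=== level 3 ===" wikitext headings.
    HEADING = re.compile(r'^(={2,3})\s*(.+?)\s*\1\s*$', re.MULTILINE)

    # "pages-articles" dumps carry only the current revision of each page.
    dump = mwxml.Dump.from_file(
        bz2.open('enwiki-latest-pages-articles.xml.bz2', mode='rt'))

    for page in dump:
        for revision in page:        # exactly one revision per page here
            if revision.text is None:
                continue
            for equals, title in HEADING.findall(revision.text):
                print(page.title, len(equals), title, sep='\t')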
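
For the external-links project, counting how often a given site is linked from a revision's wikitext could lean on mwparserfromhell, the "parser from hell" joked about at 12:42. The domain and wikitext below are placeholders, and links buried inside citation-template parameters may need separate handling.

    import mwparserfromhell  # https://github.com/earwig/mwparserfromhell

    def count_links_to(domain, wikitext):
        """Count external links in `wikitext` whose URL mentions `domain`."""
        code = mwparserfromhell.parse(wikitext)
        return sum(domain in str(link.url)
                   for link in code.filter_external_links())

    # Toy example; in practice the wikitext would come from dump revisions.
    text = "See [https://example.com/report the report] and https://example.com/data"
    print(count_links_to('example.com', text))  # -> 2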
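
And for the templatelinks question at 12:56: on the Labs database replicas, a transclusion count is one query away. The host, credentials file, and the (tl_namespace, tl_title) columns are assumptions about that environment -- newer MediaWiki schemas route templatelinks through a linktarget table instead.

    import os

    import pymysql

    # Count pages transcluding a given template via templatelinks.
    conn = pymysql.connect(
        host='enwiki.analytics.db.svc.wikimedia.cloud',  # replica host (assumed)
        database='enwiki_p',
        read_default_file=os.path.expanduser('~/.my.cnf'))

    with conn.cursor() as cursor:
        cursor.execute(
            """SELECT COUNT(*)
               FROM templatelinks
               WHERE tl_namespace = 10  -- the Template: namespace
                 AND tl_title = %s""",
            ('Citation_needed',))
        print(cursor.fetchone()[0])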