[03:37:30] harej, still around?
[03:38:06] re. analysis of the history of a page, it depends a lot on the size and number of revisions.
[03:38:30] It'll also matter what kind of analysis you are performing.
[10:21:31] halfak: the analysis would be basic parsing that can be done in Python
[10:22:50] I am interested in doing research on the effectiveness of Reports bot's reports at getting people to edit articles, meaning I will want to parse the past versions of report pages.
[10:23:40] A separate project: I want to assess historically how many times a given website is linked to on Wikipedia so I can say whether citations to it have gone up or down.
[10:28:39] The first project involves revision analysis on a small set of pages and edit history graphing on a set of Wikipedia articles smaller than "all of them"
[10:29:12] The latter project is a lot more computational, but the older stuff only needs to be measured once and should be slightly less computational thereafter.
[12:27:55] o/ harej
[12:28:00] halfak is awake
[12:28:13] Re. looking at Reports bot: that should be pretty trivial
[12:28:43] Re. external links: that'll be a bigger job, but I bet we can essentially get it to run overnight.
[12:29:18] can I use your mw utilities for the reports bot work?
[12:30:05] Yeah. I'd use mwapi. It's a relatively small set of pages, right?
[12:30:27] It's less than 50 pages.
[12:30:42] And then I'd have to scour the edit histories of all the pages all those pages ever linked to.
[12:31:02] I want to see if there is a relationship between appearing on a report and edits to that article during that period
[12:31:05] Still, I think the API/db will make that easy.
[12:31:49] So, I have a script that does something like this that I put on PAWS, but there's no nice notebook viewer up there yet.
[12:33:27] harej, https://gist.github.com/halfak/26314f5481c9067002e3c24bba39ff1a
[12:33:51] That script uses the XML dumps that only have the last revision to gather all of the current headers from English Wikipedia.
[12:34:26] define "headers"
[12:35:30] == level 2 ==
[12:35:38] === level 3 ===
[12:35:44] Ah, HTML headers.
[12:36:16] Is that related to what I'm working on, or is this something you worked on in the past that vaguely resembles my current project?
[12:39:58] It looks like from this I would look for links instead of headers, use mwapi instead of mwxml, and get all revisions instead of just the latest?
[12:40:17] +1
[12:40:34] (to answer your question, this just vaguely resembles your project)
[12:40:48] I'd use mwxml to get links over time.
[12:40:54] ^ external links
[12:41:02] Well, yeah.
[12:41:06] That's another project.
[12:41:37] :)
[12:42:39] It's funny, they call it the MediaWiki parser from hell, but it's the prevailing Python wikitext parser.
[12:42:58] Never underestimate the lifetime of hacks.
[12:50:45] So, good to know my ideas are feasible given current technology.
[12:50:56] I shall hand it off to Fabian.
[12:51:11] Today's project is to create a Python script that interacts with my Django data model and posts lists on Wikipedia.
[12:56:34] Remind me, which database table records transclusions?
[12:56:46] templatelinks
[12:57:11] thank you
[12:57:15] harej, https://github.com/earwig/mwparserfromhell/issues/123
[17:24:14] _o/
[18:02:36] o/
[21:42:53] * halfak --> park w/ dog
[21:43:47] pix
[22:06:04] Emufarmers: have you seen halfak's ferret?
[22:11:23] possibly
[23:15:51] Emufarmers, https://imgur.com/JCsGHdk
[23:16:09] I made you a gif of Luna getting water between throws.
[23:16:36] halfak: :D maybe I should come over to Minnesota in June!
[23:16:51] That would be great. But June is a bad month.
[23:16:57] I think I have something to do every week.
[23:17:11] ah, I see. what would be a better month? July?
[23:17:21] Yeah. July is great.
[23:17:33] Do you play frisbee golf?
[23:17:58] ok, I'll keep it in mind :) I am torn between trying to sit here and do the stressful-but-required adult thing (find housing) vs. doing the more natural, easy thing (just travel around for 4 months)
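
A minimal sketch of the mwapi approach halfak suggests at 12:30 for the Reports bot project: pull the full revision history of each report page through the API, following continuation until the history is exhausted. The page title, user agent string, and choice of revision properties below are illustrative assumptions, not anything specified in the conversation.

    import mwapi  # https://github.com/mediawiki-utilities/python-mwapi

    session = mwapi.Session('https://en.wikipedia.org',
                            user_agent='report-analysis sketch')

    def revisions(title):
        """Yield every revision of `title`, oldest first, following
        API continuation until the history is exhausted."""
        for portion in session.get(action='query', prop='revisions',
                                   titles=title,
                                   rvprop='ids|timestamp|user',
                                   rvdir='newer', rvlimit=500,
                                   formatversion=2, continuation=True):
            for page in portion['query']['pages']:
                yield from page.get('revisions', [])

    # Hypothetical report page -- the real set is the <50 pages
    # mentioned at 12:30.
    for rev in revisions('User:Reports bot/Example report'):
        print(rev['revid'], rev['timestamp'])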
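
The gist linked at 12:33 isn't reproduced here, but in the spirit of what halfak describes -- scanning the latest-revision-only XML dump for level-2 and level-3 wikitext headers -- a sketch might look like the following. The dump filename and the heading regex are assumptions.

    import bz2
    import re

    import mwxml  # https://github.com/mediawiki-utilities/python-mwxml

    # Matches "== level 2 ==" and "=== level 3 ===" wikitext headings.
    HEADING = re.compile(r'^(={2,3})\s*(.+?)\s*\1\s*$', re.MULTILINE)

    # "pages-articles" dumps carry only the current revision of each page.
    dump = mwxml.Dump.from_file(
        bz2.open('enwiki-latest-pages-articles.xml.bz2', mode='rt'))

    for page in dump:
        for revision in page:        # exactly one revision per page here
            if revision.text is None:
                continue
            for equals, title in HEADING.findall(revision.text):
                print(page.title, len(equals), title, sep='\t')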
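
For the external-links project, counting how often a given site is linked from a revision's wikitext could lean on mwparserfromhell, the "parser from hell" joked about at 12:42. The domain and wikitext below are placeholders, and links buried inside citation-template parameters may need separate handling.

    import mwparserfromhell  # https://github.com/earwig/mwparserfromhell

    def count_links_to(domain, wikitext):
        """Count external links in `wikitext` whose URL mentions `domain`."""
        code = mwparserfromhell.parse(wikitext)
        return sum(domain in str(link.url)
                   for link in code.filter_external_links())

    # Toy example; in practice the wikitext would come from dump revisions.
    text = "See [https://example.com/report the report] and https://example.com/data"
    print(count_links_to('example.com', text))  # -> 2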
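
And for the templatelinks question at 12:56: on the Labs database replicas, a transclusion count is one query away. The host, credentials file, and the (tl_namespace, tl_title) columns are assumptions about that environment -- newer MediaWiki schemas route templatelinks through a linktarget table instead.

    import os

    import pymysql

    # Count pages transcluding a given template via templatelinks.
    conn = pymysql.connect(
        host='enwiki.analytics.db.svc.wikimedia.cloud',  # replica host (assumed)
        database='enwiki_p',
        read_default_file=os.path.expanduser('~/.my.cnf'))

    with conn.cursor() as cursor:
        cursor.execute(
            """SELECT COUNT(*)
               FROM templatelinks
               WHERE tl_namespace = 10  -- the Template: namespace
                 AND tl_title = %s""",
            ('Citation_needed',))
        print(cursor.fetchone()[0])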