[15:03:40] _o/ [15:08:40] o/ guillom [15:08:45] Nemo_bis, you around? [15:29:10] halfak: yes [15:29:34] Hey Nemo. [15:29:40] So I was going to have you take a look at... [15:29:45] https://meta.wikimedia.org/wiki/Research:Anonymity_and_Peer_Production [15:29:59] They are planning to do some surveys of editors who work anonymously. :) [15:30:29] I've been talking to these researchers about people who would rather edit anonymously and how our work suggests that they are still good editors. :) [15:30:47] I figured you might have some feedback for them or maybe an endorsement to drop on the talk page. :) [15:38:25] Reading, thanks for the ping. I had already read the page but not so attentively [15:40:03] Cool. Thanks Nemo_bis :) [15:40:51] halfak: Ah right, I had discussed with Andrea Forte. Is this the project they are/were seeking grants for, related to Tor? [15:41:14] I don't think that this one is about Tor, but I know that Rachel is digging into that. [15:42:05] Ok. Probably parallel tracks. [16:02:10] * Ironholds yawns [16:02:12] morning! [16:03:29] mornin! [16:07:05] halfak: meh, I ended up writing too much. :( https://meta.wikimedia.org/wiki/Research_talk:Anonymity_and_Peer_Production [16:07:56] yo ottomata! How goes/ [16:07:59] *? [16:09:03] yo Ironholds, goes alright, how you dooiiiin? [16:09:56] p.good! I've been made an official rOpenScientist which was nice [16:14:31] * guillom discovers templates like https://en.wikipedia.org/wiki/Template:West_of_England_Main_Line . [16:26:23] Ironholds: FYI: https://phabricator.wikimedia.org/T90894#1109772 [16:32:26] guillom: there is an entire fauna of editors who live specifically in areas around those templates [16:33:59] Nemo_bis: I think they would be so much happier with real tools to create, edit and embed diagrams like those, rather than using insane templates. [16:34:16] It's the same thing for genealogy templates. [16:34:29] guillom: also for cladograms [16:34:41] I… don't know what those are. 
[16:34:48] * guillom opens Wikipedia. [16:35:27] Oh, interesting. [16:35:28] This sort of stuff: https://it.wikipedia.org/wiki/Dinosauria#Ornithischia [16:35:52] Varies a lot across wikis. I think railway line templates are more standardised, but I wouldn't swear it. :) [16:37:26] Crosswiki standards? On Wikimedia wikis? HAH! [16:51:56] guillom, ta! [16:52:16] I'll get testing with python :) [16:52:18] halfak, ^ [16:52:56] Ironholds, wut? [16:56:04] halfak, I'm using the revision scorer you wrote to test VisualEditor edits! :) [16:56:21] the "best" outcome is "this is totally analogous to testing for bad edits and so we can just run this instead of hand-coding", sweet [16:56:39] Woo! [16:56:57] which would be pretty cool! [16:56:57] Do you have a dataset of hand-coded edits? [16:57:19] we do! guillom just finished one :) [16:57:24] so I'm gonna compare the results and see what happens [16:58:11] How many observations? [17:00:54] not many! a scientific test it isn't. Like, 50 [17:03:01] Gotcha. [17:03:27] I need about 5k to get a decent classifier, but we can always just use the revert classifier. Are you using the API @ ores.wmflabs.org? [17:06:16] I was just gonna use the python module from my local machine [17:08:02] Ironholds, the python API is in flux. You might get unexpected behaviors. [17:08:23] oop. Noted! :) [17:08:29] I'll be happy to help though. [17:08:39] awesome! [17:08:41] "unstable" as in we're still polishing it. [17:13:27] I' [17:14:14] I'm planning to do ~100 diffs / week, so in a little less than a year you can have a classifier :) [17:15:43] In unrelated news, the third-party service whose badly-documented API I was grappling with yesterday got back to me with a working Python example, so now I can compare & debug mine \o/ [17:16:22] yay! [17:16:28] Ironholds: Also, official welcome: https://wikimediafoundation.org/w/index.php?diff=101262 [17:17:33] yaaay [17:18:24] YuviPanda, so, the pageviews data? 
[17:18:36] would an API be more viable if I'd worked out a way to halve the size of the files? [17:18:57] more probably :D [17:19:06] oh good. I worked out a way to halve the size of the files ;) [17:23:53] >>> from mw.api import Session [17:23:53] Traceback (most recent call last): [17:23:53] File "<stdin>", line 1, in <module> [17:23:53] ImportError: No module named 'mw' [17:23:56] ...but it's installed :( [17:27:15] virtualenv issue? [17:29:15] python is basically voodoo to me [17:29:21] * Ironholds yearns for C++ or R or even Java [17:40:10] Ironholds, working from local machine? [17:40:26] that is, am I, or is it? [17:41:10] Where you want mw. [17:43:41] I'm on my local, where it is blowing up [17:44:01] it's definitely installed and the latest pip-recognised version [17:44:22] OK. Let's say I was starting from scratch. [17:44:32] sudo apt-get install python-virtualenv [17:44:36] cd ~ [17:44:38] mkdir env [17:44:39] cd env [17:44:51] virtualenv -p $(which python3) 3.4 [17:44:59] source 3.4/bin/activate [17:45:09] pip install mediawiki-utilities [17:45:30] python -c "import mw" [17:45:41] ^ That last command just checks if it works [17:47:34] aha [17:47:40] neat! Thanks :) [17:48:40] it works! [17:48:47] downside, I'm now using 3.4 by default. Upside, 3.4 by default :) [17:49:24] :) [18:01:16] augh. google hangout shows all my greys :( [18:01:44] Grays are awesome! [18:01:53] Why you no like gray hair? [18:02:06] I have too much of it [18:02:16] also, do you have your 1:1 with toby right now? because apparently I also have a 1:1 with toby right now [18:02:26] meh, fuckit. I don't have anything to say except "yes, I still work here". [18:02:38] Ironholds: you still have 1:1 with toby? [18:02:47] DarTar, quite ;p [18:02:50] ha [18:02:53] evidently he just forgot to remove it uniformly from the calendars [18:02:54] * Ironholds removes [18:03:36] Oh yeah. 
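[Editorial aside: halfak's virtualenv recipe above can also be replayed from Python itself with the stdlib `venv` module. This is a sketch, not what the chat actually ran: the log uses the separate `virtualenv` package, and the `with_pip=False` flag here is just to keep the example offline; drop it if you want `pip install mediawiki-utilities` to work inside the environment.]

```python
import os
import tempfile
import venv

# Build an isolated environment named "3.4", mirroring the
# `virtualenv -p $(which python3) 3.4` step from the chat.
# with_pip=False skips bootstrapping pip, keeping this fast and offline.
target = os.path.join(tempfile.mkdtemp(), "3.4")
venv.EnvBuilder(with_pip=False).create(target)

# The environment's interpreter lives under bin/ (Scripts/ on Windows),
# which is what `source 3.4/bin/activate` puts on your PATH.
bindir = "Scripts" if os.name == "nt" else "bin"
print(os.path.isdir(os.path.join(target, bindir)))  # → True
```

After activating the environment, `pip install mediawiki-utilities` followed by `python -c "import mw"` is exactly the smoke test halfak suggests.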
[18:03:37] Heh [18:03:45] We need to arm wrestle for the 1:1 [18:03:49] Or have a 1:1:1 [18:03:50] dentist appointment bbl [18:03:57] o/ DarTartar [18:04:09] naw, I don't need it [18:04:17] all I have to say, like I said, is "I haven't left yet. Sorry." [18:07:44] Hi tnegrin. Ready for 1:1? [18:07:57] sigh -- yes -- be right there [18:18:06] TIL: Python dictionaries use single quotes internally even if you define them with double quotes. Which is fine and all, until you urlencode them and you get a bunch of %27s instead of the %22s your API expects. *headdesks* [18:18:33] convert to JSON first! [18:20:10] Well, yes, I tried that earlier, but it didn't work because I hadn't explicitly defined the encoding in the header. Now I know :) [18:38:29] Also, the problem is that requests.post only encodes your data if it's a dictionary. If it's JSON, no luck. So apparently, I need to do dict --> JSON (to get double quotes) --> urlencoded string and pass that to requests.post. Cumbersome but better than it not working. [18:45:10] guillom, the json={...} is not encoding with doublequotes? [18:45:18] Single quotes are not valid JSON [18:46:51] Worst case, you can do requests.post(, data=bytes(json.dumps(d), 'ascii')) [18:47:37] guillom, the json= field should accept all jsonable values. [18:47:45] E.g. numbers, strings, lists, objects, etc. [18:47:53] etc. == null [18:49:11] halfak: I think the problem is that that API is veeeery picky about what it accepts. It doesn't want JSON, it wants urlencoded data (so with braces, spaces and square brackets also encoded). So basically, what they recommended is to take the dictionary, urlencode it, replace('%27','%22'), and pass that as data to the post method. [18:49:36] guillom, just use the form encoding. [18:49:45] data=dictionary [18:50:33] halfak: But that was my problem before: That produces single quotes, not double quotes in the encoded data. 
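[Editorial aside: guillom's TIL above is easy to reproduce. `str()` on a dict gives you Python's repr, which uses single quotes; `json.dumps()` gives real JSON, which uses double quotes. Percent-encode each and you see exactly the %27-vs-%22 mismatch he describes. The dict here is a toy payload, not the one from the chat.]

```python
import json
from urllib.parse import quote

d = {"name": "test6"}  # any dict shows the effect

# str() yields Python repr, with single quotes -> %27 after encoding:
print(quote(str(d)))         # %7B%27name%27%3A%20%27test6%27%7D
# json.dumps() yields JSON, with double quotes -> %22 after encoding:
print(quote(json.dumps(d)))  # %7B%22name%22%3A%20%22test6%22%7D
```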
[18:50:34] x-www-form-urlencoded [18:50:54] There are no quotes in x-www-form-urlencoded data [18:51:11] Are there quotes in the values of your dict? [18:51:30] There are %27s in x-www-form-urlencoded data, but the API wants %22s [18:53:28] guillom, they should only be there if there are ' in your values [18:53:34] Why are there ' in your values? [18:54:00] You shouldn't be writing dicts direct to str. [18:54:19] If you want to send a dict to the API, you'll need to json.dumps() before asking it to encode. [18:54:34] For example. Let's say I want to send a dictionary as a value to a param, I'd do. [18:54:36] {"inputs": {"keywords":[{"name": "test6", "operation": 0}] }} <-- this is the kind of data I need to post [18:55:19] d['inputs'] = json.dumps({"keywords":[{"name": "test6", "operation": 0}]}) [18:55:29] requests.post(, data=d) [18:56:14] x-www-form-urlencoded only handles key/value pairs, not hierarchical dicts. [18:58:18] Well, in this case, they want urlencoded data, and they want hierarchical content, which is probably why I need to manually encode it so that hierarchy looks like the value of a single key. And I imagine they re-build the content on the other end. [19:00:05] * guillom moves upstairs. biab. [19:03:34] (and if that sounds insane, it might explain why I lost a day trying to figure this out before I asked their support team) [19:08:41] guillom, very insane, yes. :) [19:09:33] So, then I would do: requests.post(, data={k:json.dumps(v) for k in mydict}) [19:09:41] woops. [19:09:43] That's wrong. [19:09:46] One more try [19:10:09] requests.post(, data={k:json.dumps(v) for k,v in mydict.items()}) [19:10:55] halfak: Thanks! Will try. [19:12:23] Otherwise I can always encode earlier. But your solution looks cleaner and would use the native encoding of the requests module. (And I like 'clean') [19:15:15] Weeee, it worked \o/ Thank you! [19:21:27] Woot!~ [19:28:05] * Ironholds cries [19:28:13] halfak, how do I download urllib for python3? 
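[Editorial aside: the solution the encoding thread above lands on, condensed into one runnable sketch. Each value is JSON-encoded first, then the whole dict is form-encoded; `urllib.parse.urlencode` stands in here for what `requests.post(url, data=form)` does to the body on the wire. The payload is the one from the chat.]

```python
import json
from urllib.parse import urlencode

payload = {"inputs": {"keywords": [{"name": "test6", "operation": 0}]}}

# halfak's dict comprehension: JSON-encode each value, so the nested
# structure becomes a double-quoted JSON string under a single key...
form = {k: json.dumps(v) for k, v in payload.items()}
# ...then form-encode the flat key/value pairs:
body = urlencode(form)

# Double quotes (%22) present, Python-repr single quotes (%27) gone:
print("%22" in body and "%27" not in body)  # → True
```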
pip says it doesn't exist [19:28:29] It's part of the standard library [19:28:39] https://docs.python.org/3/library/urllib.html [19:28:41] ...oh. huh. [19:28:52] yeah, that'd do it. Ta ;p [19:29:05] * halfak is python master today [19:29:54] and now it claims request is now a submodule. This is a lie! [19:29:57] *not [19:30:09] huh? [19:30:16] "requests" is a 3rd party lib [19:30:24] urllib.request is something else [19:31:05] * Ironholds throws hands up [19:31:11] heh [19:31:13] this language better have awesome conferences with a ton of free stuff [19:31:20] beats me. [19:31:25] or I'm going to go through every R package you use and make the parameters inconsistent [19:31:25] Also, R [19:31:39] They already are inconsistent! [19:31:46] you can try scale.x_Continuous() on for size [19:32:05] between packages! ...mostly. Except base R. Which is just stupidly inconsistent [19:32:21] everything in the hadleyverse is consistent! [19:32:29] (related: who is officially an rOpenScientist? /I am/ [19:32:45] What does "official" mean? [19:33:54] "they've listed me on the site and we're in talks to get driver moved under the banner so it's an rOpenSci-maintained package" [19:34:16] Which site? [19:35:53] http://ropensci.org/ [19:37:58] * halfak scans website [19:38:34] augh, and they changed how urllib.retrieve.urlretrieve works? or gunzip is being dumb? augh. [19:38:42] I was just getting to grips with 2.7. I almost had it. [19:39:31] 3.x is better [19:39:38] Fixed a lot of crap structural problems [19:39:48] fair! [19:39:51] I don't know who wrote urllib, but I want to stab them [19:40:01] * Ironholds needs to dig into what urllib.retrieve.urlretrieve is doing with temp files now [19:40:01] Also, why are you using that hot mess? [19:42:11] should I be using requests, or whateveritscalled? 
[19:42:32] halfak: use requests [19:42:35] Ironholds: use requests [19:42:37] userequests [19:42:39] userequests [19:42:41] YuviPanda, build my system for me [19:42:54] you know that file format I was complaining to you about? It parses that :D [19:42:59] aaaah [19:43:09] I’m dealing with *mediawiki* atm [19:43:13] you can’t possibly have it as bad [19:43:16] +1 for requests [19:43:26] All us proper python people found that lib and never looked back. [19:43:33] fine, fine [19:43:39] :D [19:43:43] is there a requests2 and requests3 for no apparent reason too? ;) [19:43:53] Nope [19:43:54] :) [19:43:55] ! [19:44:08] "Python’s standard urllib2 module provides most of the HTTP capabilities you need, but the API is thoroughly broken." [19:44:16] holy crap, requests is python's version of urltools/httr [19:46:07] "stdlib has this, but it sucks. Everyone go use this one" [19:46:22] I sort of imagined the python core developers were less hidebound and silly than the R ones. [19:46:25] nope [19:48:33] It's really only the urllib space. [19:48:40] Everything else is pretty good. [19:49:29] fair! [20:38:56] I confirm everything is easier with requests. ;) [21:00:48] halfak: running a few minutes late with Grace and Kevin [21:00:56] No worries. [21:13:38] halfak: I was just noodling around elsewhere [21:13:40] and saw [21:13:43] Diffs are pretty seriously fast now. [21:13:47] just saying :) [21:13:51] (re: Restbase)
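[Editorial aside: for the record, the function Ironholds was fighting is `urllib.request.urlretrieve`; there is no `urllib.retrieve` module. In Python 3 the old urllib/urllib2 split is gone and everything lives in submodules of the `urllib` package, while `requests` is an unrelated third-party PyPI package, which is the source of the confusion in the log.]

```python
# The Python 3 urllib layout the chat is untangling:
import urllib.error
import urllib.parse
import urllib.request

# urlretrieve lives in urllib.request, not "urllib.retrieve":
print(callable(urllib.request.urlretrieve))  # → True
# ...and the percent-encoding helpers live in urllib.parse:
print(urllib.parse.quote("it's"))            # → it%27s
```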