[00:00:04] so it’s likely going to be 10 mins per team
[00:00:09] leila: subject is… [Wmfall] Quarterly review format changes
[00:00:28] Signing off, it’s dinner time on the east coast :-)
[00:00:46] halfak: Wayne Gray
[00:01:27] leila: I saw that note but I didn’t really pay attention to it, I probably should
[00:01:49] Oh yeah! I didn't. :S
[00:03:09] perfect. thanks kevinator.
[00:03:34] I had totally forgotten about it, DarTar
[00:03:40] it's actually not a bad thing for us
[00:03:49] DarTar, we should rap it
[00:03:56] haha Ironholds
[00:04:05] "I'm MC DarTar and I'm here to say/R&D handled all the stupid crap you threw at us, K?"
[00:04:12] then drop the mic and walk out
[00:04:17] done. Time limit problem solved.
[00:04:44] yeah, or we can also sing it polyphonically to save time
[00:05:31] acapella?
[00:05:48] we'd be a datashop quarter
[00:05:49] *t
[00:06:28] acapella with fans from the cluster as background noise
[00:07:14] I like it
[00:07:29] have we decided our quarterly goals yet?
[00:08:23] for(thing in universe){Science(thing)}
[00:09:42] how pythonic
[00:10:13] for thin in universe: yield Science(thing)
[00:10:35] SyntaxError: 'thin' referenced before assignment
[00:10:35] I'm now imagining the scene from LOTR when they forge the rings of power, only the tongs contain a paper.
[00:10:45] haha
[00:10:58] I know what I want the readership goals to be!
[00:11:07] continuing mobile support, refining pageviews...and that's it.
[00:11:16] Because, Jesus H, four goals was too much for little ol' me.
[00:12:40] I think we might need the whole 10 minutes just to talk about what we got done in this quarter.
[00:12:49] Ironholds: no, goals are not even started.
Toby and I decided to cancel today’s QR prep meeting because we’re still waiting on input from other teams about what other big priorities are coming up in Q3
[00:13:25] * Ironholds shudders
[00:13:26] halfak: agreed, it was a pretty impressive quarter, we cranked out a lot of stuff
[00:13:35] I never want another quarter like this ever again
[00:13:39] ever.
[00:13:50] Ironholds: all your quarters are belong to us
[00:13:54] bwa ha ha
[00:14:01] the time I spend shaving should not be something that ever has to be spent as buffer room for random asks.
[00:14:17] like, shaving should not be something that gets bucketed into "shit I don't have time to do today"
[00:14:30] The Process should help us have more reasonable hours.
[00:14:45] or… seeing one’s daughters
[00:14:52] Once we know how many random requests we can do and still have our evenings, we'll be able to tell others that they'll need to budget.
[00:15:16] yup
[00:15:17] halfak: yes The Process
[00:15:21] (TM)
[00:15:30] halfak, I think that's dependent on certain classes of individuals falling within it
[00:15:50] also, the fact of the matter is that sometimes requests come up that genuinely SHOULD be top-priority, and be done last Thursday, even under The Process
[00:15:56] this is why I only want two things next quarter.
[00:16:03] because based on this quarter, I will have to do four.
[00:16:34] and four is the number of things I can do before I end up basically turning into the paranoid insomniac version of the dude in A Beautiful Mind
[00:16:41] with newspapers on the walls and push-pins and crazy.
[00:16:56] Agreed. So we build that in.
[00:16:59] yup
[00:17:05] It's like being in ops and having the pager.
[00:17:07] but that + process, not that || process
[00:17:15] yeah, except all of us always have the pager ;p
[00:17:20] it's just we have pagers for different clusters
[00:17:37] Not if we can quantify pagerness and probability of crazy Oliver conspiracies.
[00:18:01] in the last quarter, I signed up to write a pageview definition and a session methodology and support mobile
[00:18:13] OK, you get 3 emergencies before the chance of sanity slip falls above OSHA regulations.
[00:18:39] in the last quarter, I have written a pageview definition, taught myself the basics of Java so I can help implement it, written a session methodology, taught myself C++ to implement it, published with The Great Halfak
[00:19:02] And computermcgyver
[00:19:03] supported mobile, done a crash project for the execs, done another crash project for the execs, re-implemented sessions two more times
[00:19:12] oh THAT'S who he is!
[00:19:17] You didn't know?
[00:19:21] lool
[00:19:49] handled every legal request for pageview data, handled every squawk in pageviews data
[00:20:06] implemented a ua-parser, maintained the ua-parser..
[00:20:10] I am really, really tired.
[00:20:17] and you lot are pretty much the same, I suspect.
[00:20:20] we need Fewer Things next quarter.
[00:20:32] alternately, can we hire the traffic analyst and then not give them anything to do?
[00:20:40] We can call them the Buffer Scientist.
[00:21:06] That's a potential Ask.
[00:21:18] I think we debated having it a couple quarters back, actually
[00:21:21] But I don't think it would be a Scientist(TM) that we need.
[00:21:25] I balked because it was described as like, "data-dogging scientist"
[00:21:47] and it makes me really uncomfortable to take an enthusiastic person and go "I hope you like being knee-deep in scutwork!"
[00:22:06] Indeed. We need someone who will be clever about extracting data, but not frustrated by the monotony.
[00:22:24] These two things seem to be mutually exclusive.
[00:22:28] someone who takes pride in coming up with innovative ways of munging, and considers the things they learn coming up with that to be its own reward?
[00:22:48] so basically we need me, 12 months ago ;p
[00:23:04] "I implemented a sampled log reader in awk, R and Python" "dear god, why?" "because NOW WE HAVE ONE THAT WORKS."
[00:23:13] hmnmmhmnm.
[00:23:31] I think the problem is that data-dogging is creative work. That's why it is so soul-crushing to do it without a larger reason.
[00:23:48] expand?
[00:24:02] I see it as mostly pretty.. "okay, grab values from this table, split on val, run a regex, aggregate by X, number."
[00:24:29] At the scale of data (and level of messiness) we work with, you need to be clever to get the things people ask for and know that the number/stat/whatever is sound.
[00:24:49] You have to imagine that process though.
[00:24:55] * Ironholds nods
[00:24:57] Requests are usually "get "
[00:25:40] and what that boils down to is "We don't have . build a tool that can read . Retrieve . Write a parser to extract . approximates .
[00:25:43] Not "download convert it to and plot by so that people will understand their relationship"
[00:26:03] Ironholds, yup
[00:26:47] interestingly there is a class of people that would be totally perfect for that
[00:27:04] but I've mostly encountered them in CS/engineering communities, not in a research context (I need to meet more researchers maybe)
[00:27:39] the "this thing, it is fucked. Make it just un-fucked enough to work" kind of engineer you bring in when a project has blown its timetable and you need someone who can spend 20 hours a day writing some hideous concoction that runs, once, which is all you need.
[00:28:06] I wonder if what we'd want is not a sciency person but an engineer who likes data-hackery.
[00:28:31] the sort of people who do dataset extraction and reformatting around released datasets for fun. Those people.
[00:29:06] We must know some. DarTar must know some ;p
[00:30:05] :P Either way, I think process improvements will help everyone. :)
[00:30:12] and buffer room.
[00:30:13] agreed
[00:30:25] okay, declaring it Done on sessions for today
[00:30:33] https://meta.wikimedia.org/wiki/Research:Activity_session
[00:30:49] * halfak clicks
[00:31:08] I still need to document most of the actual science, mind ;p
[00:31:16] and then I'll throw it at you and be all "tell me where it sucks!"
[00:31:29] and then we can show it to Tobias and I can fall over and not move for a while. Like, 3 months.
[00:32:19] turns out, halfak hasn’t come out of the closet announcing he’s a stage theorist too
[00:32:20] * halfak blocks off time to take a pass on Monday
[00:32:24] Ironholds, ^
[00:32:42] * halfak googles "stage theorist"
[00:33:17] halfak: no idea what Google thinks of that phrase
[00:33:46] Stages... hmm... Activity theory conceptualizes "activity phases" as a linear progression of engagements.
[00:33:54] halfak: the page Ironholds publicly committed to expand https://en.wikipedia.org/wiki/Perdurantism#Worm_theorists_and_stage_theorists
[00:33:56] Between a human, a tool and an object
[00:34:37] yay!
[00:34:45] wait, halfak is a stage theorist?
[00:34:47] * Ironholds fistbumps
[00:34:48] Oh. I'm totally a worm theorist.
[00:34:51] DAMMIT
[00:35:00] woo-hooo
[00:35:06] People are four-dimensional.
[00:35:06] I'll get Leila on my side, even if it's just because she's being polite.
[00:35:20] Everything but photons, as far as I know, is 4-dimensional.
[00:35:39] oh, I agree
[00:35:45] Ironholds: rally your troops
[00:35:55] I just happen to think that people only exist as four-dimensional elements for a single unit of Planck time.
[00:36:07] Ironholds, how do photons work in stages?
[00:37:01] Without the fourth dimension, I don't know how Zeno's arrow can hit the target.
[00:37:33] https://en.wikipedia.org/wiki/Zeno%27s_paradoxes#Arrow_paradox
[00:37:37] ^ Ironholds, DarTar
[00:37:38] Zeno's arrow does not exist!
Zeno's arrow ceased to exist a Planck length after it was fired
[00:37:49] but what was Zeno's arrow is still linked to the thing that hit the target.
[00:37:54] that thing WAS Zeno's arrow.
[00:38:31] I would like it noted, now that halfak is here and can validate (because I gave him the simple explanation), that stage theory is my rationale for tattoos
[00:38:43] This is a crazy take on https://en.wikipedia.org/wiki/Ship_of_Theseus
[00:38:50] totally!
[00:39:04] that's how it came up; I wanted to name the PV class PerdurantistPageviews
[00:39:11] and the engineers didn't get the joke
[00:39:25] At what moment was the ship ever Theseus'?
[00:39:43] the moment it was defined as his, and never after that
[00:39:55] So, for zero seconds, it was his ship.
[00:40:01] well, one Planck length
[00:40:01] Because an instant has no time.
[00:40:13] oh god, not gooey time
[00:40:17] Ahh.. Is time digital then?
[00:40:31] At least discrete?
[00:40:38] I start thinking about the Zen "if the present is divisible, it is not the present, but if it is not divisible, how can it contain connectors to the past and future" problem
[00:40:41] and then I get a headache
[00:41:05] discrete and at a certain point indivisible
[00:41:16] one of the primary stage theory criticisms is gunky time: the idea that time is infinitely subdivisible
[00:41:42] "gunky" doesn't sound like the proponents came up with that name.
[00:41:54] hey all, my folks are in the office, I’ll leave you to 4D worms for now
[00:41:58] have a great weekend
[00:42:18] I don't actually know
[00:42:24] I've seen Sider use it but I don't know who defined it
[00:42:28] DarTar, STAGES.
[00:42:29] o/ DarTar
[00:42:35] have a good night!
[00:43:12] Ironholds, -._.-._.-._.-._.o
[00:43:21] [][][][][]
[00:43:36] of course, stage theory, as Dario noted earlier, totally fucks with session analysis
[00:43:36] lol
[00:43:42] because it invalidates UUIDs ;p
[00:45:08] heh.
So how do you address the persistence of so many variables between stages?
[00:45:25] 4D suggests inertia
[00:45:41] Lots of things can have a series of identical variables!
[00:45:52] the distinction is that some variables have changed
[00:46:04] But for what reason? How does anything persist between stages?
[00:46:21] I guess inertia would be a good way of putting it
[00:46:38] but as said, brain, mush ;p
[00:46:44] So, the world has memory within a stage that persists to the next stage?
[00:47:03] Hokay.
[00:47:14] * halfak gets back to hadoopin'
[00:48:49] I'm just going to leave this here: http://imgur.com/zrlYZOJ
[01:21:50] I will reduce all of the maps.
[01:22:08] There can only be one (or like 50)
[01:22:14] But it's a constant!
[01:22:20] So it might as well be one.
[03:09:50] "gunky time" definitely sounds like something halfak would say
[08:55:25] Quarry: Link to the Quarry documentation - https://phabricator.wikimedia.org/T85051#937639 (Schnark) NEW
[10:10:29] Quarry: Link to the Quarry documentation - https://phabricator.wikimedia.org/T85051#937677 (Glaisher) a:Glaisher
[10:20:05] Quarry: Link to the Quarry documentation - https://phabricator.wikimedia.org/T85051#937685 (Glaisher) IMO, it may actually be more appropriate if that page was on wikitech but whatever...
[11:05:26] Quarry: Link to the Quarry documentation - https://phabricator.wikimedia.org/T85051#937715 (yuvipanda) Open>Resolved Thank you!
[15:21:28] * halfak reads scrollback
[15:21:33] :P Emufarmers
[15:26:28] hi halfak
[15:26:39] Hey YuviPanda!
[15:28:01] I've been reading up a bit more on Storm. I'd really like to play around with something that uses RCStream+ as a spout.
[15:28:41] halfak: what use cases do you have in mind?
[15:29:07] halfak: also, even better / worse would be to pipe a MySQL replica stream into Storm :P
[15:29:17] Revert detection. Computation of the metrics I want to use in WikiCredit. Integration with Echo for on-wiki events.
[15:29:45] YuviPanda, either way. Stream of public MediaWiki events.
[15:29:48] right
[15:38:18] YuviPanda: https://meta.wikimedia.org/wiki/File:MediaWiki_Events_Conceptual.svg
[15:39:11] Wooo nice
[15:39:22] I guess source actions would be edits and log actions
[15:41:17] Yup. The same sort of things I have listed out here: https://meta.wikimedia.org/wiki/Research:MediaWiki_events:_a_generalized_public_event_datasource#Relevant_events
[15:41:24] Which are mostly attainable from the recentchanges feed.
[15:47:54] YuviPanda, I have a question about a problem I imagine with such systems
[15:48:12] Let's say we have the following "topologies"
[15:48:25] * YuviPanda listens
[15:48:47] [source] --> [bolt1] and [source] --> [bolt2] and [bolt1]+[bolt2] --> [bolt3]
[15:49:02] Simply put, bolt1 and bolt2 come directly from the source
[15:49:07] And bolt3 depends on them both.
[15:49:30] What if bolt2 can't keep up, but bolt1 can?
[15:49:47] Bolt3 waits
[15:50:13] After a while, bolt1's output will fill up some sort of buffer?
[15:50:39] Is the idea that you never want to have a bolt get behind, so you don't wind up in this situation?
[15:50:47] Depends on how bolt3 is consuming
[15:51:17] You will. The thing is to reduce the difference in latencies enough that the buffers don't all fill up
[15:51:37] This almost sounds like a classical CSP problem
[15:52:01] Hmm.
[15:52:26] One option is for there to be not much branching
[15:52:41] It can consume the output of bolt2 as an API or something
[15:52:49] Instead of trying to listen to two streams at once
[15:52:57] Indeed. Then we'd be exiting Storm.
[15:53:06] hello halfak ! :-)
[15:53:16] Storm seems to be built to handle multiple stream inputs for bolts.
[15:53:19] o/ Helder_
[15:55:09] Helder, I'm just finishing up email and then I'll get to working on the extractor structure card.
[15:55:33] ok
[16:09:04] * halfak kicks off a massive Hadoop job.
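The fan-out/fan-in question in the Storm discussion ([source] feeding bolt1 and bolt2, with bolt3 joining both) can be illustrated with a toy, single-threaded simulation. This is not Storm code: `join_backlog` and its parameters are hypothetical, a "tick" is an abstract unit of work, and bounded `deque`s stand in for the buffers between bolts. It assumes both bolts handle tuples in source order, so bolt3 can pair them positionally.

```python
from collections import deque


def join_backlog(n_tuples, bolt2_every, capacity):
    """Toy model of [source]->[bolt1], [source]->[bolt2], [bolt1]+[bolt2]->[bolt3].

    bolt1 handles one tuple per tick; bolt2 handles one tuple every
    `bolt2_every` ticks. Each bolt writes into a bounded output queue of size
    `capacity`, and bolt3 consumes one tuple from each queue as a pair.
    Returns (tick at which bolt1's queue first filled, or None,
             number of joined pairs, total ticks needed).
    """
    q1, q2 = deque(), deque()
    next1 = next2 = 0  # id of the next source tuple each bolt will process
    joined, tick, full_at = 0, 0, None
    while joined < n_tuples:
        # bolt1 is fast, but blocks (backpressure) while its output queue is full.
        if next1 < n_tuples and len(q1) < capacity:
            q1.append(next1)
            next1 += 1
        # bolt2 only finishes a tuple every `bolt2_every` ticks.
        if next2 < n_tuples and tick % bolt2_every == 0 and len(q2) < capacity:
            q2.append(next2)
            next2 += 1
        # bolt3 can only emit once it holds the matching tuple from both inputs.
        while q1 and q2:
            left, right = q1.popleft(), q2.popleft()
            assert left == right, "both branches process tuples in source order"
            joined += 1
        if full_at is None and len(q1) == capacity:
            full_at = tick
        tick += 1
    return full_at, joined, tick


print(join_backlog(10, bolt2_every=1, capacity=3))  # -> (None, 10, 10)
print(join_backlog(10, bolt2_every=2, capacity=3))  # -> (5, 10, 19)
```

With equal speeds the buffer never builds up. When bolt2 is half as fast, bolt1's buffer fills after a few ticks and bolt1 stalls, so the whole branch runs at bolt2's pace (19 ticks instead of 10). A bounded buffer turns "bolt2 can't keep up" into backpressure on bolt1 rather than unbounded memory growth.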
[16:10:34] I wish there were a 'top' for Hadoop
[16:12:05] * halfak merges Helder's requirements pull request
[16:12:13] nice :-)
[16:12:24] We should take your comment and move it to the docs somehow.
[16:12:43] I'm going to copy-paste it into the README.
[16:13:58] What version of Mint?
[16:15:37] 17
[16:16:17] halfak: I added more comprehensive info to the Trello card
[16:16:46] https://trello.com/c/hlaBu9b9/33-fix-and-test-requirements-txt
[16:17:25] Cool. I'll pull that in.
[16:23:19] Helder, https://github.com/halfak/Revision-Scoring/pull/15
[16:30:49] done
[16:33:23] halfak: did you get notified by my comments on https://github.com/halfak/Revision-Scoring/commit/cc62cdb43996517987bc5a537a9cd78b83e3ef58
[16:33:23] ?
[16:33:41] I forgot to use "@this" to mention your username
[16:50:07] halfak: ?
[16:50:15] Yeah. What's up?
[16:50:21] I lost connection for a little bit.
[16:50:26] did you see my previous comment?
[16:50:29] I guess not.
[16:50:44] halfak: did you get notified by my comments on https://github.com/halfak/Revision-Scoring/commit/cc62cdb43996517987bc5a537a9cd78b83e3ef58
[16:50:45] ?
[16:50:45] I forgot to use "@this" to mention your username
[16:51:15] @this?
[16:51:23] @halfak
[16:51:26] or something like that
[16:52:25] I don't know if GitHub notifies other users if I don't mention their usernames
[16:52:31] Ahh yeah. I think I did see that go by.
[16:52:34] * halfak searches email
[16:53:10] Yup. I did see 'em.
[16:53:34] I'm working to address the concerns in the cleanup of the feature extractor.
[16:53:45] ok
[16:53:54] other thing, about https://github.com/halfak/Mediawiki-Utilities/blob/master/examples/api.recent_changes.py
[16:54:17] that "revisions to User:EpochFail" is not correct, right?
[16:54:27] there is no filtering in the code
[16:54:32] That's right
[16:54:43] Was a forgotten copy-paste from another example.
[16:54:55] Also, the name is "api.recent_changes.py" but it returns old revisions instead of recent ones =/
[16:55:09] Sure.
The oldest in revent_changes.
[16:55:11] *c
[16:55:21] recent_changes goes back 30 days.
[16:55:22] direction="newer" --> direction="older" fixes that
[16:55:45] I mean, I first got a revision from 2013, and when I changed to "older" I got one from today
[16:56:06] Sure. It reads from different ends of the recent_changes queue.
[16:56:09] Wait. 2013.
[16:56:10] WAT
[16:56:18] yah
[16:56:22] That's not possible
[16:56:35] Can you make a gist so I can look at it?
[16:56:50] copy-paste code here: https://gist.github.com/
[16:56:53] I'll double check
[16:57:31] ah, and one more thing: I always get 0 chars =/
[16:58:01] Yup. That'd be because we aren't asking for text.
[16:58:06] That example has issues.
[16:58:13] Works though :S
[16:58:19] Just poorly explained.
[16:59:31] brb
[17:00:31] kk
[17:01:24] * Helder is back
[17:03:01] halfak: https://gist.github.com/he7d3r/ce721c9be49d40535368
[17:03:08] the first result is https://pt.wikipedia.org/wiki/?diff=40894574
[17:03:12] from today
[17:03:23] and I used 'direction="older"'
[17:04:57] halfak: updated the gist with results for direction="newer"
[17:05:26] the first result is from August: https://pt.wikipedia.org/wiki/?diff=39753618
[17:05:49] but the third one is from 2013: https://pt.wikipedia.org/wiki/?diff=34693351
[17:06:05] Helder, working on it
[17:06:21] * halfak hates the new API docs.
[17:06:26] It takes forever to find anything.
[17:06:46] FYI: I'm running that code from my Revision-Scoring folder (in case this matters)
[17:10:36] Helder, the first event type is "external". That means the revid is probably not what you expect.
[17:11:05] Yeah. The revid looks right to me.
[17:11:14] ^ for the next event
[17:11:32] I don't think this is a bug. This is expected behavior.
[17:11:43] If you ask for 'timestamp' in props, you'll see what I mean.
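The `direction` behaviour corresponds to the `rcdir` parameter of the MediaWiki API's `list=recentchanges` module: `newer` lists the oldest entries first and `older` (the API default) lists the newest first, which is why the two settings read from opposite ends of the recentchanges table. The sketch below only builds the raw request URL with the Python standard library instead of going through mediawiki-utilities; the helper name `recentchanges_url` is hypothetical. It also requests `timestamp` alongside `ids`, which is what makes surprising revids easy to spot.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def recentchanges_url(api_url, direction="older", limit=10):
    """Build a list=recentchanges query URL (hypothetical helper).

    direction="older" lists the newest changes first (the API default);
    direction="newer" lists the oldest changes still in the table first.
    """
    if direction not in ("older", "newer"):
        raise ValueError("direction must be 'older' or 'newer'")
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcdir": direction,
        # Ask for timestamps as well as ids, so each rcid/revid can be dated.
        "rcprop": "ids|timestamp|title",
        "rclimit": limit,
        "format": "json",
    }
    return f"{api_url}?{urlencode(params)}"


url = recentchanges_url("https://pt.wikipedia.org/w/api.php", direction="older")
print(parse_qs(urlsplit(url).query)["rcdir"])  # ['older']
```

Fetching that URL and reading `query.recentchanges` from the JSON yields entries whose `timestamp` shows directly which end of the table each result came from.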
[17:13:17] Helder, halfak: I'm working on a JavaScript tool to analyse the features of a dataset. I still need to fix some bugs in the histograms, but we can already see some interesting data: http://tools.wmflabs.org/ptwikis/Features
[17:14:15] danilo, this is pretty cool!
[17:14:24] :)
[17:14:56] nice
[17:16:54] danilo, what is the Y axis of the plot?
[17:17:01] Raw count or proportion?
[17:17:20] proportion
[17:18:22] looking at proportion_of_badwords_added, it looks like there is more density for "reverted"
[17:28:50] in proportion_of_badwords_added, for example, there are more non-zero values for 'non-reverted' than for 'reverted', and in the histogram we see that there are more 'reverted' non-zero values above 0.1, while the 'non-reverted' non-zero values are more concentrated below 0.1
[17:29:38] But the zero bars look like they are the same height.
[17:30:28] It seems like the non-reverted bar should be substantially taller.
[17:31:21] It looks like the graph is normalized to the max value, not to 100% of observations.
[17:35:29] yes, because the bar graph is two histograms, one for reverted overlaid on one for non-reverted
[17:35:58] Indeed. Which means that the y axis has different values depending on which histogram you are looking at.
[17:36:45] that might be confusing
[17:36:47] This makes it difficult to compare the density at specific values of the feature.
[17:37:25] ok, I will normalize by 100% of non-zero observations
[17:37:34] :)
[17:37:39] * halfak looks forward to it.
[17:41:21] halfak: done
[17:41:58] Hmmm... It still doesn't look quite right. Are zero values dropped or something?
[17:42:11] Or are these raw counts?
[17:44:03] the histograms show only non-zero values; the zero values are only in the pie charts
[17:47:12] Gotcha. I guess what I'm looking for is "density" on the y axis. You can get this by taking the raw count and dividing by the total count. No worries though. I can play around with the data a bit too.
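halfak's suggestion (divide each bin's raw count by the group's total count) can be sketched as follows. `proportion_histogram` is a hypothetical helper, not code from danilo's tool. It keeps zero values both in the first bin and in the denominator, so the reverted and non-reverted histograms stay comparable even when the two groups differ in size.

```python
from bisect import bisect_right


def proportion_histogram(values, edges):
    """Fraction of *all* observations that land in each bin.

    `edges` are sorted bin boundaries; bins are [e0, e1), [e1, e2), ...,
    with the last bin closed on the right. Values outside the edges fall in
    no bin but still count toward the total.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        if edges[0] <= v <= edges[-1]:
            # bisect_right finds the bin; clamp so the top edge joins the last bin.
            i = min(bisect_right(edges, v) - 1, n_bins - 1)
            counts[i] += 1
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * n_bins


edges = [0.0, 0.1, 0.2, 0.3]
# Call it once per group, e.g. proportion_histogram(reverted, edges) and
# proportion_histogram(non_reverted, edges), and plot both on the same y axis.
```

Because each group is normalized by its own total, the height of the zero bar now reflects what share of each group has no bad words added, which is the comparison the shared-max normalization was hiding.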
[17:50:59] halfak: is it even possible to get the text of the revisions in that example?
[17:51:38] It isn't, Helder.
[17:51:45] I tried adding "content" to the properties and it just said
[17:51:45] Not from recentchanges :(
[17:51:45] AssertionError: items {'content'} not in levels {'comment', 'tags', 'redirect', 'userid', 'timestamp', 'flags', 'ids', 'sha1', 'loginfo', 'sizes', 'title', 'user'}
[17:52:02] Look at that beautiful error.
[17:52:07] It should be a TypeError though.
[17:52:24] * halfak feels a small amount of shame
[17:52:46] so, the fix for the example would be to remove anything related to the text (number of chars...)
[17:54:13] yup
[17:54:30] And change the docstring at the top.
[17:54:59] halfak: http://dpaste.com/1RHKBP7
[17:55:11] (except for the typo on {2})
[17:56:23] Isn't there a "limit=10", so it would print the 10 oldest recent changes?
[17:56:58] it has limit=100
[17:57:14] but we don't need so much
[17:57:19] Gotcha.
[17:57:24] Otherwise looks good to me.
[17:58:39] maybe I should add the timestamp to reduce the risk of others being confused by the revids?
[17:58:42] halfak: ^
[17:58:55] I think that's a great idea.
[17:59:04] You might change revids to rcids too
[17:59:08] up to you
[18:02:21] hmm... I cloned your repo directly instead of creating a fork and cloning my fork, so I can't push to it...
[18:02:47] You can still create a fork and push to that.
[18:03:26] "git remote add origin https://github.com/helder/"
[18:03:41] good to know
[18:03:45] :)
[18:03:55] it was faster to just copy and paste from my clone to the GitHub interface
[18:03:56] =P
[18:06:43] halfak: https://github.com/halfak/Mediawiki-Utilities/pull/18
[18:12:02] Helder, just added a comment. Looks like '100' is still in the docstring.
[18:12:18] facepalm
[18:19:52] halfak: updated https://github.com/halfak/Mediawiki-Utilities/pull/18
[18:20:23] merged!
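The `AssertionError` in the log presumably comes from checking the requested properties with an `assert`. A sketch of an explicit check is below; `check_props` is hypothetical, the allowed set is copied from the error message in the log, and it raises `TypeError` only because that is the exception type suggested in the conversation. One practical reason to prefer an explicit `raise`: Python strips `assert` statements when run with `-O`.

```python
# Allowed recentchanges properties, as listed in the AssertionError in the log.
RC_PROPS = frozenset({
    "comment", "tags", "redirect", "userid", "timestamp", "flags",
    "ids", "sha1", "loginfo", "sizes", "title", "user",
})


def check_props(requested, allowed=RC_PROPS):
    """Return the requested properties as a set, or raise if any are unknown."""
    requested = set(requested)
    unknown = requested - allowed
    if unknown:
        # An explicit raise is not removed by `python -O`, unlike `assert`.
        raise TypeError(
            f"unsupported properties {sorted(unknown)}; choose from {sorted(allowed)}"
        )
    return requested
```

Asking for `{"content"}` would then fail with a message naming the bad property and listing the valid ones, which also tells the caller that text is simply not available from recentchanges.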
[18:21:10] :-)
[18:32:02] halfak: about this
[18:32:03] https://github.com/halfak/Mediawiki-Utilities/blob/master/examples/api.revisions.py#L26
[18:32:12] isn't the hash available in the API itself?
[18:32:27] (nowadays)
[18:32:32] It is. I wanted to do something with the text for the sake of the demo.
[18:32:40] We could do a word count or something instead.
[18:32:41] ah, ok :-)
[18:39:48] Refactoring complete!
[18:40:18] Now time for some docs and cleanup. Then I'm going to submit a pull request that affects ~100 files
[18:54:28] Helder: https://github.com/halfak/Revision-Scoring/pull/16
[18:54:32] 101 files for you :S
[18:54:39] o.O
[18:54:47] BUT 95 of them are really simple refactorings.
[18:57:43] Helder, I have to take off now. I should be on in ~5-6 hours to look over your comments.
[18:57:51] o/
[18:57:55] bye!
[21:22:16] Helder_, if you are still around, I just updated https://github.com/halfak/Revision-Scoring/pull/16
[21:22:53] hey halfak I'll be here for 10 more minutes or so...
[21:23:09] :) I just fixed those typos.
[21:23:26] I have an unrelated question
[21:23:56] Good catch! I'll need to write up a "How to add a feature"
[21:23:57] how do I filter only edits in a "session.recent_changes.query(...)"?
[21:24:10] Type. You want "new" and "edit"
[21:24:22] "edit" is all revisions except for page creation revisions.
[21:25:05] it doesn't seem to be working in my example =/
[21:25:13] so... session.recent_changes.query(type=["new", "edit"])
[21:25:15] Hmm.
[21:25:17] would you mind taking a look?
[21:25:20] Sure.
[21:25:24] Gimme a gist
[21:25:38] :)
[21:25:45] I probably won't touch this code for a few days, but it might be useful for you too :-)
[21:25:51] * Helder_ pastes it somewhere
[21:27:00] halfak: here you go https://gist.github.com/he7d3r/7f2aebb00e18b4963d07
[21:27:20] it is related to https://trello.com/c/3Lt9xxKu/17-test-classifier-on-sample-data
[21:30:02] Indeed. Not working.
[21:30:05] * halfak digs deeper.
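While the library-level `type=["new", "edit"]` filter is being fixed, the same restriction can be applied to raw API output. This is a sketch assuming the JSON shape of the MediaWiki API's `list=recentchanges` response (a `query.recentchanges` array whose entries carry a `type` field such as "edit", "new", "log" or "external"); `only_edits` is a hypothetical helper. On the server side, the API's `rctype=new|edit` parameter does the same job without transferring the other entries at all.

```python
# "edit" covers all revisions except page creations, which are typed "new".
EDIT_TYPES = frozenset({"new", "edit"})


def only_edits(api_response):
    """Keep only page creations and edits from a list=recentchanges response."""
    changes = api_response.get("query", {}).get("recentchanges", [])
    return [change for change in changes if change.get("type") in EDIT_TYPES]
```

Filtering client-side is wasteful for large pulls, since log and external entries still count against `rclimit`, so it is best treated as a stopgap until the query-level filter works.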
[21:30:15] heh :-)
[21:31:33] I'll have this fixed up in a couple minutes.
[21:32:20] great!
[21:34:58] I found more typos there... =P
[21:35:45] doh
[21:36:32] and what about that other comment https://github.com/halfak/Revision-Scoring/commit/423c5e3aee654ffd29958645f11639caa27d7477#diff-65cac008cc0ba9ada21a5ac4ff74f9fbR26
[21:36:33] ?
[21:36:55] (just checking if the examples still match the terminology)
[21:38:37] Ahh yeah. That should be called "features"
[21:39:36] evening halfak :)
[21:39:41] mediawiki-utilities is fixed: "pip install mediawiki-utilities --upgrade"
[21:39:44] Hey Ironholds
[21:40:41] how goes?
[21:41:33] not bad. Just cleaning up some code mess so that I can unblock Helder :) How's your evening?
[21:42:26] pretty good! I worked out what my next thing-to-split-out-from-WMUtils should be.
[21:42:28] halfak: is this "SyntaxError: invalid syntax" known? http://dpaste.com/3AHXWJZ
[21:42:34] A generic logfile reader. read.delim but much, much faster.
[21:43:15] Helder_, that's a mistake. I don't know how that file got picked up.
[21:43:19] Thanks for letting me know.
[21:43:26] sure
[21:43:42] you're welcome
[21:46:36] Helder_, OK, try again
[21:47:10] Ironholds, is read.delim at the base of read.table and read.csv?
[21:47:49] naw, read.table, then read.delim and read.csv as offshoots
[21:51:10] Helder_, I think that https://github.com/halfak/Revision-Scoring/pull/16 should be good now.
[21:51:32] Ironholds, gotcha. What use does read.delim serve?
[21:51:57] it's read.table but with some fields filled out automatically ;p
[21:52:16] I think of "read in a file" as "read.delim" because I always handle TSVs. So, rephrase: read.table, but faster.
[21:53:56] also, C++ references. Great idea.
[21:55:35] Alright. I think I'm done for the night.
[21:55:41] have a good one, guys!
[21:57:01] take care!
[22:25:09] halfak: almost there...
https://github.com/halfak/Revision-Scoring/blob/1e04af2f59ee3875291b23e6b797a7ab9a81e44a/demonstrate_extractor.py#L60-L62
[22:25:35] extractors -> features
[23:46:59] It's 7pm and I'm copyediting a Kickstarter.
[23:47:03] something went terribly wrong in my life.