[10:25:09] halfak: You on?
[10:29:53] halfak: When you see this, or whoever does, how does one find an ORES score from a diff?
[11:23:29] Vermont: https://ores.wikimedia.org/
[11:23:58] the docs should be quite explanatory and there is ofc https://ores.wikimedia.org/ui/ which might be helpful (see the example request sketched below)
[12:12:31] 10Scoring-platform-team, 10editquality-modeling, 10artificial-intelligence: Complete edit quality campaign for Arabic Wikipedia - https://phabricator.wikimedia.org/T131669#4046310 (10Ghassanmas)
[12:12:43] 10Scoring-platform-team, 10editquality-modeling, 10artificial-intelligence: Complete edit quality campaign for Arabic Wikipedia - https://phabricator.wikimedia.org/T131669#2174291 (10Ghassanmas)
[12:13:22] 10Scoring-platform-team, 10editquality-modeling, 10artificial-intelligence: Complete edit quality campaign for Arabic Wikipedia - https://phabricator.wikimedia.org/T131669#2174291 (10Ghassanmas)
[14:02:27] Thanks akosiaris :)
[14:02:34] o/ Vermont
[14:02:39] Let me know if you still have questions :)
[14:39:28] halfak: I always have questions
[14:41:07] o/
[14:41:39] Amir1: I know you're off, but feel free to pitch me any pending CR
[14:42:05] awight: sure, where is it?
[14:42:28] other way around! I'm available all day if you need to keep me going on stuff.
[14:42:50] Maybe tomorrow we can talk about the plans for the ORES-JADE connector.
[14:53:09] o/ awight
[14:57:42] Yes, sorry for the miscommunication!
[14:58:09] I thought I remembered something about you suggesting that I add anything I wanted to the sync agenda since I would be missing it
[14:58:28] oops /pm
[14:58:31] Oh yeah! We did talk about that :)
[14:58:33] lol
[14:58:42] No worries, it’s been a looong weekend all around.
[14:59:13] Ultimately the calendar == truth
[14:59:18] I’m working on a tweet about how people thinking about joining a gym should start with *all* the housework.
[14:59:26] +1 I’ll cover that next time.
[14:59:31] Cool :)
[14:59:34] Already have my Mar 27-29 break marked
[14:59:40] Nice.
[14:59:42] * awight gets scared and double-checks
[14:59:55] {{done}}
[14:59:56] How cool, awight!
[15:00:00] BTW, you should consider pivoting the JADE position paper into a Wikimania talk
[15:00:10] saucy!
[15:00:19] Deadline for proposals is Sunday
[15:00:29] People just might want to hear about that.
[15:00:30] oh
[15:00:33] hehe okay then
[15:01:03] halfak: Did I also mention, I’m fired up to work on paid editing?
[15:01:37] Yes. The biggest hurdle to addressing promotional-type language is deploying parsers in production.
[15:01:47] There's a task somewhere about dealing with the memory issues involved.
[15:02:02] Might be able to reuse some of the insights from codezee's work to solve for that.
[15:02:15] Alternatively, we can deploy a parallel service that hosts parsers only.
[15:02:34] Preliminary work suggests that we can recognize spammy content.
[15:03:25] I’ve been wondering about specialized workers...
[15:03:31] It might also make sense for drafttopic
[15:04:17] mmm… ah, and the conclusion of my thoughts about that was: first we need to profile memory usage to see if we have leaks that could be corrected by moving initializations pre-fork.
[15:05:03] I don't think initializations are an issue. But python behaves strangely with different types of globals -._o_.-
[15:05:18] For parsers, do you think the memory issue is related to reference data, cache, or actual in-flight data?
[15:06:03] I think that it's a shared memory issue.
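On Vermont's question above about getting a score for a diff: ORES scores are keyed by revision ID (the rev ID in the diff URL), and the v3 scores API returns JSON nested by wiki, revision, and model. A minimal sketch, with a placeholder revision ID:

    import requests

    # Fetch the "damaging" score for a single English Wikipedia revision.
    # 123456 is a placeholder; use the revision ID from the diff in question.
    resp = requests.get(
        "https://ores.wikimedia.org/v3/scores/enwiki/",
        params={"models": "damaging", "revids": 123456},
    )
    score = resp.json()["enwiki"]["scores"]["123456"]["damaging"]["score"]
    print(score["prediction"], score["probability"]["true"])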
[15:06:30] For some reason, when we set a big memory hog as a "global" explicitly, it is shared.
[15:06:57] But if we just import the module (which *is a global*, but implicitly) it duplicates memory usage for each process.
[15:07:11] In python 3.5 anyway
[15:07:25] I should make a super simple demo of this behavior.
[15:08:29] * halfak checks how difficult that will be.
[15:10:03] oh that’s a huge bug
[15:10:10] but yeah I’m sure there’s a workaround.
[15:10:34] We’re sure that the import happens before fork?
[15:10:36] The "global" keyword is working pretty well so far. I wonder how many other cases of this we might identify.
[15:10:42] Yes.
[15:11:05] Can you show me an example of where the global is declared?
[15:11:07] E.g. maybe we can dramatically reduce memory usage by loading all models into a global map in ORES before forking.
[15:11:33] https://phabricator.wikimedia.org/T189364
[15:11:37] See sumit's work ^
[15:11:43] ty
[15:13:17] I’m not seeing where the “global” was added…
[15:13:22] in __init__?
[15:13:35] Oh I asked sumit the same thing. He did it inside revscoring.
[15:13:40] In the word2vec thingie
[15:13:52] Hence why I want a simple demo
[15:13:58] rather than relying on revscoring itself.
[15:14:54] halfak: side note, I think you’re up: https://github.com/wiki-ai/editquality/pulls
[15:15:13] Does our team use the idiom, “licking the cookie”?
[15:17:07] * halfak looks
[15:17:55] PROBLEM - https://grafana.wikimedia.org/dashboard/db/ores grafana alert on einsteinium is CRITICAL: CRITICAL: https://grafana.wikimedia.org/dashboard/db/ores is alerting: 5xx rate (Change prop) alert.
[15:18:47] Anyway, I still haven’t found where “global” gets added, but I trust that it’s the right solution. I just want to know how to apply it myself.
[15:18:58] i.e. it can wait.
[15:20:21] oh, 5xx’s are coming from production?
[15:20:39] Looks like it
[15:21:02] Whatever it was, it dropped down again
[15:21:20] looks correlated to a small bump in overall scores errored
[15:21:45] I wish our “overload” graph could be trusted. We need to add wsgi in there somehow, for when it’s the limiting factor.
[15:22:23] timeout errors make it look like the MediaWiki API is the root cause
[15:23:31] Hmm, it’s possible that timeout metrics are a blend of different types of timeout.
[15:25:44] * awight wanders away from the panic room
[15:27:01] About the word2vec global, one coding error I can imagine creeping in here and there is failure to declare a module-global as “global” when assigning to it in a function. That would create weirdly orphaned vars.
[15:28:23] ^ wat
[15:29:47] 10Scoring-platform-team (Current): Investigate word2vec memory issues with multiprocessing - https://phabricator.wikimedia.org/T189364#4039997 (10awight) See also {T182350}, I think you're onto something important and we can learn from this, and use it to fix other places where we fail to use copy-on-write. @Su...
[15:29:47] lol
[15:29:55] RECOVERY - https://grafana.wikimedia.org/dashboard/db/ores grafana alert on einsteinium is OK: OK: https://grafana.wikimedia.org/dashboard/db/ores is not alerting.
[15:43:53] Hmm... Something just occurred to me. I wonder if our pickling process re-imports the module after forking and that this skips shared memory.
[15:44:18] We definitely initialize pre-fork, but then the reference is sitting in the pickled object.
[15:45:23] Amir1: Are you presenting at today’s Diversity Alliance meeting?
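A minimal sketch of the behavior halfak describes, assuming Linux fork-based multiprocessing; this is illustrative only (not the contents of the demo_shared_memory repo that comes up later in the log), and all names are hypothetical. A large object created before the fork is inherited by the workers through copy-on-write, while passing it into each task forces it through pickle and duplicates it per process:

    import multiprocessing

    # Hypothetical stand-in for a big reference dataset (e.g. word2vec
    # vectors), created at module import time, i.e. before the pool forks.
    BIG = list(range(5000000))

    def read_shared(i):
        # References the module-level object inherited via fork: nothing is
        # pickled, and pages are shared copy-on-write (CPython refcounting
        # can still dirty some pages, so sharing is imperfect).
        return BIG[i]

    def read_copied(args):
        # The big object arrives as a task argument, so it is pickled in the
        # parent and rebuilt in every child -- duplicating memory per process.
        big, i = args
        return big[i]

    if __name__ == "__main__":
        with multiprocessing.Pool(4) as pool:
            print(pool.map(read_shared, range(4)))
            print(pool.map(read_copied, [(BIG, i) for i in range(4)]))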
[15:45:33] halfak: holy cow
[15:45:51] re-importing isn’t a thing in python, but that’s certainly an interesting lead to follow.
[15:46:20] ooh and it would be interesting if the unpickling *is* a way to defeat python and re-import modules
[15:46:30] awight: yup
[15:46:38] Amir1: \o/ thanks!
[15:47:15] awight: I just saw your message, today I have a marathon of meetings :(
[15:48:25] no worries, I determined that I’m not blocking you
[15:49:12] (03CR) 10Awight: "recheck" [extensions/ORES] - 10https://gerrit.wikimedia.org/r/418877 (https://phabricator.wikimedia.org/T166427) (owner: 10Ladsgroup)
[15:49:55] awight: i saw your comment on the memory issue task, do you know what that 310MB used by the celery workers corresponds to?
[15:50:23] 10Scoring-platform-team (Current), 10JADE, 10Security-Reviews: Security review for Extension:JADE - https://phabricator.wikimedia.org/T188308#4046984 (10awight) >>! In T188308#4040037, @Bawolff wrote: > Review passed. > > Some minor comments Thanks, good points! > > * "JudgmentValidator.php" line 172 - s...
[15:50:38] codezee: hi! No, I have no idea what’s in there...
[15:50:59] It could be perfectly innocent, un-garbage-collected in-flight stuff.
[15:52:25] (03CR) 10Ladsgroup: [C: 04-1] "This needs more work" [extensions/ORES] - 10https://gerrit.wikimedia.org/r/418877 (https://phabricator.wikimedia.org/T166427) (owner: 10Ladsgroup)
[15:52:28] but it seems interesting if there could be a solution to a wide array of problems just through an experiment :D
[15:52:48] codezee: high-five!
[15:53:03] (03PS1) 10Awight: Security review followups [extensions/JADE] - 10https://gerrit.wikimedia.org/r/419211 (https://phabricator.wikimedia.org/T188308)
[15:53:50] 10Scoring-platform-team (Current): Investigate word2vec memory issues with multiprocessing - https://phabricator.wikimedia.org/T189364#4047000 (10Sumit) >>! In T189364#4046883, @awight wrote: > @Sumit please link to the code changes you're making that seem to improve memory sharing. Refer to the gist in the firs...
[15:54:21] awight, I think re-importing is a thing when unpickling.
[15:55:13] Wow! Exciting possibility.
[15:55:24] IMO we should unpickle in the pre-fork process either way.
[15:55:42] Just cos that’s repeated work for no good reason, if we’re unpickling in children
[15:59:37] awight, that's impossible
[15:59:49] The pickle is *for* sending data to the processes.
[16:00:01] oh, gotcha
[16:00:19] Oh wait. well. sort of.
[16:00:20] Hmm.
[16:00:29] I'm not sure that is true in ORES.
[16:00:31] halfak: so in earlier experiments word2vec was getting pickled?
[16:00:55] codezee, I think word2vec might be getting referenced in a pickled object
[16:00:56] Just a quick advertisement for T173244
[16:00:56] T173244: [Investigate] Use PMML for prediction model serialization - https://phabricator.wikimedia.org/T173244
[16:01:16] nvm. that’s just for models
[16:01:24] wiki-ai/revscoring#1453 (w2vfix2 - e84950c : Sumit Asthana): The build was broken. https://travis-ci.org/wiki-ai/revscoring/builds/352910060
[16:01:25] awight, is that a serious proposal? 'cause it sounds like a huge amount of work for little gain to me
[16:02:08] The potential gain is that we can train and test our models using a generic backend, for now. Yeah, pretty small gain!
[16:02:11] halfak: if it's referenced in a pickled object, does that mean it should be replicated?
[16:02:49] codezee, the unpickling process involved importing dependencies
[16:04:42] * halfak --> meeting
[16:05:40] it makes sense now, the word2vec is referenced in the features object which is pickled when we parallelize
[16:06:11] good find!
[16:06:39] but now, since the vectors reside in the same module as the datasource definition, and that too as a global, no import is needed
[16:06:46] even after de-pickling in children
[16:07:58] 10Scoring-platform-team (Current), 10MediaWiki-extensions-ORES: Migrate ORES extension threshold config from old to new syntax - https://phabricator.wikimedia.org/T181159#4047060 (10awight)
[16:08:19] 10Scoring-platform-team (Current), 10MediaWiki-extensions-ORES: Migrate ORES extension threshold config from old to new syntax - https://phabricator.wikimedia.org/T181159#3781494 (10awight) This isn't working. I'll look for debug info and then revert the current patch.
[16:09:25] codezee: That makes sense, and if correct then my CR suggestion to keep a local reference to the global data should cause the memory bloat to regress, I think.
[16:12:02] awight: if we assign any *large* thing as a datamember of the object which gets pickled, it will bloat
[16:12:15] so if we do self.keyed_vectors = keyed_vecs
[16:12:18] even then it'll fail
[16:12:51] I see. Awesome that this is solved!
[16:14:31] \o/
[16:24:32] 10Scoring-platform-team (Current), 10editquality-modeling, 10User-Ladsgroup, 10User-Tgr, 10artificial-intelligence: Train/test damaging and goodfaith model for Hungarian Wikipedia - https://phabricator.wikimedia.org/T185903#4047100 (10Tgr) Thanks! The models are in [[https://github.com/wiki-ai/editqualit...
[16:27:08] 10Scoring-platform-team (Current), 10editquality-modeling, 10User-Ladsgroup, 10User-Tgr, 10artificial-intelligence: Train/test damaging and goodfaith model for Hungarian Wikipedia - https://phabricator.wikimedia.org/T185903#4047101 (10Tgr) Next step on the checklist is [[https://www.mediawiki.org/wiki/OR...
[16:29:50] 10Scoring-platform-team, 10ORES: Beta cluster ORES is emitting statsd errors - https://phabricator.wikimedia.org/T189605#4047110 (10awight)
[16:56:35] Does this look right?
[16:56:36] 'likelygood' => [ 'min' => 'maximum filter_rate @ recall >= 0.9', 'max' => 1 ],
[16:59:33] ok yeah, “maximum” makes sense there.
[17:31:20] halfak: since I haven't got the edittypes repo yet, can you let me know if the parsers code is somewhere on github? I can have a look at it and try to get it working with multiple processes
[17:32:55] codezee, check out wikigrammar
[17:33:16] Essentially, the strategy is to load the models in that library into memory and run them on sentences.
[17:33:33] Then we'd take aggregate measures of sentences based on that.
[17:33:46] wiki-ai/wikigrammar
[17:33:50] ok
[17:33:58] do you have the drafttopic models in a PR?
[17:34:33] halfak: no wait, nice reminder, I'll add a PR for the models and word2vec features
[17:34:45] :)
[17:36:09] awight: are you aware of/involved in https://phabricator.wikimedia.org/T111416 ? it talks about topic suggestion for the dashboard program
[17:38:37] codezee: Apparently I was subscribed a few years ago :-) Great to see that it's in play again.
[17:39:43] i was looking at whether our topic model could fit in somewhere there
[17:42:21] I bet it would; even better, if WikiEd knows about your model, they'll come up with ways to use it :)
[17:48:23] 10Scoring-platform-team (Current), 10MediaWiki-extensions-ORES, 10Patch-For-Review: Migrate ORES extension threshold config from old to new syntax - https://phabricator.wikimedia.org/T181159#4047411 (10awight) Looks good now.
[17:50:28] halfak: i realized the tuning report is not there, so i'm generating one; till then i've submitted the word2vec features file - https://github.com/wiki-ai/drafttopic/pull/18
[17:50:45] i'll submit the makefile, tuning report, and model in a single commit
[17:53:48] Sounds great
[18:00:12] 10Scoring-platform-team (Current): Scoring platform team FY18 Q2 - https://phabricator.wikimedia.org/T176324#4047462 (10awight)
[18:00:15] 10Scoring-platform-team (Current), 10JADE, 10Patch-For-Review: Deploy JADE prototype in Beta Cluster - https://phabricator.wikimedia.org/T176333#4047460 (10awight) 05stalled>03Open
[18:19:19] Amir1, I'm looking at https://github.com/wiki-ai/editquality/pull/136 and I think I've finally got to the bottom of WTF was in our old makefile, and I still want to light my hair on fire.
[18:19:21] lol
[18:19:41] :))))
[18:19:46] What the hell were we doing?
[18:20:05] Somehow, we were training the old model on a random sample of a random sample, but I can't figure out why!
[18:20:38] I think making the Makefile templated helps us to review things :D
[18:21:35] heh. I think i might take a pass at making this as simple as possible. Is that OK with you?
[18:21:51] I'll start with a brand new sample since it seems like we're not using the current one.
[18:22:21] halfak: sure, today I have too much (wikidata-side)
[18:30:07] OK :)
[18:31:52] Demo complete!
[18:31:59] https://github.com/halfak/demo_shared_memory
[18:32:06] I get the exact same output for both strategies!
[18:32:07] WTF
[18:32:11] awight|lunch, ^
[18:32:23] See the ".dat" files for the memory profile.
[18:32:35] The scripts in the base represent the two different entry points.
[18:32:41] Maybe I did something wrong.
[18:55:30] 10Scoring-platform-team (Current): Investigate word2vec memory issues with multiprocessing - https://phabricator.wikimedia.org/T189364#4047619 (10Halfak) I made a demo of this problem to try to see if I could reproduce it in isolation. See https://github.com/halfak/demo_shared_memory TL;DR: it didn't work. I...
[19:03:35] hehe
[19:04:33] that's terrible.
[19:20:08] yes.
[19:20:54] awight, any early bird registration deadlines we should be aware of for WebSci?
[19:21:05] I just realized I missed one for CHI :(
[19:21:07] $200 mistake
[19:21:15] yes... hold on
[19:21:28] May 1st
[19:21:34] https://websci18.webscience.org/index.php/registration/
[19:21:55] OK cool. You should hear back from the workshop organizers way before then.
[19:22:33] +1 they're suggesting that they'll be able to respond before March 23rd
[19:22:51] weow. that'd be fast
[19:22:57] meowy weowy
[19:32:58] 10Scoring-platform-team (Current): Investigate word2vec memory issues with multiprocessing - https://phabricator.wikimedia.org/T189364#4047706 (10Sumit) >>! In T189364#4047619, @Halfak wrote: > I made a demo of this problem to try to see if I could reproduce it in isolation. See https://github.com/halfak/demo_s...
[19:33:26] halfak: ^
[19:33:46] codezee, I responded to your PR
[19:34:07] I don't see why we need to have demo_kv be an argument to the function
[19:34:20] Of course that would be a problem because we'd pickle it for every call!
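To illustrate the pickling cost being discussed (and the equivalence the conversation below confirms): with multiprocessing, anything reachable from the task payload is serialized per call, whether it is a plain argument, bound into a functools.partial, or stored on self, while a function that resolves a module-level name pickles only a reference. A minimal sketch with hypothetical names; DEMO_KV stands in for the keyed vectors:

    import pickle
    from functools import partial

    DEMO_KV = {"word": [0.1] * 10000}  # hypothetical stand-in for big vectors

    def lookup(kv, word):
        return kv[word]

    def lookup_global(word):
        return DEMO_KV[word]  # resolved in the module at call time

    class Extractor:
        def __init__(self, kv):
            self.kv = kv  # big object stored as a data member

    # All three of these serialize the vectors by value, once per task:
    print(len(pickle.dumps(partial(lookup, DEMO_KV))))  # bound into a partial
    print(len(pickle.dumps(Extractor(DEMO_KV))))        # self.kv
    print(len(pickle.dumps((DEMO_KV, "word"))))         # plain argument

    # This pickles only a module/name reference, so it stays tiny:
    print(len(pickle.dumps(lookup_global)))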
[19:34:36] halfak: we don't; what i'm trying to convey is that it's the same as when we do self.kv = something
[19:36:02] halfak: in your demo, if you use a class, assign demo_kv to self.kv, and make the process use that, you'll see the difference; using a partial function just makes it much easier to see
[19:36:47] codezee, hmmm. Let's test that then
[19:37:22] Also, if that is the problem we don't need a global to get around it :)
[19:40:00] i've realized we don't need the 'global' keyword; it works without it, just by having the keyed-vec outside the class definition
[19:44:40] halfak: https://github.com/halfak/demo_shared_memory/pull/1
[19:44:44] updated
[19:45:39] also proves the point that partial functions and the self.xyz approach are the same
[19:51:32] the above code is closest to the extractor script i have been using
[19:52:13] alright, gotta go, let me know your thoughts on the task or pr
[20:08:59] halfak: I haven't been able to find any task about the COI parser and memory usage; let me know if that's easy for you to dig up.
[20:09:06] I did find https://phabricator.wikimedia.org/T120170
[20:09:31] Ahh, this is about the parsers that sumit was working on.
[20:09:49] Right, the PCFG for spam articles.
[20:10:09] With the idea that POV language is like spam language and can be detected linguistically.
[20:10:18] Do you have any other ideas for how we'd gather signal?
[20:13:24] Editors were dropping great feature ideas into our task, and into the related one that lzia started.
[20:13:45] Oh goodness. We have two tasks :|
[20:14:11] I think editor maturity vs account maturity is gonna be an interesting source of signal.
[20:14:55] halfak: k, I found the task you mentioned earlier: https://phabricator.wikimedia.org/T157041
[20:15:48] 10Scoring-platform-team, 10ORES, 10artificial-intelligence: Address memory usage issues for deploying PCFG-based features - https://phabricator.wikimedia.org/T157041#4047855 (10awight)
[20:15:51] 10Scoring-platform-team (Current), 10draftquality-modeling, 10editquality-modeling, 10revscoring, and 3 others: [Epic] Implement PCFG features for editquality and draftquality - https://phabricator.wikimedia.org/T144636#4047854 (10awight)
[20:16:15] 10Scoring-platform-team, 10ORES, 10artificial-intelligence: Address memory usage issues for deploying PCFG-based features - https://phabricator.wikimedia.org/T157041#2993446 (10awight)
[20:16:19] 10Scoring-platform-team, 10Research Ideas, 10artificial-intelligence: Paid editing (COI) detection model - https://phabricator.wikimedia.org/T120170#4047859 (10awight)
[20:16:56] 10Scoring-platform-team (Current), 10Research Ideas, 10artificial-intelligence: [Epic] Paid editing (COI) detection model - https://phabricator.wikimedia.org/T120170#1847096 (10awight)
[20:17:39] This list is human-centric, but we can probably automate a few if we think they'll be useful: https://en.wikipedia.org/wiki/Wikipedia:Identifying_PR
[20:33:08] +1 awight
[20:33:18] We should add that as a Q4 goal if you want to put some time into it.
[20:33:25] That's coming up at the end of the month.
[20:33:25] Sure do!
[20:33:42] I think we have a dataset for that too. I've been working with doc james. I should get that published :)
[20:34:43] I'm looking at a Java PCFG NLP library just for reference. I think it's got the same problem: the model itself is reasonable (20MB) but the intermediate structures for parsing are huge (500MB for normal sentence length). https://nlp.stanford.edu/software/lex-parser.shtml
[20:36:32] A service sounds nice, or it might be lighter-weight to use specialized celery workers, e.g. http://docs.celeryproject.org/en/latest/userguide/routing.html
[20:40:07] Just pinged the thread re. the paid editors dataset. I'm just confirming the author list and some details about cleanup I want to do. Then I'll get it up on figshare and we can start experimenting.
[20:40:16] The dataset is just a list of known paid editors.
[20:40:39] So we'll probably want to model them against the average newcomer, assuming that most of them aren't paid COI editors.
[20:41:51] 10Scoring-platform-team (Current), 10Wikilabels, 10editquality-modeling, 10User-Ladsgroup, 10artificial-intelligence: Re. init enwiktionary reverted model & labeling campaign - https://phabricator.wikimedia.org/T188271#4047952 (10Halfak) Ran into {T188564} while working on this. I'm waiting for it to ge...
[20:42:41] \o/
[20:48:58] * halfak books a bunch of travel for April :(
[20:49:10] I hate traveling but I like the things I travel to
[21:05:15] hehe, I think I'm getting jacked by "New user landing page" now. https://en.wikipedia.beta.wmflabs.org/w/index.php?title=Wikipedia:New_user_landing_page&page=Jade%3ADiff%2F376901
[21:25:24] o/ Need to leave a bit early, probably won't be back until tomorrow!
[21:25:30] o/
[21:25:35] Have a good one awight
[21:25:40] Ext:JADE is... nearly deployed to beta
[21:25:42] * halfak prepares for his management training
[21:25:48] I'm scoping it out now.
[21:25:48] bahahaha
[21:25:55] Enjoy
[21:26:11] "There is no revision history for this page."
[21:26:15] wha?
[21:26:20] https://en.wikipedia.beta.wmflabs.org/w/index.php?title=Jade:Diff/376901&action=history
[21:26:35] Oh. Must have just been cleared.
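On the specialized-workers idea from 20:36:32, a minimal sketch of Celery task routing, assuming a dedicated queue for the memory-hungry parser tasks; the app, broker, queue, and task names here are all hypothetical:

    from celery import Celery

    # Hypothetical app and broker URL.
    app = Celery("ores_demo", broker="redis://localhost:6379/0")

    # Route parser-heavy tasks to their own "parsers" queue; everything
    # else keeps using the default queue.
    app.conf.task_routes = {
        "score.pcfg.*": {"queue": "parsers"},
    }

    @app.task(name="score.pcfg.parse_sentences")
    def parse_sentences(text):
        # Heavy PCFG parsing would happen here, so the ~500MB parse
        # structures stay confined to the few workers on this queue.
        ...

Only workers started with something like `celery -A ores_demo worker -Q parsers --concurrency=2` would consume that queue, so the big models load in a small, fixed number of processes.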