[00:00:13] (PS1) Awight: Downgrade sklearn to match existing models [research/ores/wheels] - https://gerrit.wikimedia.org/r/451548
[00:26:12] (PS3) Awight: New fawiki wp10 model, revscoring 2.2.6, other submodule updates [services/ores/deploy] - https://gerrit.wikimedia.org/r/451539 (https://phabricator.wikimedia.org/T201518)
[00:54:37] Scoring-platform-team (Current), ORES, Patch-For-Review: ORES deployment (Early August) - https://phabricator.wikimedia.org/T201518 (awight) This works locally, I'll deploy to the beta cluster for a bit.
[01:18:59] (PS1) Awight: New fawiki wp10 model, revscoring 2.2.6, other submodule updates [services/ores/deploy] (beta_aug_2018) - https://gerrit.wikimedia.org/r/451555 (https://phabricator.wikimedia.org/T201518)
[01:20:05] (CR) Awight: [V: 2 C: 2] New fawiki wp10 model, revscoring 2.2.6, other submodule updates [services/ores/deploy] (beta_aug_2018) - https://gerrit.wikimedia.org/r/451555 (https://phabricator.wikimedia.org/T201518) (owner: Awight)
[01:28:51] (PS1) Awight: Downgrade sklearn to match existing models [research/ores/wheels] (beta_aug_2018) - https://gerrit.wikimedia.org/r/451556
[01:30:24] (CR) Awight: [C: 2] Downgrade sklearn to match existing models [research/ores/wheels] (beta_aug_2018) - https://gerrit.wikimedia.org/r/451556 (owner: Awight)
[01:30:34] (CR) Awight: [V: 2 C: 2] Downgrade sklearn to match existing models [research/ores/wheels] (beta_aug_2018) - https://gerrit.wikimedia.org/r/451556 (owner: Awight)
[01:41:12] (PS1) Awight: [DNM] Switch to beta_aug_2018 wheels [services/ores/deploy] (beta_aug_2018) - https://gerrit.wikimedia.org/r/451562
[01:41:28] (CR) Awight: [V: 2 C: 2] [DNM] Switch to beta_aug_2018 wheels [services/ores/deploy] (beta_aug_2018) - https://gerrit.wikimedia.org/r/451562 (owner: Awight)
[01:48:54] Scoring-platform-team (Current), ORES, Patch-For-Review: ORES deployment (Early August) - https://phabricator.wikimedia.org/T201518 (awight) Confirmed working on beta! https://ores-beta.wmflabs.org/v3/scores/fawiki/12345
[04:38:05] Scoring-platform-team, Core-Platform-Team, MediaWiki-Special-pages, Wikimedia-log-errors: SpecialRecentChangesLinked::doMainQuery blocking database infrastructure - https://phabricator.wikimedia.org/T134976 (tstarling) The initial report showed a query which didn't even use ORES, so it seems unfa...
[13:12:34] Scoring-platform-team (Current), DBA, JADE, Operations, TechCom-RFC: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (Milimetric) I have a proposal that, whether practical or not, may help us answer @awight's question. Whe...
[14:03:57] o/
[14:50:47] \o/ I like milimetric's proposal. Good framing for thinking about limits and constraints.
[14:54:30] (PS1) Thiemo Kreuz (WMDE): More specific array type hints accross the codebase [extensions/ORES] - https://gerrit.wikimedia.org/r/451642
[14:55:29] (PS1) Thiemo Kreuz (WMDE): Remove unused $originalRequest parameter from @dataProvider [extensions/ORES] - https://gerrit.wikimedia.org/r/451644
[15:02:27] Scoring-platform-team, Analytics, Analytics-Kanban, EventBus, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Ottomata)
[15:03:30] Scoring-platform-team, Analytics, Analytics-Kanban, EventBus, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Ottomata) Could/should we just add another endpoint? /v3/scores-normalized? or even a parameter e.g. /...
[15:07:18] Scoring-platform-team, Analytics, Analytics-Kanban, EventBus, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (mobrovac) >>! In T197000#4489652, @Pchelolo wrote: > After a quick h-o with @Ottomata and @JAllemandou w...
[15:17:59] halfak: heads-up, the new ORES code is on beta.
[15:18:31] My morning is chaotic for the next 15min, but then I’m working for real.
[15:18:49] Deployments are always more exciting after a little break :-)
[15:19:04] Scoring-platform-team, Analytics, Analytics-Kanban, EventBus, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Halfak) Yes. As you might imagine, we strive for consistency both to keep our engineering simple and to...
[15:19:23] awight, saw that. Glad to see it up and online. :)
[15:19:36] awight, BTW, milimetric made a great note in the RFC
[15:19:55] We're finally getting traction with "What are the actual problems and what are reasonable constraints!?"
[15:22:45] praise Allah
[15:24:51] Yes, great to see other people phrasing the question…
[15:26:08] Scoring-platform-team, Analytics, Analytics-Kanban, EventBus, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Halfak) Also, I feel like it is important to note that ORES is not a MediaWiki-specific technology. We'...
[15:28:05] ^ this is agitating me
[15:29:11] I’ve been pretending that’s not happening either, but will take a look today if it won’t trigger me :p
[15:40:44] k I’m for real now.
[15:45:09] halfak: okay, now I’m agitated too. They need to be quoting all field names and then there’s no problem.
[15:46:53] I just don't see any good reason why you would implement a specific reformatting in the general source of data rather than in your specific consumer!
[15:47:03] > After all, y'all are working with a stream processing system.
[15:48:35] On the upshot, it looks like the other parties are flexible and could easily write the stream transformation if needed.
[16:06:38] Right, seems that Pchelolo appreciated the stability of our versioning once it was clear.
[16:06:47] This is the exact reason we *do* versioning
[16:06:57] So people can do what they want with confidence downstream :)
[16:08:15] lol we don’t want to raise eyebrows, now
[16:08:44] Maybe we just write the custom endpoint and keep things quiet :p
[16:08:45] j/k
[16:18:47] This situation with the DBAs is so frustrating. I made progress with mark/the DBAs but in the most annoying way.
[16:18:53] They will not give us a ceiling.
[16:19:11] But they have said that if we can use our estimate as a ceiling, they will accept that.
[16:19:26] um, we haven’t formulated it in a way that would serve as a ceiling
[16:19:34] But our estimate was the mean-expected-value. Maybe multiply that by 5 and we'd have a reasonable ceiling.
[16:19:35] Right.
[16:19:45] So I'm pushing back on that.
[16:20:02] I don’t understand what could be meant by “they will accept that”
[16:20:24] Like, if we define a ceiling then our troubles melt away?
[16:20:45] I’m on board…. but that sounds unlikely.
[16:22:12] I realized yesterday that the math might have been misunderstood. When I say the mean of all existing workflows will be an additional 1% of growth, that mean that at 5% overall wiki growth, we’re adding 0.05% more overall growth.
[16:23:09] of course “overall 1%” more growth would be asking too much, as that would be 20% of the non-JADE growth.
[16:23:30] hehe second “mean” is *means
[17:07:54] halfak: darn! Would you mind kicking the CR?
[17:07:55] https://gerrit.wikimedia.org/r/#/c/research/ores/wheels/+/451548/
[17:08:08] https://gerrit.wikimedia.org/r/#/c/mediawiki/services/ores/deploy/+/451539/
[17:09:39] (CR) Halfak: [C: 2] Downgrade sklearn to match existing models [research/ores/wheels] - https://gerrit.wikimedia.org/r/451548 (owner: Awight)
[17:09:43] ty
[17:09:51] Yeah, this is what’s on beta
[17:10:03] I had to branch it so they’re not exactly the same commits, but the content is identical.
[17:15:21] Scoring-platform-team, Analytics, Analytics-Kanban, EventBus, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (mobrovac) >>! In T197000#4491764, @Halfak wrote: > From our point of view, you're asking for us to imple...
[17:15:33] Scoring-platform-team, ORES, Performance: Survey ORES performance since having a dedicated cluster and estimate our ceiling and padding. - https://phabricator.wikimedia.org/T201631 (awight)
[17:15:41] (CR) Halfak: [V: 2 C: 2] New fawiki wp10 model, revscoring 2.2.6, other submodule updates [services/ores/deploy] - https://gerrit.wikimedia.org/r/451539 (https://phabricator.wikimedia.org/T201518) (owner: Awight)
[17:15:48] great! deployin.
[17:15:55] Typos or not
[17:16:10] ORES paper got rejected. :(
[17:16:16] NUTS
[17:16:27] Draft topic, Bot detection, and WikiProject recommendation papers accepted.
[17:16:56] I think the reviewers we got for the ORES paper didn't do a very good job. They didn't even respond to our counter-arguments against some of their thoughts re. the first round reviews.
[17:16:57] There must be some really great stuff out there, I’m glad to hear some of it came out of our shop.
[17:17:51] So my plan is to unofficially publish and announce the ORES paper ASAP and start working towards a submission to a journal.
[17:18:25] cool! Hopefully I can find a way to plug in beyond copying my name from the software maintainers
[17:18:55] Maybe I can fill in our future directions related to JADE
[17:23:04] @halfak Did you get feedback from the reviewers?
[17:23:04] Error: Command “halfak” not recognized. Please review and correct what you’ve written.
[17:23:34] Yeah. The feedback was basically: See our first review.
[17:23:51] We re-wrote a huge amount of the paper!
[17:24:06] Hrm.
[17:25:52] Sorry to hear it was rejected. I think it's a good inclination to absorb the feedback and keep moving forward with it.
[17:28:50] Right. So there are a few things we can learn from this.
[17:29:18] Aww that’s crappy non-feedback, too bad.
[17:29:24] (1) Bringing together 3 different conversations in one paper may be Desirable(TM), but people are lazy and would rather have it broken up.
[17:29:29] Reviewers should be… less overworked.
[17:29:51] +1 (1), I am lazy.
[17:29:55] (2) Reviewers do not want to re-read a paper they have already reviewed and will not be bothered to respond to rebuttals, so it's good to start fresh.
[17:30:18] (3) There are some redundancies that remain and can be cleaned up.
[17:30:51] (4) Good(TM) systems papers are difficult to get past review for lots of reasons.
[17:31:57] Here, I define a Good(TM) systems paper as one that includes an empirically supported rationale, a theoretical grounding, a systems description that corresponds to expected use, a review of use (case studies), and a discussion that suggests directions for future work.
[17:32:53] Each of our reviewers wanted us to drop a different part of it. (1) Drop the rationale, (2) Drop the systems description, (3) Drop the case studies
[17:33:20] The AC (meta reviewer) should have noticed that the 3 reviewers disagreed on what should be cut and asked them to work it out amongst themselves.
[17:33:41] I wish I could have a 1:1 conversation with this AC
[17:38:01] halfak: I had a weird thought about the ORES reference UIs. The new, JS-only implementation could be used as a configurable, optional plugin to mw-ext-ORES.
[17:38:01] I've been talking to staeiou about what to do next. I think we might try cutting the rationale and motivation (Genre Ecologies, Stagnation of ecosystem, Newcomer issues in Wikipedia) and submit a much shorter paper to FATML.
[17:38:14] (I’ll wait to go into it more)
[17:38:42] awight, that's interesting. We can talk about that now. I'm done with my rant. :)
[17:38:59] hehe, not trying to shut you up or anything :p
[17:39:06] * awight leads grandpa away from the hard stuff
[17:42:17] My thoughts were that it seems to hit a few nice marks: * Defaults to “your” configured ORES, so it’s intuitive to use in a Special page on whatever wiki. * Fancy and correspondingly optional.
[17:42:53] On the down side, we probably want to minify in the npm environment, I’m not sure if that’s compatible with how ResourceLoader works e.g. debug=1
[17:43:03] Seems that the same infra for supporting our reference UI would make it nice to develop gadgets using ORES.
[17:43:16] It relies heavily on JS transpilation
[17:43:19] No strong opinion about npm either way.
[17:43:28] aha ^ that’s a great point about reuse
[17:43:37] * halfak googles " JS transpilation"
[17:43:55] It would be slightly unprecedentedly fancy, which could count against this idea
[17:44:03] “Bring back the dark ages!”
[17:44:57] Yeah transpilation is pretty much the same thing as CC preprocessing, it lets us conform to ancient standards and weird hardware.
[17:48:19] Scoring-platform-team (Current), Wikilabels: Extend wikilabels to support session-labelling - https://phabricator.wikimedia.org/T201370 (notconfusing) Created pull request at: https://github.com/wiki-ai/wikilabels/pull/242 Needing review from either @Halfak or @Ladsgroup (or whomever they delegate to).
[17:48:21] That aside, it does seem that gadgets and the reference UI should use the same ORES client code
[17:49:18] That’s so tiny… but maybe. My hesitation is that our abstraction is already at the REST layer.
[17:49:39] I guess we might as well make any barriers to entry as small as possible.
[17:55:28] Deployment was successful 8D
[17:58:09] Scoring-platform-team (Current), ORES, Patch-For-Review: ORES deployment (Early August) - https://phabricator.wikimedia.org/T201518 (awight) Open>Resolved
[18:13:34] Scoring-platform-team, Analytics, revscoring, artificial-intelligence: [Investigate] Use PMML for prediction model serialization - https://phabricator.wikimedia.org/T173244 (awight)
[18:13:36] Scoring-platform-team (Current), ORES: Explore alternative model serializations - https://phabricator.wikimedia.org/T201047 (awight)
[18:18:23] Scoring-platform-team, Analytics, revscoring, artificial-intelligence: [Investigate] Use PMML for prediction model serialization - https://phabricator.wikimedia.org/T173244 (awight) The toolchain isn't very mature. Writing PMML relies on a Java binary, which is acceptable for a compilation pipel...
[18:18:34] Scoring-platform-team (Current), Analytics, revscoring, artificial-intelligence: [Investigate] Use PMML for prediction model serialization - https://phabricator.wikimedia.org/T173244 (awight)
[18:29:12] halfak: Hey, did I mention that saurabhbatra kicked all the butts with his first round of fraud detection modeling?
[18:29:20] 85% precision at 85% recall, more or less
[18:30:18] Ooooh Cool!
[18:30:54] Yeah it’s immediately useful, just needs some integration work.
[18:30:58] awight, what do you mean when you say "Our abstraction is at the REST layer"?
[18:31:34] To say that we should be steering integrators to the REST endpoint rather than to libraries.
[18:32:11] What they need is basically a 1-3 line copy and paste for their choice of development language, and it would be silly to wrap in libraries unless we’re providing a lot of additional value.
[18:33:15] Which we can, but maybe beyond some minimum complexity level below which it still makes sense to paste an XHR block.
[18:34:06] The base value I see is around QPS.
[18:34:25] var uri = 'https://ores.wikimedia.org/v3/scores/enwiki?models=damaging%7Cgoodfaith&revids=' + id;
[18:34:25] $.ajax( uri ).done(function( dat ) {
[18:34:35] * awight searches for QPS
[18:34:41] E.g. I have a large list of revision IDs and I'd like to get them scored as fast as possible without breaking things.
[18:34:44] Queries per second.
[18:35:14] ah so unpacking a batch response? I’d +1 that as a good complexity for leaning on a library
[18:35:21] just to avoid thinking too hard, if nothing else.
[18:35:37] Right. It gets weird in Javascript to manage a queue of requests and parallel requesters.
[18:35:53] E.g. I want to score edits in 5 edit batches using 2 parallel connections as fast as possible.
[18:35:55] ah even bigger than one batch. Sold.
[18:36:18] I do something evil right now in the ORES article quality gadget.
[18:36:20] Although we should be in turn relying on an intermediate batching library
[18:36:27] I just send 500 individual requests out at the same time
[18:36:27] /o\
[18:36:38] You maniac
[18:36:41] lol
[18:37:07] 500 queries per second for 1 second ;)
[18:37:29] BTW, I have that JS in pretty good shape. I can go add it to fawiki now.
[18:37:49] It'll be cool to allow people to see and play with those predictions.
[18:37:58] I feel like this is a good showcase of ORES article quality predictions :)
[18:38:36] doit!
[18:38:40] submitted my Wikilabels PR that adds session labelling
[18:39:22] i mentioned people in phabricator to review, and now am doing the bugging in here that i was told to do
[18:41:21] \o/ Cool. I desperately need lunch. You're first on my list when I get back notconfusing.
[18:41:30] no rush!
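(Editor's sketch, not from the channel.) The "5 edit batches using 2 parallel connections" idea from the 18:35 messages could look roughly like the code below. The helper names `chunk`, `buildScoreUrl` and `scoreAll` are hypothetical and do not belong to ORES or to any ORES client library. The URL shape copies the one pasted at 18:34, and joining several revision IDs with an encoded `|` is an assumption about the endpoint. The request function is injected by the caller, so nothing here depends on jQuery.

```javascript
// Sketch only: chunk, buildScoreUrl and scoreAll are hypothetical helper names.

// Split a list of revision IDs into batches of at most `size`.
function chunk(revIds, size) {
  const batches = [];
  for (let i = 0; i < revIds.length; i += size) {
    batches.push(revIds.slice(i, i + size));
  }
  return batches;
}

// Build a scores URL in the same shape as the one pasted in the chat.
// Assumption: multiple models and revids are joined with an encoded "|" (%7C).
function buildScoreUrl(wiki, models, revIds) {
  return 'https://ores.wikimedia.org/v3/scores/' + encodeURIComponent(wiki) +
    '?models=' + models.map((m) => encodeURIComponent(m)).join('%7C') +
    '&revids=' + revIds.join('%7C');
}

// Score every revision with at most `parallel` requests in flight at once.
// `fetchBatch(url)` is supplied by the caller (for example a thin wrapper
// around $.ajax or fetch) and must return a promise of the parsed response.
async function scoreAll(wiki, models, revIds, fetchBatch, batchSize = 5, parallel = 2) {
  const batches = chunk(revIds, batchSize);
  const results = new Array(batches.length);
  let next = 0;

  // Each worker claims the next unclaimed batch until none are left.
  async function worker() {
    while (next < batches.length) {
      const i = next++;
      results[i] = await fetchBatch(buildScoreUrl(wiki, models, batches[i]));
    }
  }

  const workers = [];
  for (let w = 0; w < Math.min(parallel, batches.length); w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results; // one response per batch, in batch order
}
```

Compared with sending 500 individual requests at once, the worker loop caps the number of open connections at `parallel`, which is the knob the channel later describes as "2 is always safe".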
[18:42:00] i have other things to do today
[18:42:54] Scoring-platform-team (Current), Analytics, revscoring, artificial-intelligence: [Investigate] Use PMML for prediction model serialization - https://phabricator.wikimedia.org/T173244 (awight) a:awight
[19:07:06] Scoring-platform-team (Current), ORES: Explore alternative model serializations - https://phabricator.wikimedia.org/T201047 (awight) My recommendation is that we go ahead with compressed joblib serialization just to get the disk savings and transparent choice of compression algorithm.
[19:13:21] halfak|Lunch: http://clipper.ai/about/
[19:13:33] I should knock on their door.
[19:13:50] “Clipper makes the infra-team less unhappy.”
[19:13:53] hehehe
[19:16:00] It accepts raw inputs. They would benefit from revscoring’s dependency solver...
[19:19:30] systems paper, https://www.usenix.org/system/files/conference/nsdi17/nsdi17-crankshaw.pdf
[19:22:15] It’s got cool stuff like a user-specifiable latency which is balanced against batching requests from multiple consumers
[19:24:04] Related Work -> darn, they aren’t aware of ORES
[19:42:34] Scoring-platform-team, Analytics, Analytics-Kanban, EventBus, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Pchelolo) Let's get back on track. 1. We've discussed the solution of the reformatter in ORES with @Hal...
[19:44:13] Looks like they are trying to solve the *model deployment* problem
[19:45:49] It would be interesting to talk to them about their infra.
[19:51:46] Yeah they tackled slightly different problems, like abstracting the inference engine
[19:52:01] dynamic batching is kinda cool too, but that’s not our bottleneck.
[19:52:19] It’s weird that they assume consumers can calculate the features
[19:53:06] Scoring-platform-team (Current), ORES: Explore alternative model serializations - https://phabricator.wikimedia.org/T201047 (awight) Just found a great resource for further reading, https://www.andrey-melentyev.com/model-interoperability.html Here's a very similar project, offering models as REST micros...
[19:53:19] Scoring-platform-team (Current), ORES: ORES feature extraction triggers new MCR-related deprecation warning - https://phabricator.wikimedia.org/T201332 (awight) a:awight
[19:58:54] awight, they what? Ahhh.
[19:59:02] That seems. Kinda bad.
[19:59:10] Easy to get features wrong.
[19:59:18] And get unpredictable predictions!
[19:59:40] I suppose it's easier than relying on public APIs if you don't know if you can rely on that.
[19:59:52] Yeah I can see it *almost* working for self-contained stuff, e.g. fraud detection, the payload is the only data source, but I’d still expect to see some features calculated among the inputs.
[20:00:02] Good point.
[20:07:51] halfak: o/
[20:09:26] o/ codezee
[20:09:29] congrats dude!
[20:10:03] Hey folks! This guy got a paper accepted to CSCW on his first try ^
[20:12:59] And on an important topic :)
[20:13:10] (back in 10min)
[20:19:14] notconfusing, review complete. I like your implementation. I have a note that I think will be easy to address.
[20:37:23] o/ kaylea
[20:37:38] \o hello wikimedia ai team
[20:37:41] Kaylea is hoping to run a big batch job against ORES.
[20:37:44] awight, ^
[20:37:58] Do your worst :-)
[20:38:10] Wait no.
[20:38:13] Not the worst :D
[20:38:20] lol, my worst is much worse than this
[20:38:23] kaylea: You can listen for falling rocks here, https://grafana.wikimedia.org/dashboard/db/ores?refresh=1m&orgId=1
[20:38:25] this is about 15k revids
[20:38:50] pretty pictures, ty
[20:39:13] kaylea: Probably fine to run at a parallelism of 4, I donno if you already chatted with halfak about this.
[20:40:29] Whatever the default on the score utility is :)
[20:40:40] I think it's 2 parallel connections * 50 rev batches.
[20:40:45] But it might be 4 connections.
[20:40:49] 2 is always safe.
[20:40:58] 4 can be safe under most circumstances
[20:42:10] https://github.com/wiki-ai/ores/blob/master/ores/api.py#L40
[20:42:26] 4x threads requesting 50 revids
[20:46:37] awight, https://grafana.wikimedia.org/dashboard/db/ores?refresh=1m&orgId=1&panelId=15&fullscreen&from=now-72h&to=now-1m
[20:46:54] The spike?
[20:47:03] Something weird happened yesterday at 1748.
[20:47:12] Yeah. And then our median periodically drops.
[20:47:14] Weird.
[20:48:32] That is fishy, I wonder if there’s some timing thing where the FetchScoreJob request consistently beats the ChangeProp job under some circumstances.
[20:49:07] Oh... interesting.
[20:49:13] If changeprop got slow.
[20:49:17] Hmmm
[20:49:22] yah that’s what I’m thinking
[20:49:29] Would make sense.
[20:49:30] let’s see how it lines up with changeprop internally
[20:50:45] There’s an event around 1748, https://grafana.wikimedia.org/dashboard/db/eventbus?refresh=1m&orgId=1&from=now-72h&to=now
[20:55:24] Where do you see it?
[20:56:15] Oh wait. Yeah. I'm seeing it now.
[20:56:32] Probably just a hiccup on their part.
[20:58:22] awight: [offtopic] you finally couldn't resist to wikitech-l
[20:58:43] * awight scratches idly
[21:00:49] In an insane world, it seemed like the sanest choice.
[21:01:03] it looks like my run isn't causing graphs to spike, so maybe things are good now
[21:01:23] while I have the attention of your august selves, can I ask if the reverted model is deprecated?
[21:01:29] I think it used to work, but now I get this:
[21:01:31] RuntimeError: {'code': 'not found', 'message': "Models ('reverted',) not available for enwiki"}
[21:02:04] kaylea: yes we remove the reverted model whenever the better {damaging, goodfaith} models are deployed.
[21:02:18] Here’s a list of available models for enwiki, https://ores.wikimedia.org/v3/scores/enwiki
[21:02:48] ok, thanks
[21:03:13] I'm running various comparisons between a sample of all edits, all IP edits, and all edits made from known Tor nodes
[21:03:30] so trying to get as many angles as are readily available
[21:04:01] kaylea: Here’s a matrix in case you’re considering wikis beyond English, https://tools.wmflabs.org/ores-support-checklist/
[21:05:43] Sounds like useful research, nice to hear!
[21:05:50] we haven't looked much beyond enwiki but it's a good idea
[21:06:38] halfak: great. i'll take a look at it now
[21:06:40] It might be easy to generalize, at least, unless there’s custom NLP
[21:07:56] I am following up on a line of research pioneered by Nettrom actually, looking at who edits articles that are widely read but relatively poor quality
[21:08:23] having ores type data lets me consider quality over time on a finer grain
[21:08:31] \o/
[21:08:32] 8D
[21:08:46] \o/
[21:08:49] We’re useful!
[21:08:56] unsurprisingly, I’m a big fan of what kaylea is doing ;)
[21:09:01] most definitely useful
[21:09:07] kaylea just made our day
[21:09:13] as long as I don't useful your service to oblivion :)
[21:10:03] Na. I think with that utility working as intended, you should be able to get a lot of scoring done without us noticing :D
[21:10:14] Are you running your job right now?
[21:10:28] * awight wraps the hamsters running our mainframe in tin foil, just in case
[21:10:56] I just finished it up
[21:11:42] I didn't see any big spikes in the graphs. 2 in parallel.
[21:11:44] Not even a peep! Looks like you're set to go.
[21:13:00] thanks all
[21:15:23] halfak: I’m thinking that feature extraction and dependency solving is a single package, and can be upstreamed as a sklearn pipeline step that takes random input crap like {rev_id} and turns it into the raw feature vector.
[21:15:36] At least, raw feature vector seems like the common denominator that we can plug into.
[21:16:03] Darn, ^ codezee would have been a great person to rope into that discussion.
[21:16:39] +1 awight
[21:19:20] It has a few external dependencies still, which I’m mulling over today. ConfigStore is the most annoying, because there are so many variations of that wheel.
[21:20:12] https://phabricator.wikimedia.org/T201631
[21:20:27] halfak: ^ something we might want to do on a rainy day.
[21:21:39] Scoring-platform-team, Core-Platform-Team, MediaWiki-Special-pages, Wikimedia-log-errors: SpecialRecentChangesLinked::doMainQuery blocking database infrastructure - https://phabricator.wikimedia.org/T134976 (Krinkle)
[21:22:18] By "rainy day", you mean when people realize how amazing ORES is and it begins to rain money and other resources on us?
[21:22:21] ;)
[21:22:39] awight, ConfigStore?
[21:23:23] just that it’s got this interesting built-in codec for making each object, which is specified in config files.
[21:23:40] halfak: it's a good idea to use the structure you recommended, and not difficult at all. but i did have a question about what you meant by "strata" which i added to the PR discussion
[21:23:47] Everything has a config store, so I’m wondering if we leave it as-is or reduce the friction with sklearn somehow
[21:23:57] s/friction/impedance mismatch/
[21:24:24] awight, oh ConfigStore in sklearn?
[21:24:37] * halfak is not familiar with a ConfigStore in revscoring
[21:24:45] I was making up an abstract class name, but I meant the concept of config in revscoring
[21:24:56] where the class instances are hydrated at run-time by reading from a config file.
[21:25:14] Oh! I see. Yeah. That's a complication. But I do really like it.
[21:25:34] It's a very useful pattern for making sure things can be configured nicely.
[21:25:52] That’s a fantastic system, but a bit unusual and introduces some indirection when people want to understand how it works.
[21:25:56] Extremely useful :-)
[21:26:04] I just want to make it compatible with sklearn usage
[21:26:30] so we can provide dependency solving as a plugin and people don’t have to go make one of a special config file in its own DSL
[21:26:49] PMML seems to come with lots of field mapping descriptions, fwiw.
[21:27:15] http://dmg.org/pmml/pmml_examples/rattle_pmml_examples/IrisRandomForest.xml
[21:27:40] Different type of mapping, http://dmg.org/pmml/pmml_examples/KNIME_PMML_4.1_Examples/ensemble_audit_dectree.xml
[21:27:40] awight, fair point re. sklearn integration.
[21:27:51] Maybe we don't need to look like sklearn in all regards.
[21:28:36] I haven’t read about that PMML format in detail yet
[21:29:53] notconfusing, clarified in the PR that there's no special meaning there.
[21:30:06] * awight backs off of the connection between config and PMML, on second thought that’s not a good place, e.g. when applying the enwiki model to simplewiki.
[21:30:22] However, I think you should not limit your work to editors who get filtered out by hostbot. It would be great if we could filter some of those editors back in.
[21:30:23] Scoring-platform-team, Core-Platform-Team, MediaWiki-Special-pages, Wikimedia-log-errors: SpecialRecentChangesLinked::doMainQuery blocking database infrastructure - https://phabricator.wikimedia.org/T134976 (Krinkle) @tstarling Thanks for checking. In processing the backlog I mixed up this issue...
[21:30:40] In Snuggle, we trained based on editors who made at least one article edit in their first session.
[21:31:09] halfak: not planning to use hostbot permanently, just to bootstrap some training samples.
[21:31:52] i think it would be cool to use this classifier to feed hostbot. right now it just randomly samples new users to invite, in the future we could invite "the most promising" ones
[21:32:30] Right on, then :)
[21:32:39] Does my explanation of the data-passthrough make sense?
[21:36:51] Scoring-platform-team (Current), DBA, JADE, Operations, TechCom-RFC: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (Halfak) I talked to @mark today. Here's what I understood from the conversation: 1. All of the followi...
[21:37:37] ^ halfak small typo, “*where the concerns exist”
[21:38:14] phab won't let me edit :(
[21:39:25] Weird. Seems that phab is brokenish
[21:43:49] Scoring-platform-team (Current), DBA, JADE, Operations, TechCom-RFC: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (awight) This tentative list is great news from my perspective, and I would have an easy time following it...
[21:49:39] create… a task :-[
[21:52:30] lol
[21:52:37] Was me. JS was disabled in my browser.
[22:01:02] OK I'm out of here. Have a good night folks.
[22:01:10] Or whatever happens next in your timezone. :)
[22:03:34] o/
[22:07:14] My ORES dependencies diagram is very wrong, it turns out I didn’t understand how datastore uses its extractor.
[22:15:21] Scoring-platform-team, Analytics, ORES, revscoring, artificial-intelligence: [Investigate] Use PMML for prediction model serialization - https://phabricator.wikimedia.org/T173244 (awight) p:Normal>Low Defining fields and mappings should be an interesting exercise. I'm going to deprio...
[22:49:10] Scoring-platform-team, JADE: Provide a fluent API for JADE - https://phabricator.wikimedia.org/T201651 (awight)
[23:45:29] Scoring-platform-team (Current), ORES, Upstream: Make ORES dependency solving upstreamable - https://phabricator.wikimedia.org/T201657 (awight)
[23:58:58] Scoring-platform-team (Current), ORES, Upstream: Make ORES dependency solving upstreamable - https://phabricator.wikimedia.org/T201657 (awight)