[08:12:28] o/
[08:35:44] https://github.com/mediawiki-utilities/python-mwbase/pull/2
[09:23:33] 10Scoring-platform-team, 10MediaWiki-Core-Tests, 10MediaWiki-extensions-MultimediaViewer, 10MobileFrontend, and 9 others: Audit tests/selenium/LocalSettings.php file aiming at possibly deprecating the feature - https://phabricator.wikimedia.org/T199939 (10hashar)
[09:36:41] 10Scoring-platform-team, 10ORES, 10User-Ladsgroup: Mark edits made by anon and beginner users with mw-replace and mw-blank tags as damaging - https://phabricator.wikimedia.org/T192630 (10Ladsgroup) 05Open>03Resolved a:03Ladsgroup These are already captured in ORES indirectly using several features.
[09:53:29] 10Scoring-platform-team, 10MediaWiki-extensions-ORES, 10User-Ladsgroup: Deploy ORES Review Tool for Latvian Wikipedia - https://phabricator.wikimedia.org/T163007 (10Ladsgroup) 05Open>03Resolved a:03Ladsgroup This is done.
[13:23:29] 10Scoring-platform-team, 10ORES, 10Operations, 10Patch-For-Review: Update log config for scb* boxes, to deal with ORES verbose logging - https://phabricator.wikimedia.org/T182497 (10Ladsgroup) 05Open>03Resolved a:03awight ORES has been moved out of scb nodes and the ores nodes seem pretty clean to me: `...
[13:26:46] 10Scoring-platform-team, 10ORES, 10Graphite: ORES web worker memory usage graph is meaningless - https://phabricator.wikimedia.org/T182871 (10Ladsgroup) This seems resolved: https://grafana.wikimedia.org/dashboard/db/ores?orgId=1&refresh=1m
[14:10:38] 10Scoring-platform-team: Send error logs to logstash - https://phabricator.wikimedia.org/T168921 (10Ladsgroup) Today I realized this is very important: we don't report anything outside of uwsgi logs to logstash, and I don't have access to syslog or daemon.log. Basically I can't see any errors of ORES.
[14:17:45] 10Scoring-platform-team, 10ORES: Implement prioritization of request processing - https://phabricator.wikimedia.org/T148594 (10Ladsgroup) I hereby suggest declining this. We don't have capacity problems anymore, and if needed we should dedicate fast lanes (queues) for important jobs rather than prioritize requests....
[14:22:15] 10Scoring-platform-team (Current), 10Wikilabels, 10User-Ladsgroup: Remove all usages of alert and confirm in wikilabels - https://phabricator.wikimedia.org/T200199 (10Ladsgroup)
[14:24:30] o/
[14:29:37] harej: o/
[16:00:06] Are we having a meeting this week?
[16:01:42] o/ Amir1 and harej
[16:01:50] I won't be able to connect to the hangout, due to travel.
[16:02:05] harej: I will join you in a sec
[16:02:34] I land at 2200 UTC tomorrow. I'll check in then.
[16:13:49] 10Scoring-platform-team, 10ORES, 10Graphite: ORES web worker memory usage graph is meaningless - https://phabricator.wikimedia.org/T182871 (10awight) 05Open>03Resolved a:03awight
[16:18:42] halfak: what are your thoughts on adding a "Next" column to the Scoring Platform Team board as a bridge between that board (and its topic-area columns) and the Current board?
[16:20:43] Amir1 asked about it at today's meeting and I'm generally supportive, but wanted to hear what you think
[16:35:02] I'm not sure I understand the reasoning, harej
[16:35:13] Maybe we should just move priority items to the top of the lists.
[16:36:01] Amir1 can explain better, but I think it's because the workboard is so huge and it's hard to distinguish between what is viable as a task for the current board and what is blocked on the community or something like that
[16:36:16] 10Scoring-platform-team (Current), 10Wikilabels, 10User-Ladsgroup: Remove all usages of alert and confirm in wikilabels - https://phabricator.wikimedia.org/T200199 (10Ladsgroup) https://github.com/wiki-ai/wikilabels/pull/239
[16:36:28] the Current board, I mean.
[16:36:54] harej: exactly, I tried to pick up several high-priority tasks but most of them were blocked on something
[16:37:06] Maybe we should decline things that are blocked on input. But even so, one can request that input.
[16:37:17] Amir1, what is an example?
[16:37:30] Seems to me a "next" column would ruin our organization.
[16:37:33] Wikimedia Phabricator generally doesn't close tasks unless they're finished or totally invalid
[16:37:37] I'd rather have a "blocked" column.
[16:37:52] Who is Wikimedia Phabricator?
[16:37:53] :P
[16:38:08] The community that uses the board. It's a whole political thing.
[16:38:46] halfak: https://phabricator.wikimedia.org/project/board/1901/query/2DUiS_Qke7VW/
[16:38:56] https://phabricator.wikimedia.org/T156820
[16:39:01] https://phabricator.wikimedia.org/T171619
[16:40:33] What I really want is a bridge between the "scoring-platform-team" board and "scoring-platform-team-current". A place where I can pick up things easily
[16:41:02] I feel like the board should be its own bridge, and that it's failing to do so suggests something is going wrong
[16:47:50] argh what day is it
[16:47:54] It's Monday
[16:48:21] harej, agreed. It seems to me that the backlog board is more of a mess than we'd like, but I don't think a "next" column is a solution.
[16:48:50] We probably need to fix the fundamental mess with it, which is that it's work candidates interspersed with not-ready-yet tasks.
[16:49:02] But it's hard to do that when you have a column-based system
[16:49:15] Errr
[16:49:21] When your column system is based on *topics*
[16:49:30] Amir1, re T156820, I think the next step there is to start exploring what the implementation might look like.
[16:49:31] T156820: Implement ORES wp10 predictions in PageAssessments tool - https://phabricator.wikimedia.org/T156820
[16:49:34] I don't see what is blocked.
[16:50:04] harej, prioritized topics.
[16:50:23] And yes, I agree that some are "blocked". We can set their status as such.
[16:50:33] And also, there's work we can do to unblock things.
[16:51:05] harej, ^
[16:51:05] So I guess it's a matter of more judiciously tagging things as stalled?
[16:51:21] Could be, yes. Or taking initiative on them.
[16:51:23] harej: Would it make the boards more functional if we floated the stuff that's work-ready to the top of its column?
[16:51:46] I'd be fine with that; what do you think, Amir1?
[16:52:50] We should also reconsider high-priority items that are not work-ready and have been that way for a while. Maybe they aren't really "high priority"
[16:53:40] Also, should I be moving things from Backlog into their appropriate work columns?
[16:53:48] Yes.
[16:53:56] That's an often-neglected grooming task :(
[16:54:00] Also, how did we decide on these columns to begin with?
[16:54:26] Organically, as we attempted sorting the backlog and assigning priority.
[16:54:53] E.g. the "research" column was added relatively late, but the "community" column was added in the first iteration.
[16:56:14] Amir1, what's blocked about https://phabricator.wikimedia.org/T171619?
[16:56:20] Would you say they are functional things that we as team members do? Like, we have a community outreach function, a writing code function, a research/academic function, etc.?
[16:56:20] I'm honestly not sure.
[16:56:46] https://phabricator.wikimedia.org/T182085
[16:57:32] Ahh yeah. Looks like the last update was a month ago. Seems like it needs a ping.
[16:58:06] Also, is this really a blocker?
[16:58:16] Yeah wait, why is that blocking?
[16:58:16] Can't we store blobs in MySQL in the meantime?
[16:58:25] Blobs are already stored in gerrit
[16:58:27] LFS is happening
[16:58:30] right
[16:58:51] It's the only open subtask of the main one
[16:59:00] I think this is an Ops-only problem
[16:59:24] Sure, but just because it's a subtask doesn't mean that it's a blocker or there's nothing we can do to make progress.
[16:59:39] Here's the remaining LFS work, https://phabricator.wikimedia.org/T197096
[17:02:43] 10Scoring-platform-team, 10JADE: Determine which wikis will get JADE and when - https://phabricator.wikimedia.org/T199520 (10Harej) 05Open>03stalled
[17:02:56] 10Scoring-platform-team (Current), 10JADE, 10Operations, 10TechCom, and 3 others: Deploy JADE extension to production - https://phabricator.wikimedia.org/T183381 (10Harej)
[17:02:59] 10Scoring-platform-team (Current): Scoring platform team FY18 Q2 - https://phabricator.wikimedia.org/T176324 (10Ladsgroup) 05Open>03Resolved It's almost a year since this quarter passed. Closing this.
[17:03:12] 10Scoring-platform-team (Current), 10JADE, 10Operations, 10TechCom, and 3 others: Deploy JADE extension to production - https://phabricator.wikimedia.org/T183381 (10Harej) 05Open>03stalled
[17:03:46] A note on stalled tasks: on workboards, there is no visual way to distinguish between a stalled task and an ordinary open task
[17:04:35] I like using a stalled column...
[17:05:00] guess we need to revisit it regularly though
[17:05:05] awight, why not "stalled" status?
[17:05:37] Should we have a separate Documentation column?
[17:05:50] It seems more important that they matter and are stalled, rather than what functional group they're part of
[17:05:51] It may help our documentation efforts to have a separate column...
[17:09:11] awight: do you know why this is listed as low priority? https://phabricator.wikimedia.org/T198691
[17:10:56] 10Scoring-platform-team, 10JADE: Handle Jade edit conflicts - https://phabricator.wikimedia.org/T198691 (10awight) This is low priority due to its non-urgency at the moment... I'd say we should gather a few data points about how likely edit conflicts turn out to be in our new namespace before prioritizing any...
[17:11:44] (thank you)
[17:13:18] 10Scoring-platform-team, 10JADE, 10Design: Custom diff for JADE content - https://phabricator.wikimedia.org/T198226 (10Harej) p:05Triage>03Low
[17:13:21] halfak: https://github.com/mediawiki-utilities/python-mwbase/pull/2
[17:16:04] I'm currently moving things into/between columns on the Scoring Platform Team board. If I screw it up, feel free to fix my mistakes and/or yell at me :)
[17:18:42] Amir1, doesn't seem like the right change. It's nice to have order in mwbase.
[17:19:16] 10Scoring-platform-team, 10JADE, 10Documentation: Develop "local" documentation for wikis where JADE will be deployed - https://phabricator.wikimedia.org/T199519 (10Harej)
[17:19:35] Note ordering here: https://github.com/mediawiki-utilities/python-mwbase/blob/master/mwbase/entity.py#L14
[17:22:06] 10Scoring-platform-team, 10JADE: Write a letter to JADE stakeholders - https://phabricator.wikimedia.org/T197668 (10Harej)
[17:25:13] halfak: would it be okay if I created a separate Documentation column? I consider it a distinct kind of work from community engagement
[17:25:14] halfak: I don't understand; it turns them into attributes of the object. I can't think of any place where order (e.g. sitelinks before the statements) would be useful
[17:25:52] harej: +1 from me
[17:26:00] Yeah. I think that makes sense. Too many columns are hard to navigate, but I think this distinction will be important in the next few months.
[17:26:18] Amir1, in printing the document, for one.
[17:26:29] When it uses an ordered dict, it preserves order.
[17:26:40] It seems like the underlying problem isn't really being solved.
[17:27:11] Is it just that you have no idea why OrderedDict isn't working, so you want to just drop ordering? Or do you know why OrderedDict doesn't work?
[17:27:22] harej: Any thoughts about T198224? The best I've come up with is that we'll need to announce a hard cutover and take a few hours of read-only downtime.
[17:27:22] T198224: Spec version indicator for JADE content - https://phabricator.wikimedia.org/T198224
[17:27:33] I do know why it's failing
[17:27:53] because it's checking internal attributes in the dictionary we have given to it
[17:28:15] I tried to bypass checking that but it's not possible
[17:28:39] 10Scoring-platform-team: Switch to a non-WMF hangout for the staff meetings so we don't have access granting foolishness - https://phabricator.wikimedia.org/T180613 (10Harej) p:05Triage>03Lowest
[17:28:40] afk for dinner
[17:29:32] 10Scoring-platform-team, 10JADE: Create a JADE poster - https://phabricator.wikimedia.org/T195759 (10Harej)
[17:33:14] awight: would it be possible to declare the current version as version 1 and annotate the content model accordingly? Would that still cause downtime?
[17:35:05] Hmm. Gotcha.
[17:35:07] But why would this happen with OrderedDict and not dict?
[17:35:09] Can you build a simple demo of the issue when you get back, Amir1?
[17:35:11] harej: I'm imagining that we can do non-breaking changes and deprecations, e.g. adding a new version field in the content (whether or not that's a good idea) would be non-breaking as long as the clients don't enforce the strict jsonschema. So we would need to roll out a new schema before the new content.
[17:37:51] I feel squeamish about adding the version number to content for some reason. Imagine if we had to add {{#wikitext:1}} to Wikipedia articles to denote version 1 of wikitext, or what have you.
[17:38:01] Having it as *metadata* would be great though.
[17:38:26] +1 I don't want to do that, just trying to follow it as an example of how I imagine the "easy" migrations
[17:38:34] Not sure what metadata will accomplish though
[17:39:39] Did you ever find out if the content model supports schema versioning?
[17:39:57] I shouted into the wind, but no answers
[17:40:34] halfak: because OrderedDict has internal attributes in the Python 3.4 implementation; in Python 3.5 it migrated to a full C implementation.
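For readers following the __getattr__ vs. __getattribute__ point, here is a minimal sketch, assuming a hypothetical wrapper class (this is not the actual mwbase Entity code): it forwards unknown attribute reads to an underlying OrderedDict while reaching its own real attributes through object.__getattribute__, which is the access pattern halfak suggests for keeping the proxying out of the way of anything stored on the instance by dict machinery (the thing that bites the pure-Python OrderedDict on 3.4).

```python
from collections import OrderedDict

# Hypothetical sketch only -- not the mwbase Entity implementation.
class DocProxy:
    def __init__(self, data):
        self._data = OrderedDict(data)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails; forward to the doc.
        data = object.__getattribute__(self, "_data")
        try:
            return data[name]
        except KeyError:
            raise AttributeError(name)

    def keys(self):
        return object.__getattribute__(self, "_data").keys()


doc = DocProxy([("labels", {"en": "Example"}), ("sitelinks", {}), ("claims", {})])
print(doc.labels)        # {'en': 'Example'}
print(list(doc.keys()))  # ['labels', 'sitelinks', 'claims'] -- order preserved
```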
[17:40:35] It's common for documents to reference their schema and for schemas to be versioned
[17:40:40] awight, harej ^
[17:40:44] halfak: You can see my attempts in https://travis-ci.org/mediawiki-utilities/python-mwbase/builds
[17:41:14] Amir1, I see. Hmm.. we should be able to work around this.
[17:41:20] halfak: In wiki content?
[17:41:38] https://stackoverflow.com/questions/3278077/difference-between-getattr-vs-getattribute
[17:41:40] Amir1, ^
[17:41:52] awight, not in wiki, but in JSON documents generally.
[17:42:01] I tried super https://travis-ci.org/mediawiki-utilities/python-mwbase/jobs/405858891 and all sorts of things
[17:42:08] * halfak checks wiki stuff.
[17:42:53] Amir1, if we call __getattribute__ that should entirely circumvent any local attribute access.
[17:42:55] I doubt that would solve it; do you want to give it a try?
[17:43:29] Amir1: Do you have insight into how Wikidata performs content schema upgrades? Maybe they haven't needed to yet?
[17:43:45] which would defy the whole point of having an ordered dict, because it uses these attributes to keep the order
[17:44:51] Amir1, I don't see why it would defy any point?
[17:45:52] We're just working with access patterns now.
[17:46:20] halfak: Certainly, but it doesn't seem very common in MediaWiki contexts
[17:46:46] Another concern I have with putting it in the document is that we'd effectively have to support all versions of the schema for all time
[17:46:49] awight: I have no clue
[17:46:55] Since removing support would break old revisions
[17:47:19] If versioning were made to be a metadata attribute of a given revision, we could have special compat handling logic separate from the "main" schema code representing the current version
[17:47:46] Amir1, https://github.com/mediawiki-utilities/python-mwbase/pull/3
[17:48:17] halfak: https://travis-ci.org/mediawiki-utilities/python-mwbase/jobs/407265658
[17:48:22] harej, I don't see how that is true. Either that, or we're going to need to re-write all of the old docs whenever we make a schema change.
[17:48:40] Also, metadata of a revision is not currently a thing.
[17:48:48] halfak: you are going down the path I went before and came up with this solution
[17:49:07] Oh? What's up with this path, Amir1?
[17:49:46] halfak: did you see the build failure?
[17:50:01] In any case, I don't have any particularly strong insight on how to handle future breaking changes. Just that I consider it necessary that if we change the schema in the future, we have a path forward.
[17:50:05] Amir1: how does Wikibase handle schema changes?
[17:50:17] The alternative is that we do not have any breaking changes ever.
[17:50:26] Amir1, Aha. It looks like methods are attributes.
[17:50:45] harej: they keep b/c for a while
[17:50:58] I think practically forever
[17:51:24] but it has been a while since they have done anything
[17:51:37] before I officially joined
[17:51:44] What happens if I try to access an old version of a page that relies on an old version of the schema that is no longer implemented?
[17:52:11] Here's a breaking content change in wikibase, fwiw: https://lists.wikimedia.org/pipermail/wikidata-tech/2016-November/001050.html
[17:53:06] harej: +1 that a migration plan is a must-have
[17:53:23] I have something outlined already, but it's not the betx
[17:53:25] *best
[17:53:25] Breaking from the client perspective; not sure about it being a breaking change from the server perspective. All they're doing there is relaxing a requirement.
[17:53:41] awight: do you have a minute to check this? https://github.com/wiki-ai/wikilabels/pull/239
[17:53:56] Aha, I see the complication
[17:54:07] Amir1: will do
[17:54:15] I have another idea that I will mock up in Balsamiq. Give me a moment
[17:54:20] awight: Thanks!
[17:54:38] harej, awight: what about contenttype/model?
[17:54:44] Is that at revision-level?
[17:54:53] Or is it contentformat
[17:55:07] halfak: yes, content model is per-revision, but I don't think it has any attributes for version
[17:55:26] harej was already talking about versioning the content model itself, e.g. type="JudgmentV1"
[17:55:53] but I don't think there's a precedent, nor does it solve much more than making it easier to process historical revisions
[17:56:01] How does wikibase do this?
[17:56:08] We're all asking that :-)
[17:56:13] ha
[17:56:15] halfak: I sent an example in a link above
[17:56:19] https://lists.wikimedia.org/pipermail/wikidata-tech/2016-November/001050.html
[17:56:21] I think Wikibase does it by not really having breaking changes
[17:56:31] Here's their policy: https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy
[17:56:33] No, it does, harej
[17:56:38] no, they do have breaking changes
[17:56:39] That's why mwbase exists :)
[17:56:43] and pywikibase
[17:57:01] They basically are designed to cram an entity into a consistent format.
[17:57:02] I found a few examples in the mailing list archives; they use "BREAKING" in the subject line
[17:57:18] AFAICT, historical revisions are left broken.
[17:57:25] AFAICT, the deal is just that they announce it, then make the change.
[17:57:37] +1 historical revisions are broken, it seems
[17:57:52] Would be wonderful to have a version in those circumstances
[17:58:00] Rather than reverse-engineering the schema in the code.
[17:58:06] We've planned to include migrations in our "all history" dumps
[17:59:18] the downside of my current plan is that we would be detecting step changes using... a manual log, I guess
[18:00:46] Right. I like dumping the schema in the content, honestly.
[18:01:02] It seems the strategy that aligns most closely with external standards.
[18:01:18] https://www.wikidata.org/w/api.php?action=query&prop=revisions&titles=Q18627581&rvprop=content&rvlimit=1
[18:01:43] That is not meant to be human-readable, so I don't see why we couldn't have some metadata like a schema in there.
[18:02:49] It's notable that wikibase hasn't had the need for a schema
[18:03:08] awight https://usercontent.irccloud-cdn.com/file/bUwYVfu7/Screen%20Shot%202018-07-23%20at%2011.02.45%20.png
[18:03:08] *reference to schema URL
[18:03:50] Then clients have to be able to handle any version, which is asking a lot for little gain.
[18:05:03] re. including the schema, a problem is that the URL is either stable and points to the latest update of the schema major version, or we have a new URL for every minor update of the schema, which is counter to the philosophy of URLs
[18:05:18] and similarly really hard to parse for clients
[18:06:06] IMO, a client should only ever have to parse the current format, including any parts which are deprecated but not yet removed. That gives us pretty understandable and solid CYA
[18:06:32] (cool mockup btw)
[18:08:00] I think the question we need to answer first is: do we want to give ourselves the option to have breaking changes, or are we comfortable doing away with them?
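To make the "schema reference in the content" idea above concrete, here is a minimal sketch with hypothetical field names and URLs (nothing here is an agreed JADE content format): a judgment document carrying a versioned schema reference, and a client that dispatches on it rather than reverse-engineering old revisions.

```python
# Hypothetical sketch only: fields, URLs, and the upgrade table are invented
# for illustration, not the real JADE judgment format.
judgment = {
    "schema": "https://example.org/jade/judgment/v1.json",  # versioned reference
    "damaging": {"value": False, "notes": "Good-faith copyedit."},
}

# Map each known schema version to a function that upgrades a document to the
# current format; the current version maps to the identity function.
UPGRADES = {
    "https://example.org/jade/judgment/v1.json": lambda doc: doc,
}

def load_judgment(doc):
    try:
        return UPGRADES[doc["schema"]](doc)
    except KeyError:
        raise ValueError("missing or unsupported schema reference: %r"
                         % doc.get("schema"))

print(load_judgment(judgment)["damaging"]["value"])  # False
```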
[18:08:33] We will need the option
[18:09:13] If we accept that we need the option to make a breaking change in the future, we may have to accept some of the implications, such as the need to maintain documentation for old versions.
[18:09:41] +1
[18:10:05] There are nice things we can do, of course, like write maintenance scripts to automatically migrate the current revisions of judgments to the latest schema.
[18:10:26] But let's discuss other requirements; IMO one is to keep client logic simple
[18:11:02] Is that a more important need?
[18:11:27] More important than what?
[18:11:48] The tradeoffs will be pretty concrete, I think.
[18:12:23] Ideally we want both the option for breaking changes and simple client logic, but if we can't have both, what do we choose?
[18:12:37] It's not a tradeoff we have to make, IMO
[18:13:03] +1 for needing breaking changes
[18:13:36] That's why I'm suggesting we look at concrete examples, e.g. we can have breaking changes and simple client logic by following the Wikidata stable interface policy, but the tradeoff is that we have downtime or accept that some clients will break.
[18:13:39] I don't think we can ever get away without them
[18:13:52] I've never seen a schema that stayed the same.
[18:14:15] Time to go! Godspeed y'all. I'll be in flight for ~24 hours.
[18:14:23] o/
[18:14:27] excellent parting words ;-)
[18:15:03] halfak: ah, please reschedule our 1:1 when you can
[18:16:37] also awight, do you have any thoughts on how to proceed with the objections from SRE?
[18:17:12] I think it's a matter of preserving the parts we like about wiki pages while doing some back-end trickery to make the operations people happy
[18:17:34] I think if our goal is to make a fully transparent auditing system, the product has to be a system of wiki pages.
[18:20:49] it's ridiculous...
[18:21:27] Maybe we need to address the concerns directly rather than trying to get technical
[18:21:50] We seem to be getting the advice (and put in a very block-y way) that we can't use wiki pages because wikis are hard.
[18:22:16] We obviously can't accept that, so it's gonna be a turbulent ride
[18:22:42] So, maybe we should look at Ops' arguments about why we can't have wiki pages
[18:23:03] It seems to revolve around my incautious first posts on the scalability task
[18:23:21] Is there really a potential for unlimited contributions?
[18:23:47] They seem convinced of it, and the fear of such misuse seems to be more relevant than the actual risk.
[18:23:49] If a wiki community allies against our tech staff, then yes. We'll have to lock the namespace or something.
[18:24:01] +1
[18:24:20] Like they were burned by past experiences, and they don't want yet another bad experience.
[18:24:27] Is it possible to dispel the fear? We could give lots of examples of how this is accomplished in wikis.
[18:24:28] A pattern of getting burned and setting up precautions
[18:24:50] Ah yeah, thanks, that reminds me that I made a bit of progress in understanding jynus's concerns.
[18:25:18] At first I was confused, because all the examples he gave were of projects that used *custom* tables
[18:26:17] But now I see that what he's saying is: if we use wiki pages and it turns out we've f—d up our projections or schema and need some radical change, it's much harder to make that change in wiki content space than in a custom table.
[18:27:16] I tried to reassure them in several ways, but no success.
[18:29:14] You may have not seen this yet: https://etherpad.wikimedia.org/p/JADE_scalability_minutes_2018-07-17
[18:29:18] I think this is unfortunately a domain where you can't convince with reason, because the motivation ultimately is fear of repeating past bad experiences. So the best thing we can do is assuage their fear.
[18:29:25] There was a lot of back and forth between myself and jynus after the meeting
[18:29:29] +1
[18:29:59] And I appreciate their experience and fear, just no clue yet what will be a satisfying answer.
[18:30:46] I feel like I've addressed every concern, but apparently not well enough
[18:30:49] 10Scoring-platform-team (Current), 10DBA, 10JADE, 10Operations, and 2 others: Extension:JADE scalability concerns due to creating a page per revision - https://phabricator.wikimedia.org/T196547 (10awight) Here are the notes from our meeting, plus some more discussion afterwards: https://etherpad.wikimedia....
[18:31:27] harej: I'm sure you saw this, though? https://etherpad.wikimedia.org/p/JADE_scalability_FAQ
[18:31:44] I'd like to summarize more of the meeting and post-meeting minutes into the FAQ for our next round.
[18:32:25] MBergsma wants us to go back to TechCom, but it didn't sound like "we resolve concerns and leave it to them"; it was more like "you must not go forward with this plan, maybe TechCom can talk you into something with custom tables"
[18:34:24] Side note on migration, this is interesting: https://lists.wikimedia.org/pipermail/wikidata-tech/2016-February/000910.html
[18:35:57] tl;dr, "I think clients should be written in a forward-compatible way", which I agree with. Notably, that would mean not enforcing our JSON schema on the client side, unless it was possible to discover a known most-recent schema.
[18:38:49] ah yeah, the "robustness principle"
[18:39:28] I agree with that sentiment. Which should limit the number of breaking changes we have to make, since they won't be breaking as such.
[18:40:45] +1
[18:44:07] So however we approach this, is the fundamental user experience on the table?
[18:44:56] A solution I hope works is that they're still fundamentally wiki pages, but stored in a more "scalable" way
[18:45:15] Hmm, IMO that's not gonna happen
[18:45:29] I very much don't want to do anything special on the backend...
[18:45:41] My hope is that we can address the concerns on another level
[18:55:49] hey awight!
[18:55:54] got time for a chat?
[18:56:34] saurabhbatra: hi, sure thing
[18:56:55] on the fr channel or dm?
[18:57:27] harej: Just to be clear, +1 that wiki pages are a must.
[18:57:37] saurabhbatra: fr, if you don't mind
[18:57:48] yeah sure!
[21:13:10] So I had a thought. We should hook ORES up to ConfirmEdit so that if someone makes like 3 very likely bad edits in a short time period, they start getting captchas for a little while
[21:13:30] Is there any reason this would be a bad idea?
[21:13:39] I like it
[21:13:52] But what if we impede false positives?
[21:15:18] at the end of the day it's still just a captcha - it's not stopping anyone, just annoying them slightly
[21:16:13] we already do this to anyone (unregistered) who adds a link. Which is probably much more false positive heavy
[21:19:42] I guess there are two different things we could do here: all "very bad" edits get captchas (unless you have the skipcaptcha right)... or we could use it as an anti-bot measure: if you make 5 likely bad edits in a 1-minute period you are probably an evil bot, so we blacklist your IP for an hour and show anyone editing from that IP a captcha
[21:20:12] +1 what about reducing the annoyance to anons adding links, using ORES
[21:21:06] on a slightly related note, I think it'd be cool to start showing captchas to people who get reverted a lot
[21:21:11] At least, that was the general approach of T158909
[21:21:12] T158909: Automatically detect spambot registration using machine learning (like invisible reCAPTCHA) - https://phabricator.wikimedia.org/T158909
[21:22:04] since in a mass vandal attack people start reverting before an admin can be located
[21:23:49] So is the goal to slow down an attack? Or maybe we just use the recent historical scores for a user as an input when calculating their new edits' scores?
[21:23:54] Unfortunately, I'm not sure how we could have a machine learning captcha and still allow edits via the API
[21:24:01] +1
[21:24:19] And if we're certain enough to stop them, let's just stop them and notify them that their account is frozen until it can be reviewed?
[21:24:53] I do want to adjust our captcha to be more human-friendly (probably an uphill battle). Our current one is terrible
[21:25:12] Hard for humans, easy for computers
[21:26:01] I think we should steal the ReCaptcha approach
[21:26:11] we already use cross-project cookies for fundraising opt-out
[21:26:52] we just do the same thing when deciding whether to captcha users: harvest their site last_visited and guess if they have a normal level of interest in encyclopedic topics.
[21:30:53] That sounds "difficult" to me, but I don't have much experience with machine learning
[21:33:19] nah, it's not even machine learning necessarily
[21:33:58] What I believe Google does is check whether people have cookies on their sites, whether the cookies match, and so on.
[21:35:05] That just by itself is pretty easy to forge. I was under the impression they did some fancy stuff to see if the person is humanish
[21:35:18] Not having cookies is suspicious enough to show the fallback captcha, and if you have cookies there's probably a center point for "not suspicious", then it gets more suspicious towards the margins.
[21:35:35] combined with IP greylisting
[21:35:48] Who would visit Wikipedia once in the last 6 months, then edit? Either an advanced user at another wiki (easily checked via cookie) or a scammer
[21:36:14] ah yeah, ReCaptcha is a whole suite of tests
[21:36:43] But IIRC cookies are shown to make up a huge proportion of the result
[21:37:41] oops, I've been using the wrong brand name.
[21:37:48] * awight shuffles through old copypasta
[21:37:52] I guess we'd have to HMAC the IP in the cookie to prevent forging. Some people might think we are tracking them
[21:38:44] Analytics is very proud of the fact we have no unique identifiers
[21:39:03] https://en.wikipedia.org/wiki/ReCAPTCHA#No_CAPTCHA_reCAPTCHA
[21:39:14] they call it "invisible reCAPTCHA"
[21:40:08] There are already cookies we can use, I believe.
[21:40:24] Analytics has a "last day visited" stamp called something else, obviously
[21:40:33] Yeah, I get the branding confused too; I knew what you were referring to
[21:40:36] & then fundraising cookies, maybe other stuff
[21:40:40] hehe brand-blind.
[21:40:43] If only.
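As a rough illustration of the cookie idea batted around above (a hypothetical sketch, not an existing MediaWiki or ConfirmEdit hook): a "recently visited" cookie whose value is an HMAC over the client IP and a timestamp can't simply be copied or forged, so the fallback captcha would only show when the cookie is missing, stale, or bound to a different IP. The caveats already raised apply, notably the tracking perception and the fact that IPs change.

```python
import hmac
import hashlib
import time

SECRET = b"server-side secret"  # would come from configuration, not the code

def make_visit_cookie(ip, now=None):
    """Issue a cookie value binding this IP to a visit timestamp."""
    ts = str(int(now if now is not None else time.time()))
    mac = hmac.new(SECRET, f"{ip}|{ts}".encode(), hashlib.sha256).hexdigest()
    return f"{ts}.{mac}"

def needs_captcha(ip, cookie, max_age=180 * 24 * 3600):
    """Show the fallback captcha unless a fresh, valid cookie matches this IP."""
    try:
        ts, mac = cookie.split(".", 1)
    except (AttributeError, ValueError):
        return True  # no cookie at all: treat as suspicious
    expected = hmac.new(SECRET, f"{ip}|{ts}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        return True  # forged, or copied from another IP
    return (time.time() - int(ts)) > max_age  # too old: treat as a first visit

cookie = make_visit_cookie("203.0.113.7")
print(needs_captcha("203.0.113.7", cookie))   # False
print(needs_captcha("198.51.100.9", cookie))  # True (cookie bound to another IP)
```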
[21:42:09] It would be interesting to see how effective it'd be... especially in an open source context where they could just look up the code and fake the cookie
[21:42:41] I think that takes an extra level of dedication; probably a 99% filter, I'd expect
[21:44:34] Yeah. Our captcha still apparently stops a lot, despite the fact that Tesseract with no training or preprocessing can solve it 30-50% of the time
[21:44:44] so lots of lazy bots
[21:45:07] https://phabricator.wikimedia.org/T125132 was where I was playing with alt image captcha schemes, btw
[21:46:22] Thanks for adding me!
[21:46:53] I'm not even sure why that's private. Not a whole lot of secret stuff in there
[21:47:41] So my thought when working on the sign-up captcha project I linked was that we come up with our own suite of analyses and allow ourselves to use them all in tandem.
[21:48:23] For example, let through 95% of the most obvious humans without a captcha, but show a captcha to the remaining 5%.
[21:49:01] Then, let people edit even with a moderately suspicious score at signup, but prioritize their edits for review
[21:49:16] The greylisting is real lol. When I forget to disconnect my VPN to DigitalOcean I get hard reCaptchas everywhere.
[21:49:19] I'll have to watch the YouTube presentation from that project
[21:49:26] penalize the IP address if lots of reverts happen, etc.
[21:50:22] hehe I love it when 50-year-old programming languages in the chat room turn out to be mastermind haxors
[21:52:11] It seems like much of our antispam tooling dates back to the mid-2000s (ORES being an obvious exception); I really think it's something the Foundation should be paying more attention to
[21:54:10] +1, thanks for thinking about how we might leverage existing tools
[21:54:26] Feel free to schedule a cross-team chat on the subject?
[21:54:56] Or generally, I'm personally interested in how to make all social processes work better, not just antivandalism
[21:55:11] hehe "generally specifically". I should sleep more between days.
[21:56:16] Sure (maybe. I'm not sure how much time I'll have to dedicate to this in a work capacity. The Security team has a lot on its plate at the moment)
[21:56:58] although I did spend most of last week reading about captcha schemes. It's a fascinating subject in its own right
[21:57:39] Yeah, I'm bothered by how close to being within reach a cookie-based captcha relaxation would be.
[21:59:03] Anyways, thanks for the conversation. It's given me lots to think about