[01:12:31] (PS3) Awight: Test helper for judgment storage [extensions/JADE] - https://gerrit.wikimedia.org/r/447929
[01:12:33] (PS4) Awight: AppendCreator for adding judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/447927
[01:12:35] (PS22) Awight: JADE API to store judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/442885 (https://phabricator.wikimedia.org/T198207)
[01:14:00] (CR) jerkins-bot: [V: -1] JADE API to store judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/442885 (https://phabricator.wikimedia.org/T198207) (owner: Awight)
[01:14:02] (CR) jerkins-bot: [V: -1] AppendCreator for adding judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/447927 (owner: Awight)
[01:16:34] (CR) jerkins-bot: [V: -1] AppendCreator for adding judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/447927 (owner: Awight)
[01:18:09] (CR) jerkins-bot: [V: -1] JADE API to store judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/442885 (https://phabricator.wikimedia.org/T198207) (owner: Awight)
[01:19:45] (PS11) Awight: Translatable entity type [extensions/JADE] - https://gerrit.wikimedia.org/r/443378
[01:19:47] (PS4) Awight: Test helper for judgment storage [extensions/JADE] - https://gerrit.wikimedia.org/r/447929
[01:19:49] (PS5) Awight: AppendCreator for adding judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/447927
[01:19:51] (PS23) Awight: JADE API to store judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/442885 (https://phabricator.wikimedia.org/T198207)
[01:21:23] (CR) jerkins-bot: [V: -1] Test helper for judgment storage [extensions/JADE] - https://gerrit.wikimedia.org/r/447929 (owner: Awight)
[01:21:33] (CR) jerkins-bot: [V: -1] AppendCreator for adding judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/447927 (owner: Awight)
[01:21:37] (CR) jerkins-bot: [V: -1] JADE API to store judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/442885 (https://phabricator.wikimedia.org/T198207) (owner: Awight)
[01:23:30] (CR) jerkins-bot: [V: -1] Test helper for judgment storage [extensions/JADE] - https://gerrit.wikimedia.org/r/447929 (owner: Awight)
[01:25:35] (PS5) Awight: Test helper for judgment storage [extensions/JADE] - https://gerrit.wikimedia.org/r/447929
[01:26:59] (CR) jerkins-bot: [V: -1] JADE API to store judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/442885 (https://phabricator.wikimedia.org/T198207) (owner: Awight)
[01:27:59] (CR) jerkins-bot: [V: -1] AppendCreator for adding judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/447927 (owner: Awight)
[01:30:12] (PS6) Awight: Test helper for judgment storage [extensions/JADE] - https://gerrit.wikimedia.org/r/447929
[01:31:20] (CR) jerkins-bot: [V: -1] Test helper for judgment storage [extensions/JADE] - https://gerrit.wikimedia.org/r/447929 (owner: Awight)
[01:32:17] (PS7) Awight: Test helper for judgment storage [extensions/JADE] - https://gerrit.wikimedia.org/r/447929
[01:39:30] (PS6) Awight: AppendCreator for adding judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/447927
[01:50:12] (PS24) Awight: JADE API to store judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/442885 (https://phabricator.wikimedia.org/T198207)
[01:50:59] (PS25) Awight: JADE API to store judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/442885 (https://phabricator.wikimedia.org/T198207)
[10:00:54] Scoring-platform-team, Growth-Team, MediaWiki-extensions-ORES, MediaWiki-extensions-PageCuration, ORES: PageTriage requires ORES to be installed - https://phabricator.wikimedia.org/T200412 (zeljkofilipin)
[10:00:58] Scoring-platform-team, Growth-Team, MediaWiki-extensions-ORES, MediaWiki-extensions-PageCuration, ORES: PageTriage requires ORES to be installed - https://phabricator.wikimedia.org/T200412 (zeljkofilipin)
[10:01:04] Scoring-platform-team, Growth-Team, MediaWiki-extensions-ORES, MediaWiki-extensions-PageCuration, ORES: PageTriage requires ORES to be installed - https://phabricator.wikimedia.org/T200412 (zeljkofilipin) p:Triage>Unbreak!
[10:02:51] Scoring-platform-team, Growth-Team, MediaWiki-extensions-ORES, MediaWiki-extensions-PageCuration, ORES: PageTriage requires ORES to be installed - https://phabricator.wikimedia.org/T200412 (zeljkofilipin)
[10:03:04] Scoring-platform-team, Growth-Team, MediaWiki-extensions-ORES, MediaWiki-extensions-PageCuration, ORES: PageTriage requires ORES to be installed - https://phabricator.wikimedia.org/T200412 (zeljkofilipin)
[10:19:41] Scoring-platform-team, Growth-Team, MediaWiki-extensions-ORES, MediaWiki-extensions-PageCuration, and 3 others: PageTriage requires ORES to be installed - https://phabricator.wikimedia.org/T200412 (zeljkofilipin) a:Ladsgroup ``` [10:15:57] zeljkof: I know what's going on and I can fi...
[10:35:04] Scoring-platform-team, Growth-Team, MediaWiki-extensions-ORES, MediaWiki-extensions-PageCuration, and 3 others: PageTriage requires ORES to be installed - https://phabricator.wikimedia.org/T200412 (Ladsgroup) @zeljkofilipin Now, you should not see such error when you deploy to group1. Do you want...
[10:38:46] Scoring-platform-team, Growth-Team, MediaWiki-extensions-ORES, MediaWiki-extensions-PageCuration, and 3 others: PageTriage requires ORES to be installed - https://phabricator.wikimedia.org/T200412 (zeljkofilipin) @Ladsgroup sure, let's try it.
[10:52:13] Scoring-platform-team, Growth-Team, MediaWiki-extensions-ORES, MediaWiki-extensions-PageCuration, and 3 others: PageTriage requires ORES to be installed - https://phabricator.wikimedia.org/T200412 (zeljkofilipin) Open>Resolved group1 wikis are at 1.32.0-wmf.14, I don't see the error in lo...
[11:25:26] Scoring-platform-team, ORES, Wikibase-Quality-Constraints, Wikidata: quick overview of the quality of an item - https://phabricator.wikimedia.org/T195703 (Lydia_Pintscher) I'll have our UX team look at it :)
[11:25:53] (CR) Ladsgroup: [C: 2] JADE API to store judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/442885 (https://phabricator.wikimedia.org/T198207) (owner: Awight)
[11:29:56] (CR) Ladsgroup: [C: 2] Translatable entity type (1 comment) [extensions/JADE] - https://gerrit.wikimedia.org/r/443378 (owner: Awight)
[11:31:03] (CR) Ladsgroup: [C: 2] Test helper for judgment storage [extensions/JADE] - https://gerrit.wikimedia.org/r/447929 (owner: Awight)
[11:36:21] (Merged) jenkins-bot: Translatable entity type [extensions/JADE] - https://gerrit.wikimedia.org/r/443378 (owner: Awight)
[11:42:26] (CR) jenkins-bot: Translatable entity type [extensions/JADE] - https://gerrit.wikimedia.org/r/443378 (owner: Awight)
[11:42:33] (CR) Ladsgroup: [C: -1] AppendCreator for adding judgments (5 comments) [extensions/JADE] - https://gerrit.wikimedia.org/r/447927 (owner: Awight)
[11:45:12] (Merged) jenkins-bot: Test helper for judgment storage [extensions/JADE] - https://gerrit.wikimedia.org/r/447929 (owner: Awight)
[11:55:18] Scoring-platform-team, Release-Engineering-Team (Watching / External): Another round of discussion about wiki-ai's GitHub->gerrit mirroring - https://phabricator.wikimedia.org/T194212 (Ladsgroup) If there is no objection by @Halfak @awight or @Harej by end of next week. I will move our coding repos (rev...
[11:58:51] Scoring-platform-team, Bad-Words-Detection-System, revscoring, artificial-intelligence: Add language support for Urdu - https://phabricator.wikimedia.org/T173190 (Ladsgroup) Open>stalled Informal words are still missing
[12:00:10] (CR) jenkins-bot: Test helper for judgment storage [extensions/JADE] - https://gerrit.wikimedia.org/r/447929 (owner: Awight)
[12:01:01] Scoring-platform-team, MediaWiki-extensions-ORES, editquality-modeling, User-Ladsgroup, artificial-intelligence: [Spec] Use `reverted` models in ORES review tool - https://phabricator.wikimedia.org/T146378 (Ladsgroup) Given that we are not responsible for the UI, I leave this decision to @jma...
[12:07:45] Scoring-platform-team, Patch-For-Review: Use "cache_revs" scap config to limit us to 1 or 2 revisions for rollback - https://phabricator.wikimedia.org/T182013 (Ladsgroup) Is this done?
[12:30:27] Scoring-platform-team, ORES: Tuning broken in some repos, needs revscoring 2 update - https://phabricator.wikimedia.org/T184727 (Ladsgroup) It seems done to me. Anything needed here @awight?
[12:32:08] Scoring-platform-team: Investigate using archiva.wmo to host our wheels - https://phabricator.wikimedia.org/T180606 (Ladsgroup) Given that we are moving to git LFS now, do we really need it?
[14:11:28] o/
[14:11:51] I'm still feeling sick today so I'm going to stay AFK and get some rest.
[14:12:01] There's a couple of meetings I'm going to make it to.
[14:12:27] The pre-tech conf platform evolution meeting and a sync with Raz
[14:12:39] Otherwise, I'll keep halfak|Mobile connected
[14:13:16] Amir1, could you relay ^ to folks. I'll send an email to the list too.
[14:13:39] halAFK: sure
[14:13:41] Take care
[14:14:20] Thanks :)
[14:36:26] Scoring-platform-team, ORES, User-Ladsgroup: Configure deploy to include CODFW and use the new oresrdb - https://phabricator.wikimedia.org/T159397 (Ladsgroup) Open>Resolved a:Ladsgroup
[14:46:50] Scoring-platform-team (Current), User-Ladsgroup: badid_rvstartid error during autolabel - https://phabricator.wikimedia.org/T168592 (Ladsgroup) a:Ladsgroup https://github.com/wiki-ai/editquality/pull/167
[14:50:06] wiki-ai/editquality#356 (silence_errors - f640642 : Amir Sarabadani): The build passed. https://travis-ci.org/wiki-ai/editquality/builds/408544858
[16:01:32] Amir1: o/ CR is ready for your rereview, any time
[16:01:43] awight: I did it
[16:01:46] take a look
[16:17:26] Amir1: u there
[16:17:36] awight: yup
[16:20:17] I was just going to say, I squashed all the ext-JADE patches, then re-split so it should be easier to review.
[16:20:39] Tests are passing, and I made the changes you suggested in the AppendCreator patch
[16:20:40] awight: I basically +2'd everything except one
[16:21:01] hehe sorry I seem to not get gerrit emails any more, I need to check my mail filters.
[16:21:44] just typos, then
[16:21:48] kk
[16:25:03] Thank you, I’ll knock out those last fixes in an hour or so!
[16:26:40] yeah, I had a filter to bury gerrit email :-/
[16:28:46] it's so hot today that my brain is melting
[16:29:07] I'm going to sit somewhere and read work books instead
[16:30:51] :) unrelated to climate change, according to the U.S. news
[16:31:20] we call it… “context-free, independent temperature change events”
[16:34:22] sometimes the temperature just wants to be high
[16:34:40] the temperature being lower than normal is also related to climate change! because climate is complicated
[16:35:34] cold one place may be hot in another place. weather is the earth trying to deal with the fact that it is heated unevenly
[17:41:09] (PS7) Awight: AppendCreator for adding judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/447927
[17:41:11] (PS26) Awight: JADE API to store judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/442885 (https://phabricator.wikimedia.org/T198207)
[18:04:09] (PS1) Awight: Service wrapper to prevent misspellings [extensions/JADE] - https://gerrit.wikimedia.org/r/448085
[18:12:00] :)
[19:51:37] awight: re payments_fraud filter values, how do you think we should deal with NULL values?
[20:17:46] saurabhbatra: good question, I realized that I don’t know how to deal with those yet.
[20:19:30] Are there reasonable default values we could use?
[20:19:51] I don't think so...
[20:20:01] Just reading http://www.stat.columbia.edu/~gelman/arm/missing.pdf
[20:20:44] It does kinda look like we can give default values
[20:20:59] all the getScore functions probably -> 0 is fine
[20:21:23] getAVSResult and getCVVResult might be twitchier
[20:22:33] reading up on that
[20:23:23] most of the NULLs are for the name filter
[20:24:12] i think instead of 0 we should keep the mean as the default value
[20:25:08] in order to make sure the mean of the data doesn't change
[20:26:35] but that would mess up the variance
[20:30:56] so out of 7918 rows - ~7k have avs, cvv, countrymap, emaildomain, minfraud filter values.
[20:31:17] 7.6k have ipvelocity filter values
[20:31:41] awight: but only 2.7k have namescore filter values
[20:32:22] so probably makes sense to drop those from our data and implement a custom name score function
[20:34:30] Very unfortunately, the missingness is probably directly related to the payment method...
[20:34:40] https://github.com/rrenaud/Gibberish-Detector this seems pretty promising
[20:35:10] +1 I think you’re right that we’re best off dropping rare features
[20:35:23] And the PHP name scorer was nothing to write home about, anyway.
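On the NULL-handling question raised above (0 versus the mean as a default), a minimal sketch of the trade-off follows. The score values are made up for illustration and `impute` is a hypothetical helper, not part of any payments_fraud code. It shows that filling with the observed mean leaves the mean unchanged but shrinks the variance, while filling with 0 shifts the mean.

```python
from statistics import mean, pvariance


def impute(values, fill):
    """Replace None entries (standing in for NULLs) with a fixed fill value."""
    return [fill if v is None else v for v in values]


# Hypothetical filter scores; None marks a missing (NULL) value.
scores = [10.0, 20.0, None, 30.0, None, 40.0]
observed = [v for v in scores if v is not None]

observed_mean = mean(observed)
mean_filled = impute(scores, observed_mean)
zero_filled = impute(scores, 0.0)

# Compare the mean and population variance under each strategy.
print("observed    mean/var:", observed_mean, pvariance(observed))
print("mean-filled mean/var:", mean(mean_filled), pvariance(mean_filled))
print("zero-filled mean/var:", mean(zero_filled), pvariance(zero_filled))
```

Neither default addresses the point made at 20:34:30: if missingness depends on the payment method, any constant fill also hides that signal, so a separate "was missing" indicator column may be worth considering.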
[20:35:40] It’s just heuristics to check for mashing one side of the keyboard.
[20:35:42] :-)
[20:35:46] yup i saw
[20:35:54] this uses a markov chain
[20:35:58] Gibberish-Detector looks good, but English-only is a tough sell
[20:36:07] Also, it’s meant for text which is very different than names
[20:36:20] so it can be trained on any language
[20:36:27] IMO we should try simple metrics like vowel ratio
[20:36:32] just need to change the training file
[20:36:45] i think it'll work fine with names as well
[20:36:58] I'll give it a try on my system
[20:37:28] I think we could train on real names (international), but then we need to get into feature engineering for that sub-project...
[20:37:39] actually, what are the features for Gibberish-Detector?
[20:37:52] Maybe we just reimplement those features in our main model?
[20:37:54] oh so it doesn't use features as such
[20:38:08] ah right, it’s doing something contextual
[20:38:12] just trained on random english words
[20:38:19] the context window is pretty small
[20:38:23] 2 characters i think
[20:38:25] Or I thought, real English text
[20:38:36] https://github.com/rrenaud/Gibberish-Detector#how-it-works
[20:39:11] kk, yeah Markov is just probabilities of sequences
[20:39:20] something hhshadh gets flagged I think
[20:39:34] something like*
[20:39:40] I'll give it a shot
[20:39:46] see if it works for commong names
[20:39:49] *common
[20:40:08] Here’s the "good" training corpus, https://github.com/rrenaud/Gibberish-Detector/blob/master/big.txt
[20:40:59] ah, the adventures of sherlock holmes :-)
[20:41:17] i saw a java fork of this in paypal's github
[20:41:29] hehe now we know all their secrets
[20:41:51] It seems like a good approach, AFAICT
[20:42:05] We could dump donor names as its training data...
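The idea discussed above (a Markov chain over 2-character windows, trained on a corpus of "good" strings) can be sketched roughly as follows. This is an illustrative re-implementation of the general technique, not Gibberish-Detector's actual code; the function names, the add-one smoothing, and the eight-name toy corpus are all assumptions made for the example.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def normalize(text):
    """Lowercase the text and keep only letters and spaces."""
    return [c for c in text.lower() if c in ALPHABET]


def train(corpus_lines):
    """Estimate log P(next char | current char) with add-one smoothing."""
    counts = {a: {b: 1 for b in ALPHABET} for a in ALPHABET}
    for line in corpus_lines:
        chars = normalize(line)
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    model = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: math.log(n / total) for b, n in row.items()}
    return model


def avg_log_prob(model, text):
    """Average log-probability per character transition; higher is more name-like."""
    chars = normalize(text)
    pairs = list(zip(chars, chars[1:]))
    if not pairs:
        return float("-inf")
    return sum(model[a][b] for a, b in pairs) / len(pairs)


# Toy training corpus, for illustration only; a real model needs far more data.
names = ["adam", "amir", "aaron", "saurabh", "sarah", "maria", "anna", "samir"]
model = train(names)
print(avg_log_prob(model, "saurabh"), avg_log_prob(model, "hhshadh"))
```

A name would then be flagged when its score falls below a threshold chosen from held-out good and bad examples. Digits are dropped by `normalize` here, so numerals in a name have no effect on the score.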
[20:42:36] We could also leave name out of it, or use simple features and see whether it’s effective or not
[20:43:13] so
[20:43:16] i tried it out
[20:43:22] Saurabh returns false :-(
[20:43:29] Adam works though
[20:43:37] urgh
[20:43:41] yeah that won’t do
[20:44:03] maybe tuning the threshold will help
[20:44:09] it's working for some indian names
[20:44:13] not for others
[20:44:30] funnily
[20:45:05] u could run against a sample of WMF donor names
[20:45:08] it works for 2pac
[20:45:12] LOL
[20:45:24] but i guess it's not checking for numerals
[20:46:14] yeah giving that a shot
[20:49:56] inconclusive tbh
[20:50:02] but maybe if we feed in a database of names
[20:52:17] We have 3M or so names from around the world, some even in their native character set!
[20:52:32] but which of them are accurate?
[20:52:50] Maybe just select for completed statuses?
[20:53:27] They’re not necessarily real names, but are the names that people put into name fields in this context, which is perfect for what we’re doing.
[20:53:51] On a side note, I wish we could make the name model public when done, but that would require making the training data public, too :-/
[20:53:57] let me see what that yields
[20:54:05] cool!
[20:54:31] i think for other filters, just replacing null with 0 will work
[20:54:42] will kinda work*
[21:01:36] awight: btw re normal data extraction
[21:01:47] hehe +1 “kinda”
[21:02:19] random sampling is fine, do you think?
[21:03:31] extracted 30M names whew
[21:03:46] holy moley
[21:03:56] Yeah random sampling FTW
[21:04:09] It’ll be skewed towards English but that’s how it is
[21:05:53] alright cool, i think we'll have model ready features by Monday
[21:06:03] i'll drop you a mail in case i have any problems
[21:06:30] wow! Take yr time…
[21:07:15] Let me know if I can help with anything, of course. I was imagining that fr-tech and I might be able to write some of the feature extractions to save you the footwork.
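For the random sampling mentioned above, one common way to draw a uniform sample from a large extract without loading it all into memory is reservoir sampling. The sketch below is an assumption about how that could be done, not a description of the actual extraction; `reservoir_sample` and the generated `donor_N` strings are placeholders.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Return up to k items chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep the new row with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


# Placeholder data standing in for a streamed name extract.
names = (f"donor_{n}" for n in range(100_000))
subset = reservoir_sample(names, k=5, seed=42)
print(subset)
```

A uniform sample like this mirrors the skew of the underlying data, which matches the remark at 21:04:09 that the result will lean towards English names.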
[21:07:42] i already have the data though
[21:07:53] getting that was the tough part
[21:08:38] transformation of data should not take long
[21:10:37] Ah, nice to hear! Yeah, I can imagine the query is fun.
[21:12:16] spanning 21 lines now :-)
[21:15:30] Sounds like it needs more vertical whitespace, still...
[21:16:25] well, the name detector agrees with my name now
[21:17:13] and returns false for a lot of gibberish names
[21:17:52] i'm impressed, such a simple idea yet so effective
[21:19:01] still accepts 2pac but i'm happy :-)
[21:33:40] That was crazy fast
[21:36:27] the context window's pretty small
[21:36:55] so that's why
[21:37:03] (CR) Awight: AppendCreator for adding judgments (5 comments) [extensions/JADE] - https://gerrit.wikimedia.org/r/447927 (owner: Awight)
[21:37:48] (CR) Awight: AppendCreator for adding judgments (1 comment) [extensions/JADE] - https://gerrit.wikimedia.org/r/447927 (owner: Awight)
[21:44:54] awight: do you think it might be possible to ask cwd or jeff to give you access to my home folder on frdev1001?
[21:45:02] so that you can go through the data once
[21:45:24] Good idea, or they can make something we share under /srv
[21:45:38] that could work too
[21:46:10] i think a chgroup change would work
[21:46:52] ah, we’re in the wrong channel :)
[21:46:59] i pinged him there
[21:47:01] Thanks for pinging
[22:02:25] halAFK: I’ve got it. We accept one of the alternatives on the table.
[22:02:43] In particular, the single-wiki solution is pretty darn close to what we want.
[22:04:26] It gives DBAs the physical isolation they need for sanity, and it gives us wiki-ness.
[22:05:12] The main drawback as I think of it will be that the site will have an immature community for years.
[22:05:37] But otherwise, I see it operating much like wikidata.
[22:32:03] Amir1: ^ I’d love your thoughts
[22:37:04] Also experimenting with a better interface, what about:
[22:37:07] JudgmentPage::forEntity('Diff', $revId)->addJudgment(
[22:37:08] Judgment::forSchema('damaging')->setData(true)->setNotes($notes));
[22:57:41] awight: how much in general do you expect users to interact with the raw judgment content?
[22:58:20] By raw, you mean the wiki page, or the json content?
[22:58:34] I expect collaboration to happen mostly on wiki pages
[22:58:56] But want to protect users from the json content format whenever possible…. sort of a paradox
[23:00:09] The principal way you would do that is by building an interface on top of the JSON content and not exposing it.
[23:00:25] sure...
[23:02:40] Thinking aloud, in the transparent migration scenario users will invisibly be creating content on another wiki. Can they get notifications from pings on wikijudgment even though they may not have created an account there?
[23:04:04] It should be possible, yes.
[23:26:48] Great! I wonder what other lessons can be learned from how Wikidata’s community grew.
[23:51:01] awight: basically, there is a lot of mistrust between Wikipedia and Wikidata; a lot of fear from the Wikipedia site that Wikidata is this wild west where people get away with vandalism and disobeying Wikipedia editorial rules.
[23:51:27] in creating a separate wiki you risk creating an us vs. them dynamic that we don't really want
[23:51:34] ooh
[23:51:35] From a product perspective I wouldn't rush into creating a dedicated wiki
[23:51:37] that’s serious, ty
[23:51:47] Not at all, I see it as a big compromise
[23:52:04] But one that satisfies all parties, if necessary
[23:52:17] Yeah the immature community thing is scary.