[01:14:49] Should be monotonic. Thanks for flagging this.
[01:15:01] Oh wait. No precision *can* fluctuate
[01:15:10] Still can you file a task with your notes?
[01:15:19] RoanKattouw, ^
[01:18:23] I wonder if we have a bad actor or maybe some confusion about the meaning of "good-faith"
[04:06:56] Yeah I'll file a task and link it on the arwiki task
[04:12:33] The other three (lvwiki, cawiki and arwiki) look fine, I have complaints about those models being somewhat unhelpful but I can work with them. But I'm not going to deploy arwiki because the goodfaith model isn't usable
[04:13:30] arwiki.damaging is workable but also not a good model
[04:24:54] RoanKattouw: Just saw your backscroll. I have this scrap code lying about, https://github.com/adamwight/thresholds_diagrams
[04:25:11] might help unless you’re already graphing the P-R curves
[04:26:05] sorry the ipynb doesn’t run in GitHub, I didn’t realize that would be the case when playing with bokeh
[04:27:36] We could dust this nice-to-have scoring stuff off at the hackathon!
[04:28:07] This week, though, I have some urgent MW API client coding to do...
[04:29:11] RoanKattouw: do you know of any Python code that acts something like an itertools layer for bulk operations through mwapi?
[04:29:30] I am not graphing those curves yet so that would be great
[04:29:56] awight: I don't, sorry. pywikibot folks might know
[04:30:04] For example, I have three sets of queries to run, each of which does a one-to-many unrolling or iteratively adds information to records
[04:30:08] kk thanks
[04:30:11] Oh I see
[04:30:14] Do you know about generators?
[04:30:24] yes, just scratching the surface now.
[04:30:36] I see that some generators also do… orthogonal generation of result sets, eh?
[04:30:43] They can help you roll a one-to-many unrolling into a single HTTP query
[04:30:50] What do you mean by that?
[04:31:27] it seems that I can e.g. continue on a query of the templates embeddedin a set of pages
[04:31:40] * awight double-checks
[04:31:55] Oh yeah you get two-dimensional continue params essentially
[04:32:06] Because you are paging both the inner and the outer query
[04:32:48] BTW in addition to graphing P-R curves, graphing thresholds on the X axis and P/R on the Y axis would also be helpful
[04:33:06] To identify models where the useful band of thresholds is really narrow (this is common in a lot of the more recent models)
[04:33:12] can you… note that somewhere? Phab task for the hackathon maybe?
[04:33:23] Yeah I'll add a TODO to write all this stuff down tomorrow
[04:33:31] I'm supposed to be in a curriculum hackathon for my volunteering thing now :)
[04:33:45] oh! cool, enjoy
[04:34:34] * awight stretches to recall what mysterious type of teaching RoanKattouw is up to
[04:34:41] haha
[04:34:49] https://scripted.org/
[04:34:50] n.b. “curriculum” was the Roman horse racetrack
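The generator/continuation pattern discussed above can be sketched with python-mwapi, which follows both the outer (pages) and inner (templates) continue parameters when `continuation=True` is passed. This is a minimal illustration, not code from the project; the host, user agent, and titles are placeholders.

```python
# Sketch: page through prop=templates for a batch of titles, letting mwapi
# follow both the outer (pages) and inner (templates) 'continue' parameters.
import mwapi

session = mwapi.Session('https://en.wikipedia.org',
                        user_agent='drafttopic research <example@example.org>')

def templates_for(titles):
    """Yield (page_title, template_title) pairs for a batch of titles."""
    # continuation=True makes session.get() return a generator of result
    # documents, re-requesting with the returned continue params until done.
    for doc in session.get(action='query', prop='templates',
                           titles='|'.join(titles), tllimit=500,
                           formatversion=2, continuation=True):
        for page in doc.get('query', {}).get('pages', []):
            for template in page.get('templates', []):
                yield page['title'], template['title']

for page, template in templates_for(['Coffee', 'Tea']):
    print(page, template)
```

Each yielded document is one API response, so the caller still sees a flat iterator of (page, template) pairs even though two levels of paging happen underneath.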
[04:35:17] :claps x100: that sounds like a fun gig
[04:35:38] They're an org that gets tech people like me to volunteer as teachers in under-resourced schools to teach HTML/CSS/JS to high school kids
[04:35:48] The employees are former high school teachers so they can meta-teach us
[04:36:17] I need to tag along some time
[04:36:41] :D “meta-teach”
[04:37:10] * awight would like to attend https://en.wikipedia.org/wiki/Highlander_Research_and_Education_Center
[04:39:18] RoanKattouw: oh, we already have the second graph type you were imagining: https://github.com/adamwight/thresholds_diagrams/blob/master/draftquality-OK.svg
[04:39:40] Ooh nice
[04:39:46] https://github.com/adamwight/thresholds_diagrams/blob/master/damaging.svg
[04:39:53] Maybe we can make those graphs for some of the models that I have pet peeves with
[04:40:12] Nice!
[04:40:16] yeah my priorities at the time were, quick and pretty graph, but now I need to take a pass for usability and reusability
[04:40:17] Lemme share a spreadsheet with you
[04:40:54] Or, maybe I can add the points that I'm interested in to the graph myself later
[04:42:25] You've already got P=0.15 and R=0.9 which I both care about. I also care about P=0.6 and P=0.9, and for the reverse model (damaging=false and goodfaith=true) I care basically only about P=0.995
[04:42:35] Those are the defaults in the RCFilters config for the user-facing filter options
[04:42:53] But maybe at the hackathon we can play with these graphs
[04:44:09] I also sent you (by gdocs email) the spreadsheet that I use to get a feel for the models and choose the thresholds I want to use for RCF
[04:44:22] No rush at all, I should get back to ScriptEd stuff
[04:44:29] Also! Here's a fun hackathon project maybe
[04:44:49] I've been wanting to make a special page that exposes the P/R values for the threshold settings for each ORES filter on each wiki, so users can see it
[04:45:03] Right now it's really hard to figure out what the filters mean mathematically, even as a power user
[04:45:47] Scoring-platform-team (Current): [Discuss] Random sampling by PAWS vs API requests - https://phabricator.wikimedia.org/T193789#4179717 (awight) Thanks for making the task! I'm going to start with the exhaustive, MW API-based approach that I saw was mentioned in the IRC backscroll. Even in the crude form it...
[04:46:57] awight: Anyway, getting back to volunteer work now but lmk if ---^^ plus graphs sounds like a fun hackathon project to you
[04:47:08] RoanKattouw: That’s a fine idea, and fits in well with the theme of algorithmic transparency and pushing the bounds of end-user education.
[04:47:10] +1
[04:47:17] Yay
[04:59:19] Scoring-platform-team (Current), drafttopic-modeling: Rebuild drafttopic with corrected data - https://phabricator.wikimedia.org/T193834#4181099 (awight)
[09:14:01] Scoring-platform-team (Current), drafttopic-modeling: Rebuild drafttopic with corrected data - https://phabricator.wikimedia.org/T193834#4181403 (awight) Here's a light-touch fix that runs in reasonable time, https://github.com/wiki-ai/drafttopic/pull/23 Unfortunately, I was unable to train a model, cra...
[09:46:05] Scoring-platform-team (Current), drafttopic-modeling: Rebuild drafttopic with corrected data - https://phabricator.wikimedia.org/T193834#4181469 (awight) Trying another run after minor changes.
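For the threshold-versus-precision/recall graph requested above, here is a rough sketch against the public ORES API. The `model_info=statistics.thresholds.<label>` response shape assumed below (a list of rows with `threshold`, `precision`, and `recall` keys) should be verified against a live response; the wiki, model, and label are placeholders.

```python
# Sketch: plot precision and recall against threshold for one ORES model.
# The response path and field names are best-effort assumptions.
import requests
import matplotlib.pyplot as plt

WIKI, MODEL, LABEL = 'arwiki', 'goodfaith', 'true'
url = 'https://ores.wikimedia.org/v3/scores/{}'.format(WIKI)
response = requests.get(url, params={
    'models': MODEL,
    'model_info': 'statistics.thresholds.{}'.format(LABEL),
})
response.raise_for_status()
rows = response.json()[WIKI]['models'][MODEL]['statistics']['thresholds'][LABEL]

thresholds = [row['threshold'] for row in rows]
plt.plot(thresholds, [row['precision'] for row in rows], label='precision')
plt.plot(thresholds, [row['recall'] for row in rows], label='recall')
plt.xlabel('threshold')
plt.ylabel('precision / recall')
plt.title('{} {} ({})'.format(WIKI, MODEL, LABEL))
plt.legend()
plt.show()
```

A narrow useful band of thresholds shows up directly on this plot as a steep crossover between the two curves.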
[10:08:40] Scoring-platform-team (Current), drafttopic-modeling: Rebuild drafttopic with corrected data - https://phabricator.wikimedia.org/T193834#4181506 (awight) a: awight→None Darn, same error. Looks like something simple, but I'll give it a break for now. My work is pushed to the PR. (Building on ore...
[10:08:52] Scoring-platform-team (Current), drafttopic-modeling: Rebuild drafttopic with corrected data - https://phabricator.wikimedia.org/T193834#4181508 (awight) p: Triage→High
[10:35:17] Scoring-platform-team, Research, Wikilabels, Research-2017-18-Q3, Research-2017-18-Q4: Design a data collection pilot using WikiLabels platform (mining reasons) - https://phabricator.wikimedia.org/T186351#4181562 (Miriam)
[10:44:55] Scoring-platform-team, Research, Wikilabels, Research-2017-18-Q3, Research-2017-18-Q4: Design a data collection pilot using WikiLabels platform (mining reasons) - https://phabricator.wikimedia.org/T186351#4181632 (Miriam)
[15:37:05] awight: have you verified if the latest fetch text works? bec i remember the one pushed by me in the last commit has issues with redirected articles when we process in batch
[15:37:36] awight: so I may need to first revert that script to a version which fetches articles one by one and then gets the first revision
[16:05:08] codezee: hi, I think I did what you’re describing
[16:05:12] check out PR#23
[16:05:36] One thing I did differently, that you should check however, is that I dump all redirects...
[16:05:52] I /think/ that’s what we want, because those aren’t actually initial drafts.
[16:09:15] ah, you can also check the output in ores-misc-01.eqiad.wmflabs:/srv/home/awight/drafttopic/datasets
[16:09:35] I got stuck building the model, some schema-looking error I don’t recognize.
[16:10:48] pasted the crash into the bug, if u wanna see, T193834
[16:10:50] T193834: Rebuild drafttopic with corrected data - https://phabricator.wikimedia.org/T193834
[16:11:44] awight: thanks! so you're saying you were able to fetch the text with this PR?
[16:12:19] yes! It just needs a second opinion...
[16:12:46] I decided to leave the PAWS change for a rainy day, since it isn’t actually blocking us on regenerating the data
[16:13:08] awight: strange, and your input was -enwiki.labeled_wikiprojects.json right? because i'm using the same script and it seems to show error for every article
[16:14:29] I think those are just the redirects
[16:14:33] try spot-checking
[16:14:48] I also raised the thread count by 250x :)
[16:14:59] may have been a bit much
[16:15:18] So it looks like the redirects come back first, and after a minute or so you’ll see the normal “.” hashes
[16:16:03] awight: okay, let me wait a bit then
[16:16:50] earlier i was using the redirects=True parameter to session.get so i'm not sure what's going on here...
[16:17:50] That’s a thing to discuss—I removed the redirects, cos I suspect that’s what we want
[16:19:00] looks like i cannot login to ores-misc :(
[16:19:24] ohno—I can fix that, one moment
[16:19:51] oh wait np, it was just remote host ident, i had to remove it from known hosts
[16:20:58] oh? I still don’t see you in the project members, maybe I’m in the wrong project...
[16:21:02] must be.
[16:22:14] awight: the text dataset contains 84000 entries, what do you think happened with the rest?
[16:22:26] redirects :)
[16:22:26] total were 93000 approx i think
[16:22:44] awight: so we don't need to resolve redirects?
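A sketch of the fetch strategy being discussed: take each page's earliest revision, one page at a time, and drop redirects instead of resolving them. This only illustrates the idea and is not the code in drafttopic PR#23; the redirect check is a crude English-only string test.

```python
# Sketch: fetch the first revision of a page and skip redirects, rather than
# following them with redirects=True. Host and user agent are placeholders.
import mwapi

session = mwapi.Session('https://en.wikipedia.org',
                        user_agent='drafttopic research <example@example.org>')

def first_revision(title):
    """Return (rev_id, text) of the earliest revision, or None for redirects."""
    doc = session.get(action='query', prop='revisions', titles=title,
                      rvdir='newer', rvlimit=1, rvprop='ids|content',
                      formatversion=2)
    page = doc['query']['pages'][0]
    if page.get('missing'):
        return None
    rev = page['revisions'][0]
    text = rev.get('content', '')
    # An initial revision that is itself a redirect isn't a draft we care
    # about; real code should handle localized redirect magic words too.
    if text.lstrip().lower().startswith('#redirect'):
        return None
    return rev['revid'], text
```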
[16:23:24] No, I’m thinking that if the first revision points to another article, we have no reason to think that the resolved content is an initial revision. Actually, the odds are really high that it’s not.
[16:24:14] & if an initial revision is a redirect, we’re not interested in “mentoring” the author, cos it’s so simple
[16:29:28] ok
[16:34:05] these are totally guesses, of course. We can come back and fix my weird decisions later...
[16:41:09] awight: a thought, if the query returns the initial revision id, we can augment the dataset with it, because it'll be very helpful during manual analysis, otherwise getting to the initial version from the page title is long
[16:41:53] okay i see labeling['rev_id'] = rev_doc['revid'] is already doing that :)
[16:43:44] :)
[17:02:04] awight https://groups.google.com/forum/#!topic/repo-discuss/gX3tQtNLsTY :)
[17:08:04] paladox: Looks like you have good ideas, thanks!
[17:08:11] heh :)
[17:08:27] awight they are planning on redoing the metadata change screen
[17:08:37] https://groups.google.com/forum/#!topic/repo-discuss/FUQuCzYwkv4
[19:04:33] awight: i'm facing the same problem with model training, it looks like the new dataset generated has some issues
[19:04:42] maybe some observations are not right
[19:05:00] cv_train is working fine on the old one
[19:05:00] Scoring-platform-team (Current), drafttopic-modeling: Rebuild drafttopic with corrected data - https://phabricator.wikimedia.org/T193834#4183001 (awight) Here's the original stack trace, ``` Traceback (most recent call last): File "/usr/lib/python3.5/multiprocessing/pool.py", line 119, in worker re...
[19:05:25] +1 I think you’re right
[19:06:01] codezee: ^ that last comment has the original stack trace, which is more interesting than the one I had last night
[19:07:42] it's failing in validating X, y
[19:08:30] what… is that
[19:09:40] probably the shape of X of some observation is not right
[19:12:43] I’m putting together a script to check “cache” values...
[19:14:19] starting by looking for empty ones
[19:15:20] nothing empty
[19:15:35] hmm, I’ll pare down to one observation
[19:18:05] I get revscoring.errors.ModelConsistencyError: ModelConsistencyError: Expected labels {'STEM.Mathematics', 'History_And_Society.Education', 'STEM.Space', 'STEM.Chemistry', 'Culture.Sports', 'Assistance.Article improvement and grading', 'STEM.Time', 'STEM.Technology', 'Geography.Oceania', 'Assistance.Contents systems', 'Geography.Europe', 'STEM.Science', 'Culture.Philosophy and religion', 'Culture.Food and drink', 'Geography.Countries', 'Assistance.Fil
[19:18:06] 'Geography.Maps', 'History_And_Society.Transportation', 'History_And_Society.Military and warfare', 'STEM.Information science', 'Geography.Bodies of water', 'STEM.Physics', 'STEM.Meteorology', 'Culture.Entertainment', 'Culture.Internet culture', 'Assistance.Maintenance', 'STEM.Engineering', 'Culture.Broadcasting', 'Culture.Arts', 'History_And_Society.Business and economics', 'Culture.Language and literature', 'Geography.Landforms', 'Culture.Media',
[19:18:07] 'Culture.Plastic arts', 'STEM.Medicine', 'Geography.Cities', 'STEM.Biology', 'Culture.Crafts and hobbies', 'History_And_Society.Politics and government', 'History_And_Society.History and society', 'STEM.Geosciences', 'Culture.Visual arts'} represented
[19:18:20] I guess that’s about labels-config?
[19:20:05] ugh
[19:22:14] Cool. I was able to train with head -2
[19:22:24] So yeah, there’s a rogue observation in there
[19:22:27] !!
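The "scrappy validator" mentioned above could look something like this: scan a JSON-lines observation file for empty text and for labels outside the configured set (the condition behind ModelConsistencyError). The `text` and `labels` field names and the file layout are assumptions for illustration, not drafttopic's actual schema.

```python
# Sketch: flag observations likely to break training, one JSON doc per line.
import json
import sys

def check_observations(path, expected_labels):
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            obs = json.loads(line)
            text = obs.get('text') or ''
            labels = set(obs.get('labels') or [])
            if not text.strip():
                print('line {}: empty text'.format(line_no))
            unexpected = labels - expected_labels
            if unexpected:
                print('line {}: unexpected labels {}'.format(line_no, unexpected))

if __name__ == '__main__':
    # Usage: python check_observations.py observations.json labels.json
    with open(sys.argv[2]) as f:
        expected = set(json.load(f))
    check_observations(sys.argv[1], expected)
```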
[19:22:38] awight: yes
[19:23:03] awight: model consistency is when labels-config says something and actually those labels are not present or extra present
[19:23:22] awight: what's with the rogue observation?
[19:23:24] I took care of that by stripping down to 2 observations, and 2 labels
[19:23:26] I donno!
[19:23:38] I’m back on writing my incredibly scrappy validator
[19:23:53] word2vec vector is the prime suspect, IMO
[19:24:06] e.g. maybe it barfs on empty text
[19:26:25] awight: but is_article checks for len(text) >50, so it shouldn't be empty
[19:26:44] I… could have broken it :)
[19:30:35] grr no empties
[19:30:42] how to debug...
[19:30:51] we could bisect :(
[19:32:00] trying the first 1,000 just to see if we can narrow it down a bit
[19:32:10] grr modelconsistency annoyingness
[19:34:19] ok this is not a fun way to debug
[19:34:35] mebbe I will instrument the code that fails
[19:54:39] codezee: When I instrument it, I see an absolutely massive array..
[19:57:08] fwiw, dtype is numpy.float32
[19:59:17] awight: where exactly are you seeing the massive array? it must be a 300dim vector
[19:59:37] I’m looking at this line, array = np.array(array, dtype=dtype, order=order, copy=copy)
[19:59:41] and logging array & dtype
[19:59:44] array is nuts
[19:59:55] ah—logging only when the exception is thrown
[20:00:20] awight: so there's an empty array there?
[20:00:35] no, I’m about to dump to a file so I can see it better
[20:00:50] Scoring-platform-team, ORES: arwiki goodfaith model is not usable - https://phabricator.wikimedia.org/T193905#4183073 (Catrope)
[20:03:04] whoa
[20:03:17] okay, the input “array” value is in ores-misc:/tmp/o
[20:03:26] it’s… 756MB of json
[20:04:52] Scoring-platform-team, ORES: arwiki goodfaith model is not usable - https://phabricator.wikimedia.org/T193905#4183118 (Catrope) For context, when setting thresholds for use in RCFilters, we typically look for: - Maybe good faith: 90% recall or 15% precision, whichever is better - Likely good faith: 60% p...
[20:09:42] awight: i think i got it, word vectors is emitting [[0]] when it's an empty string
[20:09:57] it should be [[0.0....300 times]]
[20:15:21] wiki-ai/revscoring#1474 (nullvector - 42d4e15 : Sumit Asthana): The build passed. https://travis-ci.org/wiki-ai/revscoring/builds/375058667
[20:15:36] codezee: Awesome, good find!
[20:15:44] So I broke is_article...
[20:16:17] +1 there it is, [
[20:16:18] 0.0
[20:16:19] ],
[20:17:37] Scoring-platform-team (Current), drafttopic-modeling: Rebuild drafttopic with corrected data - https://phabricator.wikimedia.org/T193834#4183148 (Sumit) Looks like an issue with [[0]] being returned on an empty string '' by wordvectors instead of the usual null vector of dimensions (300,)
[20:18:37] awight: you have not changed is_article so i'm not sure why it can break...
[20:19:01] Really? I remember tampering somewhere nearby.
[20:19:27] I have a few example lines from w_cache now:
[20:20:11] argh, formatting
[20:21:20] https://phabricator.wikimedia.org/P7085
[20:21:50] here’s an example, https://en.wikipedia.org/?diff=58621232
[20:22:15] They seem to all be links
[20:22:28] here’s a longer one, https://en.wikipedia.org/?diff=710710777
[20:24:57] I guess the reason it’s breaking now is that the latest revisions of these articles actually have content.
[20:27:06] +1 for the fix you suggested, having word2vector emit the null vector instead.
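The fix that lands as wiki-ai/revscoring#398 boils down to emitting a proper null vector instead of `[[0]]` when no tokens have embeddings. Here is a simplified numpy illustration of that idea (not the actual patch; `keyed_vectors` stands in for the loaded word2vec vectors):

```python
# Illustration of the [[0]] vs. null-vector problem described above.
# word2vec/gensim loading details are elided; `keyed_vectors` is any mapping
# from token to a 300-dimensional numpy vector.
import numpy as np

VECTOR_DIM = 300

def mean_vector(tokens, keyed_vectors, dim=VECTOR_DIM):
    """Average the vectors of known tokens; empty input yields a zero vector
    of shape (dim,) rather than the degenerate [[0]]."""
    vectors = [keyed_vectors[t] for t in tokens if t in keyed_vectors]
    if not vectors:
        return np.zeros(dim, dtype=np.float32)
    return np.mean(vectors, axis=0)
```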
[20:27:15] ah
[20:27:25] let’s just filter out these rows manually?
[20:27:50] Scoring-platform-team (Current), drafttopic-modeling: Rebuild drafttopic with corrected data - https://phabricator.wikimedia.org/T193834#4183174 (Sumit) https://github.com/wiki-ai/revscoring/pull/398
[20:28:32] awight: it'll be easier to rebuild with the fix i think, i'm already running extract again
[20:28:45] right on!
[20:29:16] just to save yourself some time, maybe test on a single row
[20:29:58] awight: i have written a test in the patch itself so word vectors side should be fine and it should work if that was the only problem
[20:30:13] awight: whole pipeline would not work anyway with single row :(
[20:30:23] nah just the extraction
[20:30:25] kk
[20:31:14] So links and numbers get stripped out, btw?
[20:32:22] Feel like merging https://github.com/wiki-ai/drafttopic/pull/23 ? I think it’s good to go
[20:32:22] awight: is_article seems fine, because we're using english.non_stop words, so if there's only template data on the article non_stop words would come out as 0
[20:32:48] awight: yes, i'll do that once extraction is finished,
[20:32:51] the revscoring patch looked good to me, it’s merged now
[20:33:10] We’ll need to push a new version to pypi, I can do that
[20:33:26] mebbe after hearing whether it smoke tests for you
[20:41:26] wiki-ai/revscoring#1477 (2.2.3 - 46ef5fd : Adam Wight): The build failed. https://travis-ci.org/wiki-ai/revscoring/builds/375070983
[20:45:27] awight: i think it'll be good to have an option to specify whether the latest or the oldest revision is required
[20:47:57] If you want…
[20:48:33] There’s probably interesting stuff you can do with analyzing each revision to determine the earliest average time when enough signal exists.
[20:51:40] awight: yeah i was thinking of that, sth like measuring the length of the article, but that would require several requests for each revision even on a single article
[20:52:10] I was imagining we could find some cutoff like 1hr
[20:53:38] awight: yeah, although trying with diff cutoff times would be a resource-intensive experiment, but it would make an interesting study on how drafts have evolved
[20:55:28] we might be able to get a cheap approximation by taking one of our trained models and graphing confidence for each edit to the draft
[20:56:54] the model is training for now. I'll not be surprised if the measures are very low, since many drafts were one-liners
[20:57:19] revscoring 2.2.3 is uploaded
[20:57:31] nice work! o/
[20:57:42] “o/“ as in, “high five” :)
[20:57:42] awight: also, i forgot that we will need to revise our target labels according to the version of the article we're taking
[20:57:53] actually...
[20:57:55] maybe not?
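awight's "cheap approximation" above, scoring every revision of a draft with an already-trained model and watching the confidence grow, could be sketched with revscoring's API extractor. The model path is hypothetical, and the score handling assumes the usual `prediction`/`probability` output of revscoring classifiers.

```python
# Sketch: score each revision of a draft, oldest first, to see when the
# topic signal appears. Assumes revscoring 2.x Model/Extractor interfaces.
import mwapi
from revscoring import Model
from revscoring.extractors import api

session = mwapi.Session('https://en.wikipedia.org',
                        user_agent='drafttopic research <example@example.org>')
extractor = api.Extractor(session)
# Hypothetical model path; substitute a real trained drafttopic model file.
with open('models/enwiki.drafttopic.gradient_boosting.model', 'rb') as f:
    model = Model.load(f)

def confidence_history(title):
    """Yield (rev_id, top_label, probability) for each revision, oldest first."""
    for doc in session.get(action='query', prop='revisions', titles=title,
                           rvdir='newer', rvprop='ids', rvlimit=500,
                           formatversion=2, continuation=True):
        for rev in doc['query']['pages'][0].get('revisions', []):
            values = list(extractor.extract(rev['revid'], model.features))
            score = model.score(values)
            top = max(score['probability'], key=score['probability'].get)
            yield rev['revid'], top, score['probability'][top]
```

Plotting the yielded probabilities against revision timestamps would give the per-edit confidence graph discussed above without re-training anything.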
[20:58:21] cos the labels on the newest revision of the talk page are presumably the most accurate
[20:58:59] but the earliest draft probably didn't have that content at all, someone added to it, and later it got tagged with that topic
[20:59:28] I think we can assume that the topic of a page never changes, though
[20:59:34] okay
[21:00:02] not convinced fully but it can rest for now
[21:00:03] like, the hehe Platonic ideal of the page’s concept
[21:00:17] also, there wouldn’t be a talk page for the first revision of an article
[21:00:32] yes
[21:04:11] Off-topic, I learned from Aaron that extraction has a secret mode where you can feed it an old w_cache file and it’ll only extract what’s missing...
[21:04:25] too late for us, but I’d like to make that more usable in the future
[21:12:15] btw, I’m also running extraction and training on ores-misc, for repeatability and cos that’s an environment that can make production models
[21:12:28] so don’t feel like you have to stay up late for the marathon ;-)
[21:16:48] awight: so much for the secret mode :P maybe we should have treasure hunts for revscoring jackpots :D
[21:17:23] awight: i'm about to go, i'm just trying to get some predictions on initial versions of drafts with the model trained on latest versions
[21:21:12] Awesome
[21:21:25] oh, I misread that. Still awesome :)
[21:22:52] I bet we could remove some feature like content_length to make it happier… however that works
[21:29:16] Verified that we no longer have [[0]] vectors. Nice fix!
[22:13:28] Scoring-platform-team, Scap, Patch-For-Review, Release-Engineering-Team (Kanban): Support git-lfs - https://phabricator.wikimedia.org/T180627#4183333 (awight) FWIW, this means we're waiting for scap 3.8.1 (`scap version`) to land on tin.
[22:18:46] Scoring-platform-team (Current), ORES, Collaboration-Team-Triage (Collab-Team-This-Quarter): Deploy ORES advanced editquality models to arwiki - https://phabricator.wikimedia.org/T192498#4140848 (Catrope) I looked at the generated models for arwiki and the goodfaith one is not good enough to use on t...
[22:49:40] Scoring-platform-team (Current), ORES, Collaboration-Team-Triage (Collab-Team-This-Quarter), Patch-For-Review: Deploy ORES advanced editquality models to cawiki - https://phabricator.wikimedia.org/T192501#4140882 (Catrope) Note that the goodfaith model for cawiki isn't super helpful. There's a qu...
[23:00:09] Scoring-platform-team (Current), ORES, Collaboration-Team-Triage (Collab-Team-This-Quarter): Deploy ORES advanced editquality models to lvwiki - https://phabricator.wikimedia.org/T192499#4140858 (Catrope) When setting thresholds for this model I had qualms similar to the ones I had for cawiki: T19250...
[23:06:14] Scoring-platform-team (Current), drafttopic-modeling: Rebuild drafttopic with corrected data - https://phabricator.wikimedia.org/T193834#4183448 (awight) The new model did great against a dozen spot-checked initial drafts not in the training set. This is extremely rad.
[23:12:18] Scoring-platform-team (Current), ORES, drafttopic-modeling: ORES filter to only score page initial revisions using drafttopic - https://phabricator.wikimedia.org/T193914#4183455 (awight)
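The "secret mode" of extraction mentioned at 21:04 above, reusing an old w_cache file so only missing observations get re-extracted, amounts to a merge keyed on rev_id. A hypothetical sketch of the idea; the `rev_id`/`cache` fields and JSON-lines layout are assumptions, not revscoring's actual cache format.

```python
# Sketch: reuse an old extraction cache keyed by rev_id so only observations
# missing from it are re-extracted.
import json

def merge_cached(observations_path, old_cache_path, extract_fn):
    """Yield observations, reusing cached feature values where available."""
    cached = {}
    with open(old_cache_path) as f:
        for line in f:
            obs = json.loads(line)
            cached[obs['rev_id']] = obs
    with open(observations_path) as f:
        for line in f:
            obs = json.loads(line)
            if obs['rev_id'] in cached:
                yield cached[obs['rev_id']]  # reuse the old extraction
            else:
                obs['cache'] = extract_fn(obs['rev_id'])  # extract only what's missing
                yield obs
```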