[00:07:35] (CR) Krinkle: "For the record, doesn't really depend on core's change, but until we run npm-install in quibble, it'll accidentally work because core's no" [extensions/ORES] - https://gerrit.wikimedia.org/r/430515 (https://phabricator.wikimedia.org/T193088) (owner: Krinkle)
[13:38:55] o/
[13:39:00] I'm back. Feeling good today.
[13:46:03] Scoring-platform-team (Current), JADE: [discuss] JADE schema format (endorsements?) - https://phabricator.wikimedia.org/T193643#4178286 (Halfak) @awight ^
[14:16:22] Someone's trying to log into my Wikipedia account :)
[14:16:33] Little do they know I have a strong PW and 2-factor.
[15:39:58] o/
[15:43:11] halfak: fyi, we dropped this yesterday, hopefully it helps with quarterly stuff: https://phabricator.wikimedia.org/phame/post/view/104/status_update_may_2_2018/
[15:56:30] I’m not sure what to do with drafttopic; if left to my own devices I’ll probably go ahead with rewriting the PAWS portion of the pipeline with API calls.
[15:57:02] But codezee is recreating the steps he took to produce the initial training data… so we’ll probably get some nice merge conflicts.
[16:21:32] awight, that was great. It helped a lot :)
[16:22:04] awight, I just got through QR stuff and some program committee work, so I can come back to Drafttopic.
[16:22:16] whew, sounds fun though!
[16:22:59] At this point, there are two semi-licked cookies in drafttopic
[16:26:37] halfak: In other semi-important news, my Wikimania presentation was rejected
[16:26:45] * awight buys a lottery ticket
[16:27:17] boo.
[16:27:49] Mine got accepted but there was some backchannel discussion and that was the only reason.
[16:28:01] Pretty crazy that the reviewers thought JADE wasn’t related to the theme of “bridging the knowledge gap”.
[16:28:09] I mentioned our team in the proposal and it seems that reviewers are allergic to WMF staff.
[16:28:19] oh good
[16:28:22] They thought I was going to "just give yet another WMF software update"
[16:39:02] halfak: Hey! You planning on coming to the research group meeting?
[16:43:41] Oh! yes. There's an all-staff meeting that is overlapping :|
[16:43:48] Forgot to adjust the meeting time.
[16:44:03] We usually just push the RG meeting back 30 minutes when this happens.
[16:44:06] ewhit_, ^
[16:44:19] ah ok. Should I let everyone know what's happening? Or go ahead and start?
[16:46:53] Scoring-platform-team, Operations, Release-Engineering-Team, Scap: Deployment git server can't supply ORES hosts in parallel - https://phabricator.wikimedia.org/T191842#4178996 (RobH) p: Triage>Normal As part of SRE clinic duty, I'm reviewing all unassigned, needs-triage tasks in #operatio...
[17:22:34] Scoring-platform-team, Analytics, EventBus, ORES, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#4179165 (Milimetric) p: Triage>Normal
[18:19:09] halfak: Is there already a utility for making batch MW API queries and integrating the results back into a list of observations, by page id or title?
[18:21:46] mwapi has continuation functionality
[18:22:18] https://pythonhosted.org/mwapi/session.html#mwapi.Session.continuation
[18:22:40] awight: if you use continuation, it will give you a generator object, which looks like it might do what you want
[18:22:49] It doesn't just stitch results together (because that is more complicated than you think) but it will handle continuation for you and just iterate through responses.
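A minimal sketch of the continuation pattern discussed above, assuming a python-mwapi version where passing continuation=True to Session.get returns a generator of successive response documents (the host, query parameters, and user-agent string are illustrative):

    import mwapi

    # Placeholder user agent; the MW API asks clients to identify themselves.
    session = mwapi.Session('https://en.wikipedia.org',
                            user_agent='drafttopic-example <someone@example.org>')

    # With continuation=True, mwapi follows the API's "continue" parameters
    # for you; each iteration yields one raw response document.
    for doc in session.get(action='query', prop='revisions',
                           titles='Aka Moon', rvprop='ids',
                           continuation=True):
        for page in doc['query']['pages'].values():
            print(page['title'], page.get('revisions', []))

As noted above, this iterates through responses rather than stitching them into a single result; merging is left to the caller.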
[18:22:54] awight, ^
[18:23:07] continuation is handled invisibly by mwapi, I did notice
[18:23:28] https://www.mediawiki.org/wiki/API:Query#Generators
[18:24:14] The part I was hoping to find a prefab silver bullet for is when I have a list of dicts, e.g. {‘page_id’: 123, ‘page_title’: ‘Funhouse’}, and I want to run an MW API query to get more info about each row.
[18:25:10] I have to create a map from page_id to observation, run the API using parallel threads, batching and whatever, then use my map to integrate the new info back into the observations.
[18:25:20] I see that machinery rewritten in a few places...
[18:25:34] I’ll make a new abstraction, if one doesn’t exist already.
[18:26:08] On the bright side, I’ve got all the required API calls working together now and the last thing to do is just this chunking and parallelization finesse
[18:26:23] Looks like we’ll be able to build the drafttopic datasets w/o PAWS
[18:34:30] Here’s an example output: {"talk_page_title": "Talk:Elizabeth Almira Allen", "talk_page_id": 56807021, "templates": ["Template:WikiProject Articles for creation", "Template:WikiProject Articles for creation/class", "Template:WikiProject Biography", "Template:WikiProject Biography/class", "Template:WikiProject Education in New Jersey", "Template:WikiProject United States", "Template:WikiProject United States/class", "Template:WikiProject Women's History", "Template:WikiProject banner shell"], "mid_level_categories": ["Culture.Language and literature", "Assistance.Maintenance", "History_And_Society.Education", "Geography.Countries"], "title": "Elizabeth Almira Allen", "rev_id": 832513008}
[18:34:55] oops, that’s the wrong rev_id
[18:35:10] 818448887
[18:37:30] halfak: K i have to run off until c. 3pm Pacific, but the “followup” branch now has working code to pull our data. The last step is to chunk and parallelize.
[18:38:09] I’m kinda looking forward to it, but also won’t cry if you get bored and have some reusable stuff to throw at the problem.
[18:38:11] o/
[18:40:18] awight: if you can get batch queries to work it'll be great! I had given batch queries a shot but ran into problems with page redirects during result aggregation, so I reverted to using one request per instance in the fetch-text API
[18:41:29] oh I think for the use case of PAWS queries, batch requests shouldn't create a problem
[18:43:07] hey!
[18:43:27] Sorry I’m running off for a bit, but thanks for all the info!
[18:47:15] ok, np
[18:51:05] halfak: rev_id gives the snapshot of the page *after* a particular edit, right?
[18:55:19] meetings meetings meetings!
[18:55:28] * halfak reads scrollback
[18:57:17] codezee, right re. rev_id
[18:57:33] When you look up the content at rev_id, that's the version of the page after the target edit
[18:57:53] When you look up something based on diff=, you get the change vs. the last version (parent)
[19:01:10] codezee, any thoughts on awight's progress and not needing to use PAWS?
[19:01:22] I'm not sure how the random sampling can work without PAWS.
[19:15:08] halfak: if the number of articles fetched for each wikiproject is >> 2k, then sampling from those wikiprojects for each mid-level cat should give sufficient randomization. But this requires the overhead of fetching a huge number of pages for each WP, most of which we'll be discarding.
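A rough sketch of the chunk-and-merge abstraction awight describes above. enrich_by_pageid is a hypothetical helper, not an existing library function; it assumes batches of 50 (the default API limit for pageids), formatversion=2 so pages come back as a list, and a session as in the earlier sketch. As codezee mentions above, redirects complicate result aggregation: a redirected page can come back under a different pageid, so a general version would also need the query's redirects mapping.

    def enrich_by_pageid(session, observations, batch_size=50):
        # Merge extra page info into observation dicts like
        # {'page_id': 123, 'page_title': 'Funhouse'}, keyed by page_id.
        by_id = {obs['page_id']: obs for obs in observations}
        page_ids = list(by_id)
        for i in range(0, len(page_ids), batch_size):
            batch = page_ids[i:i + batch_size]
            doc = session.get(action='query', prop='info',
                              pageids='|'.join(str(p) for p in batch),
                              formatversion=2)
            for page in doc['query']['pages']:
                # Redirected or missing pages may not match the ids we sent;
                # that is the aggregation problem mentioned above.
                if page.get('pageid') in by_id:
                    by_id[page['pageid']].update(page)
        return observations

Parallelizing is then a matter of handing each batch to a worker thread, since each batch touches a disjoint set of observations.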
[19:19:38] although i'm also not entirely clear what the SQL query would do internally... so if i'm not wrong, if WikiProject:Literature is much more common than WikiProject:FineArts, then literature articles should be proportionally more common in our sample
[19:26:32] Scoring-platform-team (Current): [Discuss] Random sampling by PAWS vs API requests - https://phabricator.wikimedia.org/T193789#4179717 (Sumit)
[19:56:48] codezee, I think that overhead is acceptable. I wonder how long the script will run though :|
[19:57:24] re. FineArts being less common: right, because the sampling strategy doesn't account for multi-label
[19:57:58] This makes our likelihood estimates kind of sad because they can't be directly compared to one another.
[19:58:18] Oh! You were talking about within a mid-level category. I'm not too worried about that.
[21:27:21] halfak:
[21:27:49] I have a few minutes between appointments, just wondering if drafttopic is still my cookie.
[21:29:00] Still your cookie!
[21:29:07] Been helping ewhit_ with some mwchatter stuff
[21:29:25] cool, ty
[21:30:43] awight: are you making another query to get the rev_id of the main page? or is it the rev_id of the talk page?
[21:31:06] I’m making a second query, lemme dig up a link
[21:33:10] codezee: https://github.com/wiki-ai/drafttopic/pull/21/files#diff-6938fec85f333d2e18038084128b3261R88
[21:39:33] codezee: Do you have a sample of the data you used to train the last model?
[21:41:35] awight: yes, i trained one just now; you can look at /home/codezee/ai/drafttopic/datasets/enwiki.labeled_wikiprojects.w_text.json
[21:42:03] nice
[21:42:03] on ores-misc-01
[21:42:17] sorry, ores-staging
[21:42:21] awight: ^
[21:43:00] looking at the results i can say we've not messed up, it's all in order
[21:44:00] awesome
[21:44:32] can you go ahead and check in the new model? I wanted to play with it but was blocked by some incompatibilities with new revscoring, and a renamed module in drafttopic.
[21:46:20] new revscoring? i think it's built using 2.0.1 so it still might be incompatible; i used the requirements present in master
[21:47:12] darn, okay no worries
[21:47:51] What’s the machine name for ores-staging? I haven’t been there in ages, I guess.
[21:48:14] ores-staging-01
[21:48:21] :)
[21:49:02] ls: cannot access '/home/codezee/ai/drafttopic/datasets/enwiki.labeled_wikiprojects.w_text.json': Permission denied
[21:49:15] awight: `rvdir=newer` in session.get fetches the revisions starting from the latest?
[21:49:23] chmod g+rx ~
[21:49:26] if you don’t mind
[21:49:51] codezee: oddly, it fetches them in ascending order, so the first (rvlimit=1) entry is the oldest
[21:50:59] awight: it's read for all, why does it deny you?
[21:51:10] r-r-r
[21:51:52] Scoring-platform-team (Current), JADE: [discuss] JADE schema format (endorsements?) - https://phabricator.wikimedia.org/T193643#4180310 (awight) Why would the comments be encapsulated under our data structure, rather than at the edit comment level? I worry that mixing the free-form text and usernames in...
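A sketch of the rvdir=newer behavior discussed a few lines up: the API sorts revisions oldest-first under rvdir=newer, so rvlimit=1 yields the page's first revision. The session object and page title are assumed to be as in the earlier sketches.

    # rvdir='newer' sorts revisions in ascending order, so with rvlimit=1
    # the single revision returned is the oldest (first) revision.
    doc = session.get(action='query', prop='revisions',
                      titles='Aka Moon',
                      rvdir='newer', rvlimit=1,
                      rvprop='ids|timestamp',
                      formatversion=2)
    first_rev = doc['query']['pages'][0]['revisions'][0]
    print(first_rev['revid'], first_rev['timestamp'])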
[21:52:04] awight@ores-staging-01:~$ ls -ld /home/codezee/
[21:52:05] drwx------ 10 codezee wikidev 4096 May 3 21:49 /home/codezee/
[21:53:42] it's readable now
[21:53:58] in group, i see; so each parent dir needs to have read permission
[21:54:07] It needs to be rx
[21:54:15] otherwise I can’t traverse to get to subdirs
[21:54:23] done
[21:55:20] weird, this blocks me cos I’m in the group, I think:
[21:55:21] drwxr--r-x 2 codezee wikidev 4096 May 3 21:42 /home/codezee/ai/drafttopic/datasets/
[21:55:35] fixed that just now
[21:55:45] can you access?
[21:56:55] Thanks, I can read now
[21:57:02] I don’t think this is the right content, unfortunately
[21:57:06] Should be: https://en.wikipedia.org/w/index.php?title=Aka_Moon&oldid=9801177
[21:57:26] gotta run again, sorry! o/
[22:01:39] it again brings us to the same point: why should we train using the first revision, when the wikiproject mapping we're using is from the latest revision?
[22:02:21] by taking the first revision, we're taking content that doesn't even belong to certain wikiprojects, even though those are included in the target labels
[22:03:33] for example, if we take the first draft for training - https://en.wikipedia.org/w/index.php?title=Talk:Aka_Moon&oldid=79088314 says the draft belongs to WP:Biography, but our dataset will say WP:Jazz and WP:Biography
[22:03:48] halfak: ^
[22:04:22] codezee, because we're trying to make predictions based on the first revision of new articles.
[22:04:45] We can't test in a high-signal environment and claim that our model works for its intended purpose.
[22:07:00] halfak: but during extraction of articles given a WikiProject, we have no way to know if the article had that wikiproject in its first revision, unless we actually look at the article and decide
[22:07:19] WikiProject tags have nothing to do with the content of the article.
[22:07:29] They have to do with the *topic* of the article.
[22:07:32] The topic doesn't change
[22:07:48] halfak: target labels are mid-level categories, so those will change with different versions of articles
[22:08:20] halfak: what if someone added some literature content to an article whose first draft primarily talked about warfare?
[22:08:40] i'm talking about cases of articles having multiple topics
[22:09:06] That's rare. And if the content was missing in the first version, that's just a hit we'll need to take.
[22:09:18] Sometimes there's no clear signal for what you want to detect.
[22:13:27] halfak: for example, this is the first version of Ann Bishop - https://en.wikipedia.org/w/index.php?title=Ann_Bishop_(biologist)&oldid=518733419
[22:14:33] and this is the diff really adding *some* signal - https://en.wikipedia.org/w/index.php?title=Ann_Bishop_%28biologist%29&type=revision&diff=518838758&oldid=518807910
[22:14:46] which is some 5-6 revisions ahead in the history
[22:15:58] +1 I think that's a good example of why our modeling problem is difficult.
[22:16:57] halfak: on the contrary, what we can do to test is to have a test set which explicitly contains draft articles in their initial revision and their actual wikiprojects at those draft versions
[22:17:09] and see how our model performs on that, sort of as a gold standard
[22:17:10] I'd be OK with that.
[22:17:26] but each observation of that test set will have to be manually verified; we cannot automate that
[22:17:48] because of these problems
[22:19:12] or we can apply a simple heuristic to select the right initial version, by checking something like the number of words in the article being above a threshold
[22:20:04] and i might be wrong, but wouldn't most wikipedia drafts start out as stubs like the above in their first versions?
[22:20:30] Why would we need to manually verify anything?
[22:20:58] I'm OK with setting a minimum threshold of content words.
[22:21:03] halfak: because even in this test set we need to make sure not to include examples like the above
[22:21:12] Na. I think we must have them
[22:21:21] Otherwise we can't determine precision and recall.
[22:21:32] Some examples are degenerate and we'll never get them right. That's OK.
[22:21:37] It's part of the underlying problem.
[22:21:44] Like, who do you send those drafts to?
[22:23:29] i'll do some data fetching with adam's code to see how many articles of this kind we get...
[22:23:39] gotta go for now, it's quite late
[22:26:07] i'll get back on this... we might need to revise some things in the paper if we get a 2nd review
[22:32:55] I'm running away too.
[22:32:58] See ya folks!
[22:41:31] So I'm working on deploying the new models (cawiki, lvwiki, arwiki and huwiki) to RC, and for arwiki in particular the models aren't good enough to be useful :(
[22:41:58] arwiki damaging is not great but workable
[22:42:01] arwiki goodfaith is unusable
[22:43:37] There is no threshold that has 15% precision for bad faith, apparently
[22:48:22] Or that's what the API claims when you ask it, but looking at the full threshold data it looks like a slightly different story
[22:48:43] Which is that recall drops incredibly quickly
[22:49:09] Oh no wait, and precision does in fact never exceed 0.15
[22:50:56] It also goes up and down as the threshold increases, which is super weird; isn't it supposed to be monotonic?
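A sketch of how the threshold inspection described above can be reproduced against the ORES API, assuming the v3 model_info syntax for threshold optimizations (the exact optimization string here is illustrative). On the monotonicity question: recall does fall monotonically as the threshold rises, but precision carries no such guarantee, so some up-and-down movement is expected.

    import requests

    # Ask ORES whether any threshold gets arwiki goodfaith=false
    # to >= 0.15 precision; an empty result means no such threshold.
    resp = requests.get(
        'https://ores.wikimedia.org/v3/scores/arwiki/',
        params={
            'models': 'goodfaith',
            'model_info':
                'statistics.thresholds.false.'
                '"maximum recall @ precision >= 0.15"',
        },
        headers={'User-Agent': 'threshold-check-example'})
    print(resp.json())

Dropping the optimization string and requesting statistics.thresholds.false alone returns the full threshold table, which is how the recall cliff above was spotted.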