[09:29:39] Scoring-platform-team (Current), JADE: [discuss] JADE schema format (endorsements?) - https://phabricator.wikimedia.org/T193643#4189716 (awight) After IRC discussion, I'm on board with the more complex schema. Importantly, it can be used as a superset of the simpler workflow's schema, so it gives us the...
[13:59:14] o/
[14:07:34] Back to work on presenting :)
[14:10:18] Amir1, will you be around at 1830 UTC?
[14:10:39] I want to highlight some of the work you did with false positive reports in the wikidata damage detection models at the research showcase.
[14:24:00] halfak: yeah I will be there \o/
[14:24:06] Cool. :)
[14:24:11] just drop me a message and calendar invite
[14:38:40] Invite incoming.
[16:13:13] Amir1 or halfak: this is done, right? https://phabricator.wikimedia.org/T174384
[16:13:19] how did it affect accuracy?
[16:13:41] I think it is pending deployment.
[16:13:53] We changed the way that we generated stats at this time, so it's hard to say.
[16:14:05] I think we'd have to generate a model with and without it to check.
[16:14:11] I see that new feature in the features list
[16:14:21] Oh! It has been deployed?
[16:14:33] https://ores.wikimedia.org/v2/scores/enwiki/wp10/123456/?features
[16:14:41] Aha! Cool
[16:14:46] Let's resolve that task :)
[16:14:47] paragraphs_without_refs_total_length is the one, right?
[16:15:06] Right
[16:15:12] Scoring-platform-team (Current), articlequality-modeling, Easy, Google-Code-in-2017, and 2 others: Implement feature for detecting clumps of text that lack references - https://phabricator.wikimedia.org/T174384#4191146 (Halfak) Open>Resolved
[16:38:32] Arg. I don't feel good about the introduction to this talk.
[16:38:43] But it'll be OK. The real meat of it are the case studies.
[16:39:06] My goal is to charge through the intro and spend most of the time discussing the cases.
[17:44:13] halfak: I have looked through the list of active participants in the Red Women project, and created a list of potential participants (random selection from all members). If you have a minute, could you take a look and let me know if anyone jumps out at you as not a person to ask? list: https://docs.google.com/document/d/1nzvcpoLvkD-Dsbda32nHz7C3_rPnu-fv2pun1ZSiiHk/edit?usp=sharing
[17:44:46] * halfak requests access
[17:46:22] Whoops, sorry! You should have access now, lmk if it's still not letting you in.
[18:06:30] no concerns
[18:06:47] BTW, "I JethroBT" is staff, but a very good person to talk to who I'm sure is continuing work in a volunteer capacity.
[18:07:21] ewhit_, ^
[18:08:12] ok, great thank you! I need to test out BlueJeans recording and meeting with people without a umich BlueJeans account, and then I think we're ready to start recruiting (we have IRB approval!)
[18:17:06] (PS1) Umherirrender: Use Status::wrap to format a status object [extensions/ORES] - https://gerrit.wikimedia.org/r/431811
[18:19:03] o/ Amir1
[18:19:09] Just dropped you an invite to the hangout.
[18:19:16] (PS2) Umherirrender: Use Status::wrap to format a status object [extensions/ORES] - https://gerrit.wikimedia.org/r/431811
[18:19:53] (CR) Umherirrender: "Patch Set 2: Added missing use" [extensions/ORES] - https://gerrit.wikimedia.org/r/431811 (owner: Umherirrender)
[18:38:16] halfak: I'm attending at the youtube stream
[18:56:32] srrodlund: Just checking, we're skipping the docs meeting today?
[18:56:52] Yeah. I am at Write the Docs
[18:56:59] so I can't make it
[18:57:13] I think our next meeting will be in person next week :-)
[18:58:27] srrodlund: wfm, looking forward to it!
[19:02:03] Meeeee tooooo!
[19:11:04] o/ Platonides
[19:11:28] I'm sorry that I forgot to call you out specifically, but I suspect there will be questions :)
[19:12:24] no problem
[19:12:42] not that I was much involved there actually
[19:39:01] halfak: I had an obvious insight yesterday, that the JADE schemas can be hosted on-wiki for maximum dogfooding.
[19:39:21] OH like in meta's Schema namespace?
[19:39:42] Wondering if we should put these on meta:Schema: yeah or in :JADE:Schema/
[19:40:07] Seems like many schemas are cross-wiki
[19:40:14] so I think meta would make more sense.
[19:41:11] It's a great first step--the only thing making me consider per-wiki is that users might want to * translate or * customize
[19:41:18] which... could be a disaster, though an interesting one.
[19:42:20] Good point.
[19:42:27] Hmm.. For now, let's keep it in the repo
[19:42:29] anyway, we're hosting a single copy in GitHub at the moment, so metawiki would be a small step forward
[19:42:32] oh?
[19:42:33] But I like where your head is at
[19:42:34] hmm ok that's fine too.
[19:42:36] hehe
[19:43:25] Sanity check--I'm inclined to prioritize LFS and JADE above anything else for what's left of my week
[19:43:42] cos we want that stuff for the hackathon.
[19:44:15] +1
[19:44:32] More so draft quality because I think there's a lot we can do with JADE with what we have and what Prateek is working on
[19:48:39] Time for a break. Back in a bit.
[19:50:45] draft topic or quality?
[19:51:06] assuming drafttopic until I hear more
[20:02:32] *topic!
[20:02:45] :thumb:
[20:02:49] docs meeting happening?
[20:07:20] halfak: no, sorry we didn't relay that fact
[20:07:32] happy to chat just us, if you want?
[20:07:35] Gotcha. I'll go back to me ~lunch break then :)
[20:07:39] Na. It's cool.
[20:07:48] I do want to work on a docs task for the hackathon at our offsite.
[20:07:59] kk yeah me too
[20:08:04] I was thinking that we could gather materials with your outline and call that a task.
[20:08:14] Good place to start, IMO
[20:08:17] E.g. here's a set of pages and some papers we wrote and an outline we want :)
[20:08:30] OK AFK again :)
[20:08:50] I think the plots are looking exciting, too. I'm doing a quick rewrite of thresholds_diagrams using matplot rather than bokeh
[20:23:41] RoanKattouw: fyi, the notebook figures are correctly embedded now, so it should be easier to experiment with: https://github.com/adamwight/thresholds_diagrams/blob/master/Thresholds%20diagrams.ipynb
[20:28:56] ^ just pointing to a more usable IRC log
[20:34:51] YES. git-lfs should be deployable today
[20:43:37] paladox: Do you feel like adding another gerrit repo?
[20:43:52] awight yep i can though wont be able to do the github mirror
[20:43:59] ah interesting
[20:44:13] yah it's supposed to be a github mirror...
[20:44:18] oh
[20:44:30] awight no_justification should be able to do the mirror
[20:45:33] kk well if you find the time, the new repo should be https://gerrit.wikimedia.org/r/scoring/ores/drafttopic
[20:45:56] It will eventually mirror http://github.com/wiki-ai/drafttopic
[20:47:33] ok
[20:49:11] awight do i do --empty-commit?
[20:50:08] paladox: Hmm, good question--I'm not sure how existing history will interact with the mirrored repo
[20:50:18] ok
[20:50:23] No commits seems ideal, but I donno if that's allowed
[20:50:48] ok wont do --empty-commit
[20:53:38] awight repo exists already https://gerrit.wikimedia.org/r/#/admin/projects/scoring/ores/drafttopic
[20:54:31] * awight facepalms
[20:54:34] sorry!
[20:56:06] heh
[20:56:27] thank you for service already performed ;-)
[20:57:13] lol
[21:00:36] Scoring-platform-team, Release-Engineering-Team: Another round of discussion about wiki-ai's GitHub->gerrit mirroring - https://phabricator.wikimedia.org/T194212#4192258 (awight)
[21:01:09] (CR) jenkins-bot: Localisation updates from https://translatewiki.net.
[extensions/ORES] - https://gerrit.wikimedia.org/r/431935 (owner: L10n-bot)
[21:04:19] Scoring-platform-team, Release-Engineering-Team: Another round of discussion about wiki-ai's GitHub->gerrit mirroring - https://phabricator.wikimedia.org/T194212#4192286 (awight)
[21:04:22] Scoring-platform-team, Release-Engineering-Team: Another round of discussion about wiki-ai's GitHub->gerrit mirroring - https://phabricator.wikimedia.org/T194212#4192258 (awight)
[21:05:14] paladox: Are you able to create a Diffusion repo?
[21:05:20] awight yep
[21:05:35] hehe someone accidentally gave me perms too, it seems
[21:05:40] * awight loosens bolts
[21:05:58] lol
[21:17:41] o/
[21:18:02] * halfak looks at current work list.
[21:18:15] awight, need any review for drafttopic stuff?
[21:18:47] halfak: I'm doing infrastructure things at the moment, so no. But there's an implementation thing you could look at if you want:
[21:19:02] Cool
[21:19:10] We should only score drafttopic when it's an initial revision.
[21:19:15] * halfak is dying to do some technical work :D
[21:19:23] for a moment, I thought we could do that on the changeprop side, but that's probably not correct.
[21:19:34] Instead, I'm thinking we need a new filter for precache: config
[21:19:35] Aha! Yeah, let me look at that.
[21:20:00] something like, "page_create" meaning, if mediawiki.revision-create.parent_id == null
[21:20:05] kk thanks!
[21:20:27] I'm thinking that we want to score all versions of pages in the draft namespace, but only the first edit to main namespace articles.
[21:20:29] hmm
[21:20:35] I'll do the simple thing first.
[21:20:35] ooh neat
[21:20:36] hehe
[21:20:44] +1
[21:25:04] ooh! We already have a "page_creation" event.
[21:25:12] in ORES or in changeprop?
[21:25:14] if event['rev_parent_id'] == 0:
[21:25:16] In ORES
[21:25:21] :100%:
[21:25:31] hehe short work of that, then
[21:25:51] I can write the config, unless that sounds borderline fun
[21:26:13] I think there's nothing to do except to have precache configured to '"on": ['page_creation']'
[21:26:24] OK another technical thing that needs to happen is https://phabricator.wikimedia.org/T192293
[21:26:48] got it. Current model is good now, so this should be quick work.
[21:27:08] n.b. revscoring 2.2.3
[21:27:18] I'll update the wheels now
[21:27:23] n.b.?
[21:27:46] nota bene or something
[21:28:02] "don't get caught with your revscoring down a patch level"
[21:28:03] * halfak is still confused.
[21:28:20] U have to pip install new revscoring to the virtualenv is all
[21:34:24] Oh of course :)
[21:34:53] * halfak downloads word2vec binary :|
[21:34:56] lol
[21:34:58] Forgot about that
[21:35:01] welcome
[21:35:06] to LFS
[21:45:31] The feature list loads "GoogleNews-vectors-negative300.bin"
[21:45:37] But it seems the file has a ".gz"
[21:45:48] So it doesn't work. what's the right fix here?
[21:45:57] Extract the file or fix the feature_list
[21:45:58] ?
[21:47:26] Scoring-platform-team (Current), JADE: Implement multi-judgment and endorsement schema for Extension:JADE - https://phabricator.wikimedia.org/T194219#4192481 (awight)
[21:48:46] Scoring-platform-team: Host JADE schemas on metawiki - https://phabricator.wikimedia.org/T194220#4192493 (awight)
[21:48:59] Scoring-platform-team, JADE: Host JADE schemas on metawiki - https://phabricator.wikimedia.org/T194220#4192503 (awight)
[21:50:32] Yeah... can't get the model to load at all.
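A minimal sketch of the scoring rule discussed between 21:19 and 21:26: score every revision in the draft namespace, but only the page-creating revision in the main namespace. Only `rev_parent_id` and the `== 0` check come from the log; the `page_namespace` field name, the namespace id 118, the helper names, and the handling of a null parent are assumptions made for illustration, not the ORES implementation.

```python
# Sketch only: everything except rev_parent_id is an assumed name.
MAIN_NAMESPACE = 0
DRAFT_NAMESPACE = 118  # assumption: Draft namespace id on English Wikipedia


def is_page_creation(event):
    """True when the revision has no parent, i.e. it creates the page."""
    # -1 marks a missing field so that "unknown" is not treated as creation.
    parent_id = event.get("rev_parent_id", -1)
    # The log mentions both 0 (ORES) and null (revision-create stream).
    return parent_id in (0, None)


def should_score_drafttopic(event):
    """Score all draft revisions, but only first revisions of main articles."""
    namespace = event.get("page_namespace")  # hypothetical field name
    if namespace == DRAFT_NAMESPACE:
        return True
    return namespace == MAIN_NAMESPACE and is_page_creation(event)
```

The "simple thing" chosen in the log corresponds to `is_page_creation` alone; the namespace-aware rule is the extension floated at 21:20:27.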
[21:50:43] I tried just extracting the .bin
[21:50:52] awight, ^
[21:51:51] (PS1) Awight: [WIP] Provision the drafttopic model [services/ores/deploy] - https://gerrit.wikimedia.org/r/432000 (https://phabricator.wikimedia.org/T176336)
[21:52:09] halfak: lemme check the repo
[21:52:29] I tried all potential locations in the search path
[21:52:37] halfak: You have an old commit?
[21:52:46] word2vec/, ~/.word2vec and /var/share/word2vec
[21:52:50] * halfak double-checks.
[21:52:55] also, you'll need to mkdir ~/.word2vec and symlink the .bin.gz into that
[21:53:08] right. Tried that.
[21:53:11] Production has a similar (but different) link, of course
[21:53:18] Confirmed I'm on the most recent commit.
[21:53:23] commit de6a1fe583a5932c622dd3aacb4a1babbb35112f
[21:53:24] ?
[21:53:36] yup
[21:53:39] hmmm
[21:53:40] The feature list definitely looks for .bin.gz
[21:53:55] drafttopic/feature_lists/wordvectors.py
[21:53:59] filename="GoogleNews-vectors-negative300.bin.gz", limit=150000)
[21:54:13] ahh.
[21:54:20] Look at feature_lists/word2vec.py
[21:54:21] I don't think the pickled model is able to hijack that
[21:54:28] wtf that file shouldn't exist
[21:54:32] yes
[21:54:38] does it?
[21:54:43] Also, this should be named enwiki. Who came up with this?
[21:54:59] ahem
[21:55:11] we can tweak that later, IMO
[21:55:13] Weird.... I wonder why it got pulled in.
[21:55:16] Agreed.
[21:55:39] The previous model does look for feature_lists/word2vec
[21:55:45] but not the commit you're on
[21:55:47] Oh! I extracted it before I moved things to the right place.
[21:55:51] * halfak re-compresses.
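A small sketch for debugging the load failure above: it walks the three directories named at 21:52:46 and reports which candidate path actually holds the vectors file. The lookup order and the helper name `find_vectors` are assumptions; this is not how revscoring or drafttopic resolves the path, only a way to check by hand which copy would be found.

```python
import os

# Directories named in the log; the lookup order here is an assumption.
DEFAULT_SEARCH_DIRS = ("word2vec", "~/.word2vec", "/var/share/word2vec")


def find_vectors(filename, search_dirs=DEFAULT_SEARCH_DIRS):
    """Return the first existing path for filename, else raise FileNotFoundError."""
    tried = []
    for directory in search_dirs:
        # expanduser resolves "~"; isfile follows symlinks, so a symlinked
        # .bin.gz counts as long as its target exists.
        candidate = os.path.join(os.path.expanduser(directory), filename)
        if os.path.isfile(candidate):
            return candidate
        tried.append(candidate)
    raise FileNotFoundError(
        "%s not found; tried: %s" % (filename, ", ".join(tried)))
```

Calling `find_vectors("GoogleNews-vectors-negative300.bin.gz")` and then the same with the `.bin` name shows at a glance whether the compressed or the extracted file is the one sitting in the search path.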
[21:55:55] O_o
[21:55:58] hehe
[21:56:22] * awight recommends using the git-lfs assets repo
[21:56:34] https://gerrit.wikimedia.org/r/scoring/ores/assets.git
[21:57:21] * halfak waits for several gigs to recompress
[21:57:26] If you check that out on the git-lfs branch in ores-deploy, as a submodule, you'll only be downloadin gone more time :)
[21:57:36] s/ g/g /
[21:57:50] no big deal though
[21:58:00] * awight speculates in coal stock
[22:00:39] Well. Somehow I'm out of time and I have achieved nothing >:(
[22:01:02] I got it
[22:01:11] I'ma do some chores and come back to this in a few minutes.
[22:01:14] ah ok
[22:01:30] Unrelated, I booked a rental bike for Barcelona :)
[22:02:03] very nice. I've been daydreaming about potential loaners in AMS with kid capabilities
[22:08:05] halfak: FYI I'm getting the full ores config running locally with drafttopic, so I can take on that half of the memory checking
[22:09:39] Somehow this file is bigger than it was before I decompressed it.
[22:09:43] Same compression strategy and level
[22:09:47] :|
[22:09:59] I should have just downloaded it again ^_^
[22:10:39] IT'S ALIVE
[22:10:51] halfak: grab the assets repo for a good time
[22:12:31] \o/ I has drafttopic running under ores
[22:13:39] Hmm, is the precache filter only applied to precache stuff, so we'll end up running drafttopic on any externally requested revision?
[22:16:58] Scoring-platform-team (Current), ORES, drafttopic-modeling, artificial-intelligence: Check drafttopic model memory usage - https://phabricator.wikimedia.org/T192293#4192590 (Halfak) Simple check: Not loaded ``` halfak 5029 7.0 0.8 480916 69776 pts/1 Sl+ 17:13 0:01 python ``` Loaded:...
[22:17:06] awight, right
[22:17:18] awight, just finished posting results of the "simple check"
[22:17:24] ty!
[22:17:29] The in-context ORES check remains. Seems like you are ready for it.
[22:17:34] I don't see how model version is set...
[22:17:37] +1 I can do that
[22:17:48] In the makefile of the relevant repo
[22:17:55] It's a config param of cv_train
[22:17:55] d'oh.
[22:18:10] We... have to retrain to set a model version, I'm afraid.
[22:18:33] awight, one option is to just load the model, set the new version, and then re-dump the model.
[22:18:41] But generally, it's a good idea to re-train.
[22:18:55] soo much time
[22:18:57] I'm out of here though. I'll leave the call up to you.
[22:18:59] :)
[22:19:13] kk I have to go soon too, but I can smell the carrot
[22:19:16] I've manually changed versions in the past.
[22:19:28] hehe thanks for the license to ill
[22:30:22] Scoring-platform-team, ORES: Sort classes when printing model_info - https://phabricator.wikimedia.org/T194221#4192613 (awight)
[22:37:46] Scoring-platform-team, ORES, drafttopic-modeling: ORES: new type of filter to select when a given model will be included in scores - https://phabricator.wikimedia.org/T194222#4192638 (awight)
[22:40:49] halfak: https://github.com/wiki-ai/drafttopic/pull/24 at some point
[22:57:52] Scoring-platform-team (Current), ORES, drafttopic-modeling, artificial-intelligence: Check drafttopic model memory usage - https://phabricator.wikimedia.org/T192293#4192760 (awight) Slightly different results when running under ORES, it seems that performing drafttopic scoring does have a one-tim...
[23:02:22] ewhit_: Where are you getting wikichatter from? Seems that it's not in pypi?
[23:05:50] got it
[23:17:15] awight: It's here https://github.com/mediawiki-utilities/python-mwchatter
[23:20:20] Thanks, I dumped that repo URL into a file but can't do any more at the moment. See ya!
[23:20:24] https://github.com/ewhit51/talkpage_scraper/pull/1
[23:21:27] Scoring-platform-team, Scap, Patch-For-Review, Release-Engineering-Team (Kanban): Support git-lfs - https://phabricator.wikimedia.org/T180627#4192775 (awight) 15:50 < awight> twentyafterfour: bad news, my test LFS deployment failed to pull the files again 15:51 < awight> tin:/srv/deployment/ores/...
[23:22:04] o/
[23:53:50] (CR) Legoktm: [C: 2] Use Status::wrap to format a status object [extensions/ORES] - https://gerrit.wikimedia.org/r/431811 (owner: Umherirrender)
[23:56:26] (Merged) jenkins-bot: Use Status::wrap to format a status object [extensions/ORES] - https://gerrit.wikimedia.org/r/431811 (owner: Umherirrender)
[23:57:33] (CR) jenkins-bot: Use Status::wrap to format a status object [extensions/ORES] - https://gerrit.wikimedia.org/r/431811 (owner: Umherirrender)
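For the shortcut mentioned at 22:18:33 (load the model, set the new version, re-dump it instead of retraining), a rough sketch follows. It assumes the model file is a plain pickle whose object exposes a writable `version` attribute; the function name and that attribute are hypothetical, and the real model classes may provide their own load and dump helpers that should be preferred. As noted in the log, retraining is generally the better option.

```python
import pickle


def set_model_version(in_path, out_path, new_version):
    """Load a pickled model, overwrite its version, and write it back out."""
    with open(in_path, "rb") as f:
        model = pickle.load(f)  # only unpickle files you trust
    # Assumption: the version is a plain attribute on the model object.
    model.version = new_version
    with open(out_path, "wb") as f:
        pickle.dump(model, f)
    return model
```

Writing to a separate `out_path` keeps the original file intact until the re-dumped model has been checked.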