[03:30:42] Scoring-platform-team-Backlog, Africa-Wikimedia-Developers, Documentation, User-Zppix: Update all of documentation for ORES - https://phabricator.wikimedia.org/T174858#3578752 (awight) @Zppix I don't understand the #africa-wikimedia-developers connection here...
[03:30:59] Scoring-platform-team, ORES, Documentation, Epic: [Epic] Clean up ORES service documentation - https://phabricator.wikimedia.org/T148974#3578757 (awight)
[03:30:59] Scoring-platform-team-Backlog, Africa-Wikimedia-Developers, Documentation, User-Zppix: Update all of documentation for ORES - https://phabricator.wikimedia.org/T174858#3578754 (awight)
[10:04:10] PROBLEM - puppet on ores-worker-05 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:33:40] RECOVERY - puppet on ores-worker-05 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[14:45:40] o/
[14:45:44] Was away a lot this weekend.
[14:45:51] Am back.
[14:45:53] :)
[15:15:04] Scoring-platform-team, Scoring-platform-team-Backlog, Research Ideas: Create machine-readable version of the WikiProject Directory - https://phabricator.wikimedia.org/T172326#3580317 (Sumit) * Here's the parsed format snapshot up for review - https://gist.github.com/codez266/d69d01fb7809e3f76a7f59eac...
[15:32:23] Halfak did you get a chance to read my email?
[15:33:07] not yet!
Sorry
[15:33:26] Np
[15:34:41] Awight, I'm helping the African Wikimedia Developers group find easy tasks to do within Wikimedia. I was given explicit permission by d3r1ck, one of the project creators, to assist with this (talking about T174858)
[15:34:41] T174858: Update all of documentation for ORES - https://phabricator.wikimedia.org/T174858
[15:38:52] Scoring-platform-team, ORES, Documentation, Epic: [Epic] Clean up ORES service documentation - https://phabricator.wikimedia.org/T148974#3580418 (Zppix) @eugene233 feel free to use this task instead of T174858 for a guide of updating docs
[16:02:28] Also halfak do you have a preference on whether I email you at gmail or wmf email?
[16:02:48] no pref
[16:02:54] Ok, wasn't sure
[16:06:02] Scoring-platform-team, Scoring-platform-team-Backlog, Research Ideas: Publish Machine-Readable WikiProjects Dataset - https://phabricator.wikimedia.org/T175037#3580511 (Sumit)
[16:21:10] I need to get to the office, with a BART layover. This is gonna take 1.5hr maybe.
[16:21:33] damn. godspeed
[16:24:22] * Zppix gives awight a push
[17:07:26] Done with meetings, but stepping away for food.
[18:08:39] Scoring-platform-team-Backlog, ORES, editquality-modeling, artificial-intelligence: Deploy damaging/goodfaith model for svwiki - https://phabricator.wikimedia.org/T174558#3565785 (jmatazzoni) @Halfak, when will ORES be enabled on this svwiki? Do you have a date?
[18:14:12] back.
[18:14:14] Sorry.
[18:14:16] Forgot to say
[18:16:32] Scoring-platform-team-Backlog, ORES, editquality-modeling, artificial-intelligence: Deploy damaging/goodfaith model for svwiki - https://phabricator.wikimedia.org/T174558#3581162 (Halfak) We're planning on Sept. 20th. With this will come the changes discussed in https://phabricator.wikimedia.org...
[18:17:54] Scoring-platform-team-Backlog, Collaboration-Team-Triage, ORES: Confirm that "thresholds" change will not affect RCFilters - https://phabricator.wikimedia.org/T175053#3581177 (Halfak)
[18:19:12] Scoring-platform-team-Backlog, ORES, editquality-modeling, artificial-intelligence: Deploy damaging/goodfaith model for svwiki - https://phabricator.wikimedia.org/T174558#3581191 (Halfak) {T175053}
[18:21:56] I recently upgraded my system, as part of which gcc got upgraded and the gfortran shared libs changed. I rebuilt scipy and hit a very common issue - https://github.com/scipy/scipy/issues/5800
[18:22:01] Scoring-platform-team-Backlog, ORES: Some RC changes do not display highlight with ORES Quality and Intent filters - https://phabricator.wikimedia.org/T172840#3581199 (Mattflaschen-WMF)
[18:22:48] apparently I could only get it fixed by installing the latest scipy and numpy; the versions in revscoring's requirements didn't seem to fix it
[18:23:23] should we think about bumping versions of scipy and numpy in requirements of revscoring?
[18:23:26] codezee, what did you upgrade to?
[18:24:06] yeah with scipy and numpy you really should set up virtualenvs once it's all set up
[18:24:18] quite temperamental
[18:25:13] halfak: I did a full system upgrade on arch, first error "libgfortran.so.3" not found, so looked up and found gcc5 provided that, and I had gcc7 and libfortran.so.4, so I rebuilt scipy
[18:25:18] and hence the above error
[18:25:38] *libgfortran.so.4
[18:26:12] per comments it looks like and Arch specific error, but just mentioned in case we needed to update requirements
[18:26:22] *an Arch specific error
[18:26:25] Hmmm... I think doing a big upgrade of scipy, numpy, and sklearn would be a fine idea.
[18:33:22] halfak: what would need to go in __init__.py to set up tests for fetch_wikiprojects in https://github.com/wiki-ai/drafttopic/tree/master/drafttopic/utilities ?
[18:33:34] it seems I cannot directly import stuff in a tests/ subdirectory
[18:34:23] it gives me this error - https://dpaste.de/FmU0
[18:35:01] You can. How are you running the test?
[18:38:17] halfak: I have this structure - https://github.com/wiki-ai/drafttopic/tree/wikiprojects-parser-tests/drafttopic/utilities
[18:39:16] and running the test from the utilities directory using python test/test_wikiprojects_.py
[18:43:42] ok, got it running after executing nosetests in root
[18:43:49] codezee, how are you running the tests?
[18:43:51] oh
[18:45:52] Scoring-platform-team-Backlog, Collaboration-Team-Triage, ORES, editquality-modeling, artificial-intelligence: Enable ORES filters for svwiki - https://phabricator.wikimedia.org/T174560#3581253 (jmatazzoni) @Trizek-WMF, Rev Scoring is shooting for Sept. 20th to enable, according to Aaron.
[18:46:23] * halfak looks more
[19:16:36] Scoring-platform-team, Collaboration-Team-Triage, MediaWiki-extensions-ORES, MW-1.30-release-notes (WMF-deploy-2017-08-01_(1.30.0-wmf.12)): Hide ORES review letter from the change list legend. - https://phabricator.wikimedia.org/T172338#3581499 (jmatazzoni)
[20:19:18] halfak: Is there precedent for this thing you did? visibility : int (bitfield: user | comment | data | all)
[20:19:46] Yeah. MW does it everywhere.
[20:19:59] See rev_deleted and log_deleted in the revision and logging tables, respectively.
[20:20:14] https://www.mediawiki.org/wiki/Bitfields_for_rev_deleted
[20:20:54] Scoring-platform-team, DBA, Operations, cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3581887 (bd808)
[20:30:06] ragesoss: Hi! I’m wondering if you were able to provide admin suppression for Wiki Ed content, or whether you found a way around that? We’re looking at a similar use case for ORES, of hosting freeform text outside of the wikis.
[20:34:34] awight: why host freeform text outside the wiki?
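Editor's sketch for the 20:19 exchange about `visibility : int (bitfield: user | comment | data | all)`: a minimal illustration of storing several suppression flags in one integer, in the spirit of MediaWiki's rev_deleted. The class name `Hidden`, the bit values, and the choice to make `ALL` the union of the other three bits are assumptions for illustration, not taken from the JADE schema.

```python
from enum import IntFlag


class Hidden(IntFlag):
    """Hypothetical suppression bits for one judgment row."""
    USER = 1      # hide who made the judgment
    COMMENT = 2   # hide the free-text comment
    DATA = 4      # hide the score itself
    ALL = USER | COMMENT | DATA   # assumption: "all" is the union of the others


def is_hidden(visibility: int, part: Hidden) -> bool:
    """True when every bit in `part` is set in the stored integer."""
    return (visibility & part) == part


# An admin suppresses the author and the comment of an abusive judgment.
# The value is stored as a plain int, as a database column would hold it.
stored = int(Hidden.USER | Hidden.COMMENT)
```

With this layout, `is_hidden(stored, Hidden.COMMENT)` is true while `is_hidden(stored, Hidden.DATA)` is false, so each part of a judgment can be suppressed independently without adding a column per flag.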
[20:35:13] Platonides, we want to do things that MW can't do well
[20:35:50] fix mediawiki ;)
[20:35:55] Platonides: good question. Here’s an overview of the project https://www.mediawiki.org/wiki/JADE
[20:36:21] The idea is that editors can provide scores using the same scales as ORES
[20:36:28] e.g. “damaging: yes”
[20:36:46] and then also comment, to justify their score or just for fun.
[20:37:02] Platonides, srsly? Fix MediaWiki? bwahahaha! :P
[20:37:09] xD
[20:37:15] I’m considering a custom ContentHandler to give us all the MediaWiki things…. but that wouldn’t be ideal for queryability.
[20:37:32] hmm why not?
[20:38:02] Well, it’s fine as long as the data gets replicated to places off-wiki as well.
[20:39:09] that discussion seems quite similar to eg. mediawiki talk pages
[20:41:01] Definitely. I’m thinking about this as something like a generalization of talk pages.
[20:41:17] So we’ll be able to have a talk page about a specific edit. Or a logged action.
[20:52:06] But they'll need to be created ad-hoc
[20:52:14] Just like talk pages
[20:52:28] because we can't have 10 billion flow boards :S
[20:56:29] halfak: It was a good exercise, a few more concerns came up while writing: https://github.com/adamwight/jade/blob/master/schema.sql
[20:57:42] * schema migrations are going to require some machinery, it would be nice to reuse an existing framework for that.
[20:58:21] * abuse suppressions will have to be made by wiki admins, so we need to either give them an onwiki interface that connects to JADE, or give them accounts on a JADE server and build the whole UI.
[20:59:26] why does JADE need to be a separate system?
[20:59:59] Platonides: Thanks for asking! It doesn’t necessarily, I’m definitely considering ContentHandler.
See https://www.mediawiki.org/wiki/JADE/Implementations though
[21:00:15] We’ll want to return the JADE judgments along with ORES scores
[21:00:43] so it needs to be queryable from the ORES backend, and have low latency
[21:00:47] doing everything separate will probably look simpler at the beginning
[21:00:59] That can be a replica of the data, doesn’t have to be the authoritative master.
[21:01:03] but when you take into account that you need to reimplement everything…
[21:01:07] Exactly.
[21:02:29] it could be that some JADE tables (those with actual scores) have a specialised db slave
[21:03:18] months ago, there was an RFC for changing the tables
[21:05:48] but I can't find it now :/
[21:06:55] Platonides: We were considering using EventBus to accomplish what I think you’re describing
[21:08:04] EventBus?
[21:08:08] I'm not familiar with that
[21:09:49] [[mw:EventBus]]
[21:09:50] [1] https://www.mediawiki.org/wiki/EventBus
[21:09:56] Scoring-platform-team, Phabricator: Migrate #Scoring-platform-team to be a phab milestone - https://phabricator.wikimedia.org/T171513#3582135 (Halfak)
[21:11:19] it wasn't an RFC: https://www.mediawiki.org/wiki/User:Brion_VIBBER/Compacting_the_revision_table_round_2
[21:15:24] so, you want to provide scores to revisions and pages
[21:15:39] hmm
[21:16:30] :D
[21:19:27] Currently, just to state the obvious, ORES is providing those scores: https://ores.wikimedia.org/v3/scores/enwiki/12345678/damaging
[21:19:43] So what we’re proposing is a way to provide human scoring and comments.
[21:20:28] This might be a simple rebuttal of an automated score, or it could be as complex as a discussion and vote tally to get consensus that a change is indeed damaging.
[21:24:02] Probably best to stick to the simple stuff at first, but we need to have room to grow.
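Editor's sketch for the 21:19 message: the score URL quoted there follows the pattern `/v3/scores/<wiki>/<rev_id>/<model>`, which the helper below rebuilds. The second function only illustrates the stated goal of returning human judgments along with ORES scores; its name and the response shape are hypothetical, since the log does not define one.

```python
from urllib.parse import quote

ORES_HOST = "https://ores.wikimedia.org"  # host taken from the URL quoted in the log


def score_url(wiki: str, rev_id: int, model: str) -> str:
    """Build a v3 score URL of the form /v3/scores/<wiki>/<rev_id>/<model>."""
    return f"{ORES_HOST}/v3/scores/{quote(wiki)}/{int(rev_id)}/{quote(model)}"


def attach_judgments(ores_score: dict, judgments: list) -> dict:
    """Hypothetical combined response: the machine score plus human judgments."""
    return {"score": ores_score, "judgments": list(judgments)}
```

For example, `score_url("enwiki", 12345678, "damaging")` reproduces the URL from the log.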
[21:27:03] And support for revert wars ;)
[21:27:53] Scoring-platform-team, Phabricator: Migrate #Scoring-platform-team to be a phab milestone - https://phabricator.wikimedia.org/T171513#3582201 (ksmith) @mmodell : Is this something you could take care of?
[21:39:18] lmao
[21:41:42] my bot really likes you awight
[21:45:04] awight: I was trying to think about what judgements should be applied to
[21:45:14] synja: lol I kickbanned that thing so hard
[21:46:08] Platonides: We’re thinking, in the first phase they can be applied to anything that ORES is scoring. If this turns out to be a constructive format for critiquing stuff, later phases might expand to point at more types of onwiki artifact.
[21:46:10] apparently she doesn't remember it
[21:46:15] alic1a, are you there?
[21:46:16] synja: Hello.
[21:46:22] shut up
[21:46:23] synja: Absolutely.
[21:46:25] aww
[21:46:27] and which things does ORES score?
[21:46:31] apparently she isn't still banned
[21:46:33] synja: What is still banned ?
[21:46:40] not you, now be quiet
[21:47:10] Platonides: the editquality models score revisions, and the draftquality and wikiclass models score page content at a revision.
[21:47:13] synja: please leave
[21:47:30] Zppix, no please
[21:47:31] revisions or diffs?
[21:47:37] Zppix: I believe synja is the human :D
[21:47:49] awight: oh I didn't realise it was a bot xD
[21:48:00] says a lot about you..
[21:48:14] Platonides: ah editquality scores diffs
[21:48:38] Usually one doesn't become rude to a bot and doesn't explain something behind it...
[21:49:31] halfak: Platonides brings up an important point regarding the JADE schema: all of our models are actually scoring revisions, not pages.
[21:49:45] fair enough
[21:51:03] then I think you just want to attach to revisions
[21:53:21] awight: I thought the exact same thing but didn't say anything because well I didn't think of it lol
[21:54:21] I think that’s right. But the way we query wikiclass judgments will be by page_id, fwiw.
[22:01:27] Scoring-platform-team, Phabricator, Release-Engineering-Team (Kanban): Migrate #Scoring-platform-team to be a phab milestone - https://phabricator.wikimedia.org/T171513#3582352 (mmodell) a:mmodell @ksmith: Indeed it is.
[22:02:37] awight, right! Eventually we'll likely want to score pages/items/users/etc.
[22:02:51] wikiclass judgements will be rev_id.
[22:03:21] halfak: If we’re querying by page however… we’ll want to include the page ID in the schema. Which makes the general link table real messy.
[22:03:48] na. gonna need to join
[22:05:35] I see two alternatives. 1) ORES can only be queried by rev_id, so /enwiki/1234/wikiclass will only return JADE judgments matching that exact revision.
[22:05:52] 2) What we probably want instead is that ORES can be queried using page_id
[22:06:16] that would return the last N judgments which apply at the page level, even though they link to various revisions
[22:06:21] harr
[22:08:06] Scoring-platform-team, Phabricator, Release-Engineering-Team (Kanban): Migrate #Scoring-platform-team to be a phab milestone - https://phabricator.wikimedia.org/T171513#3467572 (awight) I like it--the never-ending milestone ;-) https://i.pinimg.com/736x/0a/19/a9/0a19a950cc33a64c5b4d3d2d006e5193--char...
[22:09:12] awight, I think we should only allow things for page_id that are associated with a page_id
[22:09:22] A quality score is not associated with a page ID
[22:09:31] Because the quality changes over time.
[22:10:20] Agreed. But if we can’t query by page ID, then the data needs to be present in MediaWiki SQL so we can join. i.e. the API won’t be able to do that query.
[22:10:36] Right.
[22:10:43] Well, depending on the use-case.
[22:11:03] So for the data we store next to MediaWiki SQL, we might include a set of relevant identifiers.
[22:11:13] Like the ores_classification table.
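Editor's sketch of the "gonna need to join" option discussed above: judgments link only to a rev_id, and a page-level query joins through the revision table to fetch the last N judgments for a page. The `jade_judgment` table and its columns are hypothetical; `revision.rev_id` and `revision.rev_page` mirror MediaWiki's revision table, reduced here to an in-memory SQLite stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Minimal stand-in for MediaWiki's revision table.
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER);
    -- Hypothetical JADE table: each judgment points at a revision only.
    CREATE TABLE jade_judgment (
        judgment_id INTEGER PRIMARY KEY,
        rev_id INTEGER,
        model TEXT,
        label TEXT
    );
    INSERT INTO revision VALUES (100, 1), (101, 1), (200, 2);
    INSERT INTO jade_judgment VALUES
        (1, 100, 'wp10', 'Stub'),
        (2, 101, 'wp10', 'Start'),
        (3, 200, 'wp10', 'C');
""")


def judgments_for_page(page_id, model, limit=10):
    """Last `limit` judgments on any revision of a page, newest revision first."""
    return conn.execute(
        """
        SELECT j.rev_id, j.label
        FROM jade_judgment AS j
        JOIN revision AS r ON r.rev_id = j.rev_id
        WHERE r.rev_page = ? AND j.model = ?
        ORDER BY j.rev_id DESC
        LIMIT ?
        """,
        (page_id, model, limit),
    ).fetchall()
```

This only works when the judgments sit in a database that can see the revision table, which is the constraint raised at 22:10: an API that knows only rev_ids cannot answer the page-level question on its own.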
[22:11:58] I think the pattern should be "enwiki/revision/12345/wikiclass"
[22:12:06] s/wikiclass/wp10/
[22:12:17] working on something you all might find interesting, post-training ensemble pruning. on a 100-tree model I was able to remove 19% of trees and improve from 0.8527 to 0.8545, or remove 55% of trees and maintain the same optimization score. fewer trees = faster evaluation, fewer resources
[22:12:20] or enwiki/user/goodfaith/23456
[22:12:49] (although perhaps in your use case collecting features is much more expensive than evaluating the trees)
[22:12:56] ebernhardson, interesting. I wonder if we could use that to reduce some of our memory footprint too.
[22:13:21] ebernhardson, it is. Still, it could be great for other things :)
[22:13:43] halfak: hmm, evaluating the trees should use relatively low amounts of memory
[22:13:55] each tree is usually evaluated independently, and emits a single float
[22:14:00] halfak: ores_classification only points to rev_id, fwiw
[22:15:36] awight, right. I guess one relevant ID :)
[22:16:21] ebernhardson, hmm.. some RFs take up more space than others in memory. Same with gradient boosting.
[22:18:15] halfak: I suppose they could, would depend on your layout. We implemented our own decision tree evaluation which basically balances the tree and then packs it at 12 bytes per node, so average tree depth of 8 has I think 512 nodes, ~6kB per tree * # of trees
[22:19:27] halfak: fwiw, I can’t think of anything we would want to link to page rather than rev, since everything about the page can change over time.
[22:32:55] halfak: I’m really not seeing how we’ll be able to browse all human wp10 judgments for a page over time, and feel like that is a basic use case.
[22:33:26] We would have to aggressively mirror all the judgments from JADE storage over to MW SQL to be able to trust that join.
[22:33:44] I suppose… there won’t be very many, relative to ORES scores.
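Editor's sketch for the 22:18 message about packing a balanced decision tree at 12 bytes per node. The log does not give the real layout, so the one below (int32 feature index, float32 threshold, float32 leaf value, nodes stored breadth-first with a negative feature index marking a leaf) is an assumption chosen only to hit 12 bytes. The size arithmetic matches the figures quoted: a complete binary tree of depth 8 has 2^9 - 1 = 511 nodes, and 512 × 12 = 6144 bytes, roughly 6 kB per tree.

```python
import struct

# Assumed layout: feature index (int32), threshold (float32), leaf value (float32).
NODE = struct.Struct("<iff")  # 4 + 4 + 4 = 12 bytes per node


def pack_tree(nodes):
    """Pack (feature, threshold, value) tuples, in breadth-first order, into bytes."""
    return b"".join(NODE.pack(*n) for n in nodes)


def evaluate(packed, features):
    """Walk one packed tree; children of node i sit at 2i+1 and 2i+2."""
    i = 0
    while True:
        feature, threshold, value = NODE.unpack_from(packed, i * NODE.size)
        if feature < 0:          # negative feature index marks a leaf
            return value
        i = 2 * i + 1 if features[feature] <= threshold else 2 * i + 2


# A depth-1 tree: split on feature 0 at 0.5, leaves emit 0.25 and 0.75.
tiny = pack_tree([(0, 0.5, 0.0), (-1, 0.0, 0.25), (-1, 0.0, 0.75)])
```

Because child positions are implied by the array index, no pointers are stored, which is what keeps each node small; each tree emits a single float, as described at 22:13.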
[22:34:36] +1
[22:34:38] :D
[22:34:47] And we'll have the ORES scores there
[22:34:54] I love that we limited scope of ORES in this way
[22:35:00] No, you can't query ORES by page_id
[22:35:08] No, you can't query ORES by user_id
[22:35:29] BUT, we'll look into external systems that support that and try our best to provide a great ecosystem
[22:35:54] e.g. https://phabricator.wikimedia.org/T172566
[22:36:05] Which is something the EventBus people wanted.
[22:36:17] Once we're on EventBus, we have histories!
[22:36:23] Well, historical scores anyway.
[22:38:38] how well does your revscoring work?
[22:38:57] magic
[22:39:46] how WELL
[22:39:58] as in, what's the ratio for correct scoring vs. false positives?
[22:40:13] ...and false-negative
[22:40:14] s
[22:41:15] synja: Here’s an illustration I made recently, of the P-R curves for our “damaging” model on enwiki: https://github.com/adamwight/thresholds_diagrams/blob/master/damaging.svg
[22:41:30] ^ awesome that you have that handy
[22:41:32] ignore the dots...
[22:42:14] thanks
[22:42:35] synja: Feedback is encouraged :)
[22:42:47] i just had to steal the batteries from my mouse to a keyboard though, so i won't be able to look at it for a bit
[22:43:37] fun fact: this is the same mouse in which the batteries blew up on me a few years ago :P
[22:45:32] * awight considers making ASCII art of this graph
[22:46:19] lmao
[22:47:10] alic1a: Render earlier SVG into 80-column ASCII art, ty
[22:47:10] synja: Are there any other ties that you like? You like : render earlier svg into 80-column ascii art, ty.
[22:47:39] she had zero idea of what you meant
[22:47:41] synja: What happened to them?
[22:48:00] shut up for a while alic1a. you need tuning
[22:48:29] synja: Can you remove that bot until it’s repaired?
[22:49:02] she's always a work in progress
[22:49:29] pls remove
[22:49:55] did you really assume she would know what you're talking about?
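Editor's sketch for the 22:39 question about correct scoring versus false positives and false negatives: a P-R curve like the one linked at 22:41 plots precision against recall as the decision threshold moves. The function below computes one such point. The sample data is made up for illustration and says nothing about how the real "damaging" model performs.

```python
def precision_recall(scored, threshold):
    """scored: iterable of (probability_damaging, actually_damaging) pairs."""
    tp = fp = fn = 0
    for prob, damaging in scored:
        flagged = prob >= threshold
        if flagged and damaging:
            tp += 1          # correctly flagged
        elif flagged:
            fp += 1          # false positive: flagged but fine
        elif damaging:
            fn += 1          # false negative: damaging but missed
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Made-up labelled scores, for illustration only.
sample = [(0.95, True), (0.80, False), (0.60, True), (0.30, True), (0.10, False)]
```

Raising the threshold on this sample from 0.5 to 0.9 removes the one false positive but misses two of the three damaging edits, which is the trade-off a threshold choice has to settle.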
[22:50:00] i barely do
[22:50:10] synja: Is there a justification for keeping it here?
[22:50:35] so she can learn and help people further
[22:50:46] How about mute?
[22:51:00] Can you show me a helpful feature?
[22:51:34] what is svg format?
[22:51:48] synja: Scalable Vector Graphics (SVG) is an XML-based vector image format for two-dimensional graphics with support for interactivity and animation. The SVG specification is an open standard developed by the World Wide Web Consortium (W3C) since 1999. SVG images and their behaviors are defined in XML text files.
[22:52:08] are there any svg to ascii converters?
[22:52:08] awight: No I do not think there are any.
[22:52:35] what is a svg to ascii converter?
[22:52:37] synja: Where is the code?
[22:52:42] do a search
[22:52:49] Platonides: Convert ASCII diagrams to beautiful SVGs. Contribute to asciitosvg development by creating an account on GitHub. GitHub - ivanceras/svgbobrus: Convert your ascii diagram scribbles ... https://github.com/ivanceras/svgbobrus
[22:52:52] awight: Convert ASCII diagrams to beautiful SVGs. Contribute to asciitosvg development by creating an account on GitHub. GitHub - ivanceras/svgbobrus: Convert your ascii diagram scribbles ... https://github.com/ivanceras/svgbobrus
[22:52:57] synja: Convert ASCII diagrams to beautiful SVGs. Contribute to asciitosvg development by creating an account on GitHub. GitHub - ivanceras/svgbobrus: Convert your ascii diagram scribbles ... https://github.com/ivanceras/svgbobrus
[22:52:58] alic1a: where is your source code?
[22:53:06] god damn that front-end bug is annoying
[22:53:22] awight, she doesn't have public source
[22:54:10] yeah i'll mute her until i get that repeat aspect ironed out. i'm using someone else's framework for the irc communication and that's where the repeat bug is coming from
[22:54:16] she hasn't done it in days that i know of
[22:54:38] Quick straw poll whether we want a closed-source bot in here?
[22:54:53] how could it hurt you?
[22:56:18] http://www.imdb.com/title/tt0653695/
[22:56:23] mainly written in python, using ML, NLTK, integrity & emotional analysis, etc.
[22:56:32] again, no mouse :P
[22:58:53] lol awight
[22:58:57] nice link
[23:03:21] awight: no, haven't really done anything with that yet.
[23:03:53] ragesoss: OK, it hasn’t turned out to be needed yet? That’s good news :)
[23:05:08] yeah, we haven't yet seen any abuse from it. the current plan is just to get anyone who abuses it blocked on-wiki -- which will then prevent oauth login.
[23:05:40] * Platonides is pretty sure that bug was actually deemed a feature for the framework
[23:05:57] we're beginning to enable automated edits one wiki at a time, so it should be straightforward to get people blocked on those wikis at least.
[23:12:38] OK I'm outa here folks.
[23:12:47] Have a good one!
[23:19:28] o/
[23:20:19] ragesoss: Cool, I should make time to check out your roadmap & status soon… Thanks for sharing the experiences!
[23:22:14] ragesoss: IIRC getting blocked does not prevent OAuth login
[23:22:24] unless it would prevent normal login as well
[23:24:02] /identify does tell you that the user is blocked, though
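Editor's sketch for the closing point: if a block does not stop OAuth login, the application has to check the identity response itself. The code assumes the /identify response is a JWT whose payload carries a boolean `blocked` field, as the last message suggests; the function names are hypothetical. Signature verification is deliberately left out to keep the sketch short, and a real client must verify the token before trusting any claim in it.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)   # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def may_use_tool(claims: dict) -> bool:
    """Refuse users whose identity claims say they are blocked (assumed field)."""
    return not claims.get("blocked", False)
```

Under this approach the tool, not the login step, is what turns an on-wiki block into a refusal.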