[13:59:36] Amir1: are you around?
[14:55:50] o/
[14:56:15] Zppix, I think Am*r1 is in #wikidata land today.
[14:56:32] He got what i needed
[14:56:52] It was GCI
[15:11:45] Cool :)
[15:35:58] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3831316 (akosiaris) Pardon me, but I have to ask why a file with timestamps in the log file dating `Dec 11th`, and with a local...
[15:36:22] halfak: fyi I’m trying to make this a JADE writing day
[15:36:45] scap is driving me crazy
[15:36:54] what on earth is going on on that bug ...
[15:37:18] btw we are still ok in both eqiad/codfw despite the lowering of available uwsgi workers. No overloads yet
[15:37:31] lowering in eqiad, increase in codfw
[15:37:42] awight, sounds good. I'm working with Amir1 right now on some quarterly documentation. I'll re-raise this tomorrow at staff.
[15:38:14] akosiaris, is that bug still blocking us from running our (maybe final) stress test on ores*?
[15:38:22] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3831321 (awight) >>! In T181661#3831316, @akosiaris wrote: > Pardon me, but I have to ask why a file with timestamps in the log...
[15:38:39] halfak: that's my impression. Not my call though
[15:39:02] Oh. Who makes a call about us deploying to ORES*?
[15:39:24] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3831323 (akosiaris) >>! In T181661#3831321, @awight wrote: >>>! In T181661#3831316, @akosiaris wrote: >> Pardon me, but I have t...
[15:39:30] halfak: em, you ?
[15:39:36] akosiaris: halfak: We’re okay working around the scap bugs.
[15:39:50] i.e. we can stress test any time.
[15:39:53] although tbh if half the damn deploys are going to fail because of this ...
[15:39:57] :) I thought so. Was just wondering how akosiaris was running into it.
[15:40:17] Oh, I am just trying to debug the damn thing
[15:40:20] halfak: I believe akosiaris is just helping to debug the scap ssh timeout, which is a blocker to our sanity but not to this stress test.
[15:40:27] *KILL THE PIGGY*
[15:40:29] awight, I think I might do a quick stress test today.
[15:40:34] +1 ty!
[15:40:41] I’m looking forward to it.
[15:40:50] OK time to get some notes together and then see if I can get this started before tech mgmt
[15:41:16] halfak: if you do have to deploy a new revision, which I don’t expect, the workaround is to deploy one machine at a time, e.g. -l “ores1002.eqiad.wmnet”
[15:41:36] wait... that works ?
[15:41:49] yes
[15:41:55] i no rite
[15:42:16] hmm
[15:42:36] awight: ok which branch can I deploy to ores1004 ?
[15:42:43] STABLE ? CELERY_4 ? master ?
[15:42:46] akosiaris: Go ahead and deploy master
[15:42:49] ah nice
[15:42:50] thanks
[15:43:07] if you need to toggle between revisions to test stuff, STABLE is a good choice too.
[15:44:03] Is ores1004 borked right now?
[15:44:08] halfak: fyi, master has significant editquality and ores submodule bumps in the penultimate commit, and STABLE was just a cherry-pick around that. Feel free to deploy master to production today if you’re feeling bold.
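(A minimal sketch of the one-machine-at-a-time workaround awight describes above, assuming a wrapper script run from the deploy host. The host list and log message are illustrative; only scap's `-l` limit flag and the `scap deploy -v -l '<host>' <message>` shape, which appear later in this log, come from the source.)

```python
import subprocess

# Illustrative host list; the workaround is to pass one host at a time to
# scap's -l (limit) flag, avoiding the parallel ssh stage that times out.
ORES_HOSTS = [
    "ores1001.eqiad.wmnet",
    "ores1002.eqiad.wmnet",
]

for host in ORES_HOSTS:
    # Equivalent to running: scap deploy -v -l '<host>' '<message>'
    subprocess.run(
        ["scap", "deploy", "-v", "-l", host, "serial deploy workaround"],
        check=True,
    )
```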
[15:44:32] halfak yeah, for all intents and purposes, assume that
[15:44:41] halfak: this is non-production fwiw
[15:44:58] akosiaris, should I expect ores1004 to be unusable for the tests or will it be usable if I wait for a bit?
[15:45:05] halfak: akosiaris seems to be smoke-testing the ssh timeout
[15:45:11] roger that
[15:45:20] This is probably not a good thing to do at the same time as the timeout tests...
[15:45:45] halfak don't expect it to work
[15:45:53] but I can wait
[15:46:10] No worries. Please continue. I can drop it from stress testing or stress test tomorrow.
[15:46:13] but tbh... I don't think it has the same version of software as the others
[15:46:23] it's stuck in some Jun 26 limbo
[15:46:36] so it probably won't help you much to test against it
[15:46:50] how on earth did that happen ....
[15:47:04] halfak: we’d have to turn off the celery worker on that machine
[15:47:16] awight, Oh yeah. Thank you for noting that.
[15:47:20] :)
[15:47:23] Can we safely do that akosiaris?
[15:47:24] I’ve made all the mistakes.
[15:47:36] :)
[15:47:42] Sign of wisdom and experience
[15:48:03] yeah that's easy
[15:48:26] done
[15:50:06] akosiaris: One more point of information, I get timeouts on *all* the ores100* servers when using the parallel scap deploy, not just ores1004.
[15:51:19] that's reassuring... at least it's not the box
[15:51:29] thanks for the info
[15:51:55] Good luck, sir
[15:54:12] FYI: https://en.wikipedia.org/wiki/User_talk:Risker/Risker%27s_checklist_for_content-creation_extensions#Following_the_checklist_in_JADE
[15:54:29] awight, ^
[15:54:54] Great!
[15:56:06] awight, see also https://etherpad.wikimedia.org/p/SPFY18Q3
[15:56:09] Amir1, ^
[15:56:19] My current proposal for next Quarter's goals.
[15:56:23] Looks like we’re going ContentHandler all the way, eh?
[15:57:03] In which case, the MW integration would have to come before even the MVP deployment
[15:57:39] awight, content handler?
[15:57:46] I don't see why we'd be interacting with that.
[15:58:03] That’s the main way to do a MW integration, AIUI
[15:58:16] oh. Well, I think that we don't have an option.
[15:58:20] With MW Integration
[15:58:27] I never thought we had an option.
[15:59:04] And MVP is defined how it is defined. I think we can deploy an MVP without expecting to release it for broader use.
[15:59:11] Here are examples, https://www.mediawiki.org/wiki/Content_handlers
[15:59:20] So that people can develop against it and engage in the design process.
[15:59:38] halfak: Hmm. An MVP that can be used to push spam and libel?
[15:59:52] awight: as an FYI this fails for me
[15:59:52] akosiaris@tin:/srv/deployment/ores/deploy$ scap deploy -v -l 'ores1004.eqiad.wmnet' T181661
[15:59:52] T181661: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661
[15:59:55] Uh... Sure. Why so dramatic?
[16:00:07] halfak: Hehe on that note, I’m a bit concerned that we have code written already—are you okay with throwing that out if we need to?
[16:00:18] so at least this is somewhat more reproducible ?
[16:00:18] awight, why would we need to do that?
[16:00:51] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3831411 (thcipriani) >>! In T181661#3831316, @akosiaris wrote: > Now for the more interesting stuff. I 've tried to run `/usr/bi...
[16:00:55] halfak: Have you chatted with jkatz about Collections, or heard that story?
[16:01:09] akosiaris: I guess that’s good news!
[16:01:14] awight, I'm very familiar with the "collections" story.
[16:01:19] We're nothing like "collections"
[16:02:12] The relevant part is that they were forced to close shop because they were missing curation
[16:02:40] awight, right. that's why curation is a key part of the plan and that we're following Risker's checklist.
[16:02:53] Why are you pushing on this when you know I'm deeply involved in thinking about it?
[16:03:08] um
[16:03:27] OK we can do the MVP two ways: with or without curation, right?
[16:03:44] What I’m questioning is, can we actually do even an MVP without curation. My understanding is, no.
[16:04:14] Well, sure. If we wait for curation to be fully implemented before 3rd party developers can play with what we're building, we'll push the whole timeline back substantially.
[16:04:30] We can have 3rd party devs experiment with the system!
[16:04:39] It takes a lot to figure out how to develop against a thing.
[16:15:33] o/
[16:16:04] o/ codezee
[16:17:47] halfak: the model turns out to be a beast for our resources, i let it run for almost 20hrs but still tune and cv_train both were stuck
[16:18:28] so i killed them and ran a basic sklearn classifier, to time it, turns out the sklearn one finished soon but somehow the printing of time didn't execute! as if it exited silently before that
[16:19:08] codezee, do you think it has something to do with our Binarizer?
[16:19:20] Or maybe you set different hyperparameters?
[16:19:26] Those can have a big effect on training time.
[16:20:39] halfak: but cv_train was just three folds, although i didn't think hyperparams could affect that, i chose roughly in between n_estimators:800, max_depth:7 features:log2
[16:21:02] apparently i haven't killed the cv_train run, it's still on
[16:21:04] For both the direct sklearn and cv_train?
[16:21:14] Same hyperparameters.
[16:21:22] and stuck on "Scoring cross-validation for 1"
[16:21:45] halfak: no the basic script with sklearn is fishy, it's not even executing the block i was expecting to see
[16:21:48] Oh... that's generating statistics about cross-validation.
[16:22:22] halfak: do you think a small swap size ~500mb could be an issue?
[16:22:38] "swap size"?
[16:22:41] What do you mean?
[16:22:55] halfak: swap memory of lnux instance
[16:22:57] *linux
[16:24:31] codezee, is the instance running out of memory?
[16:25:01] OOM error is not there, but swap size is full, as htop shows
[16:25:11] Is memory full?
[16:25:19] no
[16:25:25] halfway
[16:27:27] halfak: btw i was using this to test vanilla sklearn, do you think having 1.3GB of text data in memory could be an issue? - https://gist.github.com/codez266/e7d5c9ac6d7b9896b386615deb3c67db
[16:28:32] codezee, then no I don't think swap is an issue.
[16:28:46] I don't think that having 1.3GB of text data in memory is an issue, no.
[16:31:24] it's strange that the script did print "Preprocessing done, classifying..." but stopped right there
[16:33:29] halfak: i take it that if cv_train is generating statistics, scoring is done, right?
[16:33:59] bc these are the timestamps - https://dpaste.de/goYp
[16:41:07] * halfak is in meeting, FYI
[16:42:57] oh, sorry
[16:51:29] halfak: This is helpful, to see what ContentHandlers exist already: https://www.mediawiki.org/wiki/Content_handlers
[16:53:45] Brief non-meeting time. Reading scrollback
[16:53:57] codezee, no worries. Just wanted you to know why I wasn't responding.
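(codezee's gist isn't reproduced here; the following is a minimal sketch, with hypothetical names, of how to time the fit step while making the "silent exit" visible: flush every print and trap exceptions explicitly, so the script can't die between the "classifying..." message and the timing output without a trace.)

```python
import sys
import time
import traceback

def timed_fit(clf, X, y):
    """Fit clf on (X, y), printing elapsed time even if something goes wrong."""
    print("Preprocessing done, classifying...", flush=True)
    start = time.time()
    try:
        clf.fit(X, y)
    except BaseException:
        # A kernel OOM kill can't be caught, but a MemoryError or library
        # error can; without this, the script appears to exit silently.
        traceback.print_exc()
        sys.exit(1)
    print("fit finished in %.1fs" % (time.time() - start), flush=True)
    return clf
```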
[16:55:42] codezee, it looks to me like it is hanging on generating the predictions themselves.
[16:55:59] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3831602 (akosiaris) `akosiaris@tin:/srv/deployment/ores/deploy$ scap deploy -v -l 'ores1004.eqiad.wmnet' T181661` fails reprodu...
[16:56:09] How long does it take to make a prediction based on a set of extracted features in your RAW sklearn work?
[16:56:51] awight, it looks like content handlers are aimed at revisions. Is that right?
[16:57:02] I think we only need to write rows to the logging table.
[16:57:15] I don’t think so, they aren’t linked to article revisions.
[16:57:15] Maybe there are content handlers for writing rows to the logging table.
[16:57:26] *page revisions
[16:57:27] halfak: that is a mystery in itself, i wrote that script to find that out, turns out that script exited abruptly
[16:57:39] without reaching the time statement
[16:57:59] codezee, I think re-trying that will be useful.
[16:58:04] yes, doing rn
[16:58:22] it's been working through the 93k observations for 15 min now
[16:58:26] awight, see "content format"
[16:58:52] We could store the judgement content in MW or we could have MW request it from JADE as needed.
[16:58:57] halfak: I’m not sure what your question means, but ContentHandler items are probably stored in the revision table, yes. They don’t map to article revisions, of course.
[16:59:04] ah okay you meant the former.
[16:59:27] i re-ran cv_train with more generous parameters, first fold scoring done in 14 min, now scoring fold 2, waiting to see if it again gets stuck
[17:00:02] awight, I don't know if we should be storing JADE stuff in the revision table. But it might make sense to store JADE judgement content in MW somewhere.
[17:00:24] I don't think judgements correspond to revisions nicely.
[17:00:50] They might actually—editing a judgment creates a new “head” that has a history.
[17:00:51] Unless we set the most recent "preferred" judgement as the most recent revision of a "wiki entity" page.
[17:01:05] There's no such thing as "editing a judgement"
[17:01:06] Most recent is distinct from preferred rank.
[17:01:21] halfak: There is changing your judgment, which deprecates the old one.
[17:01:33] That’s analogous to editing an article.
[17:01:35] awight, changing your endorsement to a different judgement
[17:01:45] But changing your endorsement doesn't change which judgement is preferred.
[17:01:51] +1
[17:02:17] Changing which judgement is preferred could be interpreted as a new revision.
[17:02:27] If I make a judgment “damaging”, then I can come back and judge the same wiki entity as “non-damaging”. There is no “preferred” in this case.
[17:02:44] But it wouldn't make sense to store the content somewhere and it seems to hardly make sense to refer to a wiki-entity as a page.
[17:02:45] Preferred is only set by consensus, IMO
[17:02:52] awight, there is a preferred.
[17:03:13] See my notes re making the first judgement "preferred" and the ability to change that by followup.
[17:03:16] Following wikidata, a newly created judgment has “normal” rank
[17:03:30] awight, I don't think following wikidata makes sense here.
[17:03:49] if the same editor judges the same wiki entity differently, old_judgment.rank=deprecated and the new one has “normal” rank
[17:04:07] halfak: I think wikidata rank is great, let’s bookmark that for more chatting
[17:04:24] There can be multiple “normal” rank judgments if multiple editors judge the same wiki entity.
[17:04:45] awight, but that's bad and confusing. It would surely break the revision model.
[17:04:51] umwat
[17:05:02] With a page, only one revision is the "current, good" one
[17:05:27] using ContentHandler, ref=(editor, wiki entity) judgement={data} is its own “article”
[17:05:43] A new editor can create their own judgment-article with its own history
[17:06:14] When we query all judgments on a wiki entity, JADE gives us the MW ids of a bunch of “pages” which are all the judgments of that entity.
[17:06:16] awight, that wouldn't make sense. many editors can endorse the same judgement.
[17:06:38] The normal API is to return just rank=best
[17:06:59] if there is a preferred judgment, that is returned; if there are multiple normal-rank judgments, those are returned.
[17:07:15] Another API endpoint can return the entire history of each judgment.
[17:07:38] * halfak --> meeting
[17:07:56] yep no worries
[17:09:10] It probably makes the most sense for endorsements to be their own ContentHandler type, rather than denormalizing into the judgments they refer to. Although that wouldn’t be terrible either, and I don’t see how my suggested MW integration breaks the ability for multiple editors to edit the same page.
[17:13:42] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3831654 (akosiaris) >>! In T181661#3831411, @thcipriani wrote: >>>! In T181661#3831316, @akosiaris wrote: >> Now for the more in...
[17:19:16] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3831667 (akosiaris) Moving the `deploy-cache/cache` directory aside did solve the issue (partially?) and moved on until... ```...
[17:20:06] * akosiaris has this feeling this scap thing is going to turn into a saga
[17:20:25] either that or something very stupid and easily fixed
[17:23:00] awight, coming back to this. It seems to me that your formulation of pages-are-judgements doesn't correspond to a collaborative user-model.
[17:23:28] Wikipedians negotiate what's True(TM) on pages. A revert of a revision represents a disagreement about what's True(TM)
[17:23:58] The preferred boolean represents something that different editors can negotiate.
[17:24:47] By making it a boolean, the most recent version can then be considered the most current good judgment.
[17:24:54] halfak: editors need to be free to edit their own page or collaborate
[17:25:13] awight, I don't see what you're saying.
[17:25:24] There are multiple truths, at least until consensus is made
[17:25:30] awight, I disagree.
[17:25:35] rank isn’t a boolean, it’s tri-state
[17:25:42] That's not how Wikipedia works.
[17:25:46] Right. I'm advocating against rank
[17:25:54] ok, this is new but I’m listening
[17:26:30] I don't think this is new.
[17:26:51] The JADE API is the interface for judging and endorsing, so is responsible for going around setting or unsetting preferred
[17:27:04] who is responsible?
[17:27:06] i.e. it doesn’t matter to JADE how we store judgments in MW
[17:27:18] awight, it needs to be compatible,
[17:27:24] any editor
[17:27:32] sure. Right.
[17:27:38] any editor runs a tool that calls JADE, and assigns preferred
[17:27:51] Just like any editor in good standing can change a page or revert another's change.
[17:28:00] I’m currently imagining, editors make a judgment in a JADE UI, which is sent via the API and becomes a ContentHandler revision.
[17:28:11] yes, anyone can change that judgment, leaving behind a revision history
[17:28:32] but normally, only the original author would change that statement, it’s similar to a post in a discussion thread.
[17:28:41] What's a "statement"?
[17:28:45] I don’t agree that the normal workflow is to go editing a single judgment
[17:28:50] What do you mean "original author"?
[17:29:03] I don't think judgements should be edited.
[17:29:17] “statement” in that sentence is a judgment on multiple score schemas, plus comments.
[17:29:22] “original author” is the creator of the judgment
[17:29:38] awight, I don't think the "original author" of a judgement matters for anything.
[17:29:47] Like the original author of a judgement doesn't matter for a wiki page.
[17:29:48] Judgments can be edited so that * people can change their mind, and * admins can curate
[17:30:00] What would you edit in a judgement?
[17:30:02] but it does matter for a discussion post
[17:30:23] Certainly. Discussions are parallel to judgments.
[17:30:39] Endorsements are close to discussions but they are structured.
[17:30:40] just… nvm the extended discussion thing for a moment, it’s complicating this
[17:30:47] certainly
[17:30:55] I didn't bring it up ;)
[17:30:58] lol
[17:31:10] sorry IRL moment
[17:34:58] FWIW, my argument against rank is KISS
[17:35:10] Let's add complexity if it seems to be useful.
[17:36:31] back.
[17:38:13] There are multiple editors making judgments, and IMO no reason they need to be reconciled in any way. That should be voluntary.
[17:38:40] If 5 people each score a wiki entity, then there are 5 judgments that are returned if you query JADE for references to that entity.
[17:39:38] If one of those editors changes their mind (with or without being influenced by the other judgments), they can edit their judgment and change damaging -> non-damaging
[17:39:52] In the backend, this creates a deprecated judgment and a new judgment.
[17:40:39] A normal query on the wiki entity will still return 5 judgments, which include the newly changed opinion.
[17:40:39] A historical query will return 6 judgments, and include the deprecated one.
[17:40:58] I happen to think that the old and new judgment would make a nice revision history under ContentHandler, but I’m not exclusively attached to that, there are lots of ways to represent this.
[17:41:56] * halfak gets out of meeting
[17:42:19] awight, "There are multiple editors making judgments, and IMO no reason they need to be reconciled in any way." I totally disagree with this premise.
[17:42:59] Even in wikidata, information is reconciled.
[17:43:02] This is the situation with a support-oppose vote, before consensus is reached.
[17:43:25] It *can* be reconciled but why or even how would you force it?
[17:43:35] awight, these are instances where discussion is necessary from the beginning.
[17:43:41] In many cases, no discussion is needed.
[17:43:59] 5 people may or may not agree. That’s the information we want.
[17:44:10] And when discussion *is* found to be needed, then there's still a version of fact that remains while a conversation takes place.
[17:44:16] awight, that will still be recorded.
[17:44:18] If they choose as a group to reconcile their judgments, we have a much stronger signal.
[17:44:31] This is why we need endorsements.
[17:44:39] What do you mean by “these are instances where discussion is necessary from the beginning”
[17:45:05] ^ e.g. AFD
[17:45:16] There's already an apparent disagreement.
[17:45:29] With BRD, consensus is implied in bold action.
[17:45:36] And challenged through reversion.
[17:45:49] Then discussion (support/oppose straw polls) takes place.
[17:45:53] Bold-Revert-Discuss
[17:49:34] awight, ^
[17:49:50] I like that your suggestion is honest about the fact that we’re extracting a subjective truth from multiple judgments.
[17:50:16] I don’t like that the second and following editors need to overrule someone else’s judgment, if that’s what you’re saying.
[17:50:42] halfak: for hashed features, shouldn't we be using a sparse matrix?
[17:50:50] awight, well they can agree with it.
[17:50:50] https://en.wikipedia.org/wiki/Wikipedia:BOLD,_revert,_discuss_cycle
[17:51:02] I’m also trying to stay close to the onwiki process for consensus.
[17:51:14] codezee, yes. I believe we are.
[17:51:27] A sparse matrix is really just an array of hashmaps.
[17:51:29] In which editors post their support/opposition/meh-ness, with a justifying comment.
[17:51:50] awight, that's only in a very rare set of cases.
[17:51:59] In most cases, you should BOLD
[17:52:08] https://en.wikipedia.org/wiki/Wikipedia:Be_bold
[17:52:16] got it. But not on a Talk page.
[17:52:22] Right. Of course.
[17:53:07] Most judgments will not need discussion.
[17:53:16] Most will just have one endorsement and will remain unchallenged.
[17:53:29] They might get two endorsements that agree.
[17:53:37] Very few will be contentious.
[17:53:44] The tension between the “fact” component, judgment scores, and the “talk” component of comments and discussion threads, is challenging.
[17:56:48] is it? Maybe it's because I've been around wikis and talk pages for a long time, but I've come to find it intuitive.
[17:57:26] We don’t have any ContentHandler options on the Talk page for judgments, I’m assuming?
[17:57:31] And no guarantee of Flow.
[17:58:30] Let's not talk about ContentHandler, OK?
[17:58:33] hehe
[17:58:35] sure
[17:58:39] It seems like that's just making everything murky
[17:58:56] We can set aside the representation entirely
[17:59:40] so please describe your workflow. Two people want to score and comment on an article, what does that look like?
[18:00:46] See the post I made about judgements and endorsements
[18:01:14] https://www.mediawiki.org/wiki/Topic:Tzw0uv2bucrdprm4
[18:02:56] OK and if I didn’t set the “preference” bit when I made my judgment? If we judged at the same time and were unaware of another person’s judgment? If I think your judgment is valid, but want to record my disagreement?
[18:03:05] Then it’s just set to whatever you had.
[18:03:25] Right
[18:03:36] I don’t think it’s intuitive. I think making a judgment, and debating about other people’s judgments are very different.
[18:03:54] You might say, "I disagree, but I don't feel strongly enough about flipping the preference until we have discussed."
[18:04:09] You're not debating other people's judgements.
[18:04:09] So any UI tool to make a JADE judgment is showing you the current preferred judgment?
[18:04:15] It’s such a specific workflow...
[18:04:20] You're debating what the Right/True judgment should be.
[18:04:31] halfak: oh i understand, i shouldn't convert hashes to full arrays, since revscoring directly passed it to sklearn and it should work
[18:04:37] *passes
[18:04:48] awight, it would show you the current judgement, and the set of alternatives with endorsements & comments.
[18:04:56] I suppose any discussion would be linked as well.
[18:05:32] It has no flexibility for wikis that want to do things differently.
[18:05:36] halfak: btw, vanilla sklearn with its own hashing took 192s for hashing, 30s for classifying
[18:05:42] awight, what?
[18:05:43] n_estimators=10, max_depth=4
[18:06:27] halfak: If a wiki community is using this tool more as a false positive report, IMO it would be annoying to have this consensus thing pushed as the way it’s done.
[18:06:37] codezee, 192s for how many records?
[18:06:50] the full 93k
[18:07:05] awight, I have no idea what you mean.
[18:07:39] What is being pushed? How would "consensus" get in the way of using this as a false-positive report?
[18:07:59] In my suggested workflow, the default is to submit a “normal” rank judgment. We can both do that. If we want to get all consensus about it, we discuss and arrive at a preferred judgment.
[18:08:13] codezee, that seems OK to me.
[18:08:32] codezee, maybe you could try this without hashing at all -- just use the word2vec features.
[18:08:34] I’m out in straw dog territory because I don’t know what editors will want, maybe they’ll overwhelmingly insist on a consensus for everything.
[18:08:47] halfak: in revscoring i'm using word2vec only, no hashing
[18:09:03] the sklearn experiment was just to make sure that multilabel in RF is OK
[18:09:07] awight, please read my proposal again. The "preference" bit is set on first judgement.
[18:09:18] I got that
[18:09:35] halfak: see here - https://dpaste.de/5x8T what i can't figure out is why, with such low params, it's stuck at the same place
[18:09:48] so it does not look like a classification issue
[18:10:01] all previous steps happen in a 1 min gap
[18:10:02] In my proposal here, the first judgment can be set to normal or preferred, preferred would be unusual though. Whatever is the “best” rank will be returned, which might be simply normal.
[18:10:50] If there are two normal-rank judgments, the UI tools will suggest a consensus process. That can be BOLD, or involve endorsements. I’d like to leave that flexibility.
[18:10:52] awight, there should only be one preferred judgement. I don't think "normal" makes sense.
[18:11:07] We should not have two judgements that are peers
[18:11:08] That’s how it’s done in wikidata, whether or not we’re using that as a model.
[18:11:12] why not?
[18:11:15] Right. This isn't like wikidata
[18:11:36] It *is* how it is done in wikidata.
[18:11:59] Because there's consensus about having two "normal" level statements as the one true version of the item at some point in time.
[18:12:01] I’d like to compromise that we design this to support any workflow, but you’ve said that rank is complicating things unnecessarily...
[18:12:12] The "preferred" in wikidata is always the most recent revision.
[18:12:34] awight, designing to support "any workflow" is a bad idea no matter what the context.
[18:12:52] That’s not how I understand it. One moment please
[18:14:00] I know what you mean, but disagree. If we couple to a specific workflow to a degree that it’s really hard to migrate to support others, we’ve painted ourselves into a corner.
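(A sketch of the vanilla-sklearn check codezee describes above: a random forest on multilabel targets, keeping hashed features sparse rather than converting them to full arrays. The texts, labels, and dimensions are placeholders, not the real 93k-row dataset.)

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import HashingVectorizer

texts = [
    "placeholder text for one revision",
    "placeholder text for another revision",
    "yet another revision's text",
]
# Multilabel target: one indicator column per label.
y = [[1, 0], [0, 1], [1, 1]]

# HashingVectorizer returns a scipy.sparse matrix, so the hashed features
# are never densified; sklearn's tree ensembles accept sparse input directly.
X = HashingVectorizer(n_features=2 ** 18).fit_transform(texts)

clf = RandomForestClassifier(n_estimators=10, max_depth=4)
clf.fit(X, y)  # a 2D indicator y is treated as a multilabel problem
print(clf.predict(X))
```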
[18:14:15] This should be something that we explore during MVP...
[18:14:27] All we have to do is add one column to judgments.
[18:14:28] awight, we can't explore it in our MVP
[18:14:33] ?
[18:14:44] The MVP isn't intended for users
[18:14:49] We just discussed this.
[18:14:57] k no worries
[18:15:03] I've already explored it in my design analysis of how editors are working on false positive reports.
[18:15:05] we need to resolve this before building the whole thing
[18:15:47] I want to design around what users are already doing and implement extensions later.
[18:15:50] https://www.mediawiki.org/wiki/Topic:Tzw5fix7hbs4ui8j ?
[18:16:01] Rather than designing for things that users do not do and seeing if they would like to do them.
[18:17:32] https://www.mediawiki.org/wiki/Wikibase/DataModel#Ranks_of_Statements
[18:17:33] awight, what does that topic have to do with the current discussion?
[18:17:49] awight, yes I'm familiar with that.
[18:18:01] > Note that there may be multiple preferred statements
[18:18:01] > This may imply a multi-valued property (e.g. a person's children), or a disagreement (diverging population figures given by different sources).
[18:18:13] I was the one who originally proposed ranks but then quickly decided that didn't match the work pattern.
[18:18:37] awight, we have disagreements in the literature. Not in consensus. We have minority opinions in consensus.
[18:18:43] I’m responding to what you said about wikidata’s ranks
[18:18:57] Right. And I have been backpedaling from that suggestion for a long time.
[18:18:59] > Another useful concept can be constructed based on the ranks defined above: the "best rank" for the Statements about a given Property with respect to a given Item. If there is at least one Statement with preferred rank about the property (in the context of a given Item), the best rank for that property is preferred. Otherwise, the best rank is normal. Correspondingly, the "best Statements" about a given Property in the context of a given Item are the
[18:19:01] ones that have the best rank for that Property
[18:19:04] I don't think wikidata's ranks make any sense.
[18:19:06] sigh
[18:19:26] There's no best or alternative because there's no external source of validity.
[18:20:34] awight, did you read the BRD essay?
[18:20:38] But there are ambiguous cases, in which forcing a preferred bit gives us incorrect data.
[18:20:42] Or just look at the diagram?
[18:20:47] I get the BRD cycle, and that there are alternative workflows onwiki.
[18:20:52] awight, all cases must be judged.
[18:21:02] There's never a state of non-consensus.
[18:21:31] These alternative workflows do not suit the judgement space. It's not like we're breaking new ground here.
[18:22:10] Where is your false positive design analysis? Sorry I can’t find it.
[18:22:31] I’m looking at https://it.wikipedia.org/wiki/Progetto:Patrolling/ORES and it doesn’t mean much other than this “reason” thing that we don’t support.
[18:22:36] I didn't produce a summary document. I produced a design and reflection.
[18:23:04] k
[18:23:40] Man. It's been a long time since I've had my design expertise so thoroughly undermined.
[18:23:45] jesus
[18:23:49] I agree.
[18:51:11] halfak: 1 fold didn't get stuck, finished in some 5-6 min
[18:55:41] The cache should have all of the features extracted already. It should only be the scoring that is so slow in cv_train
[18:55:45] codezee, ^
[18:57:11] yeah
[18:58:10] I'm off to lunch.
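(A sketch of the rank scheme being debated above, following the Wikibase "best rank" rule quoted from the DataModel page: preferred if any judgment is preferred, otherwise normal, with deprecated judgments never returned. The judgment fields are hypothetical; JADE's actual schema isn't settled in this discussion.)

```python
from enum import Enum

class Rank(Enum):
    DEPRECATED = 0
    NORMAL = 1
    PREFERRED = 2

def best_judgments(judgments):
    """Return the 'best rank' judgments for one wiki entity."""
    has_preferred = any(j["rank"] is Rank.PREFERRED for j in judgments)
    best = Rank.PREFERRED if has_preferred else Rank.NORMAL
    return [j for j in judgments if j["rank"] is best]

# Hypothetical judgments on one wiki entity (field names are illustrative).
judgments = [
    {"author": "A", "damaging": True, "rank": Rank.DEPRECATED},  # A's old opinion
    {"author": "A", "damaging": False, "rank": Rank.NORMAL},     # A changed their mind
    {"author": "B", "damaging": False, "rank": Rank.NORMAL},
]
# No preferred judgment exists, so both normal-rank judgments are returned.
assert len(best_judgments(judgments)) == 2
```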
[18:58:13] Back in ~ an hour
[19:25:09] Scoring-platform-team (Current), ORES, Operations, Patch-For-Review: Investigate why ORES logs are being written to syslog despite explicit logging config. Fix. - https://phabricator.wikimedia.org/T182614#3832012 (awight) a: awight
[20:26:24] halfak: why does revscoring take each instance, normalize and score rather than take all labels -> normalize and score ?
[20:26:30] wouldn't the latter be faster
[20:26:39] *take all instances
[20:26:56] Maybe slightly, but it never happens in real use scenarios
[20:31:57] halfak: i ran some profiling, it's spending most of the time in the random forest - https://dpaste.de/BPDS maybe because of the 300-dim dense word vectors
[20:32:00] but not sure
[22:00:40] (PS1) Halfak: Adds iswiki and eswikiquote reverted models. [services/ores/deploy] - https://gerrit.wikimedia.org/r/397962
[22:00:54] Scoring-platform-team (Current), ORES: Deploy ORES mid December (2017) - https://phabricator.wikimedia.org/T182719#3832581 (Halfak)
[22:01:34] Scoring-platform-team (Current), editquality-modeling, User-Ladsgroup, artificial-intelligence: Train/test reverted model for eswikiquote - https://phabricator.wikimedia.org/T182218#3832592 (Halfak)
[22:01:36] Scoring-platform-team (Current), ORES: Deploy ORES mid December (2017) - https://phabricator.wikimedia.org/T182719#3832591 (Halfak)
[22:01:38] Scoring-platform-team (Current), editquality-modeling, User-Ladsgroup, artificial-intelligence: Train/test reverted model for Icelandic - https://phabricator.wikimedia.org/T181099#3832593 (Halfak)
[22:02:10] (PS2) Halfak: Adds iswiki and eswikiquote reverted models. [services/ores/deploy] - https://gerrit.wikimedia.org/r/397962 (https://phabricator.wikimedia.org/T182719)
[22:31:24] Scoring-platform-team, Release-Engineering-Team, Scap: Scap is unhappy about deploying from a branch other than master - https://phabricator.wikimedia.org/T182498#3832681 (greg)
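(On codezee's batching question above: a sketch contrasting per-instance scoring, which matches how requests arrive one at a time in a live service, with batched scoring, which would be faster for offline runs like cv_train. The model and data are synthetic stand-ins.)

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.rand(1000, 300)           # stand-in for 300-dim word-vector features
y = rng.randint(0, 2, 1000)
clf = RandomForestClassifier(n_estimators=100).fit(X, y)

# Per-instance scoring, as in a live service where one revision arrives at a time:
one_by_one = [clf.predict_proba(row.reshape(1, -1)) for row in X]

# Batched scoring amortizes the per-call overhead across all rows, which is
# why scoring everything at once tends to be faster in offline evaluation:
batched = clf.predict_proba(X)
```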