[13:59:36] <Zppix>	 Amir1: are you around?
[14:55:50] <halfak>	 o/
[14:56:15] <halfak>	 Zppix, I think Am*r1 is in #wikidata land today. 
[14:56:32] <Zppix>	 He got what i needed
[14:56:52] <Zppix>	 It was GCI
[15:11:45] <halfak>	 Cool :) 
[15:35:58] <wikibugs>	 10Scoring-platform-team, 10ORES, 10Operations, 10Release-Engineering-Team, 10Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3831316 (10akosiaris) Pardon me, but I have to ask why a file with timestamps in the log file dating `Dec 11th`, and with a local...
[15:36:22] <awight>	 halfak: fyi I’m trying to make this a JADE writing day
[15:36:45] <akosiaris>	 scap is driving me crazy
[15:36:54] <akosiaris>	 what on earth is going on on that bug ...
[15:37:18] <akosiaris>	 btw we are still ok in both eqiad/codfw despire the lowering of available uwsgi workers. No overloads yet
[15:37:31] <akosiaris>	 lowering in eqiad, increase in codfw
[15:37:42] <halfak>	 awight, sounds good.  I'm working with Amir1 right now on some quarterly documentation.  I'll re-raise this tomorrow at staff.
[15:38:14] <halfak>	 akosiaris, is that bug still blocking us from running our (maybe final) stress test on ores*?
[15:38:22] <wikibugs>	 10Scoring-platform-team, 10ORES, 10Operations, 10Release-Engineering-Team, 10Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3831321 (10awight) >>! In T181661#3831316, @akosiaris wrote: > Pardon me, but I have to ask why a file with timestamps in the log...
[15:38:39] <akosiaris>	 halfak: that's my impression. Not my call though
[15:39:02] <halfak>	 Oh.  Who makes a call about us deploying to ORES*?
[15:39:24] <wikibugs>	 10Scoring-platform-team, 10ORES, 10Operations, 10Release-Engineering-Team, 10Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3831323 (10akosiaris) >>! In T181661#3831321, @awight wrote: >>>! In T181661#3831316, @akosiaris wrote: >> Pardon me, but I have t...
[15:39:30] <akosiaris>	 halfak: em, you ?
[15:39:36] <awight>	 akosiaris: halfak: We’re okay working around the scap bugs.
[15:39:50] <awight>	 i.e. we can stress test any time.
[15:39:53] <akosiaris>	 although tbh if have the damn deploys are going to fail because of this ...
[15:39:57] <halfak>	 :)  I thought so.  Was just wondering how akosiaris was running into it. 
[15:40:17] <akosiaris>	 Oh, I am just trying to debug the damn thing
[15:40:20] <awight>	 halfak: I believe akosiaris is just helping to debug the scap ssh timeout, which is a blocker to our sanity but not to this stress test.
[15:40:27] <awight>	 *KILL THE PIGGY*
[15:40:29] <halfak>	 awight, I think I might do a quick stress test today. 
[15:40:34] <awight>	 +1 ty!
[15:40:41] <awight>	 I’m looking forward to it.
[15:40:50] <halfak>	 OK time to get some notes together and then see if I can get this started before tech mgmt
[15:41:16] <awight>	 halfak: if you do have to deploy a new revision, which I don’t expect, the workaround is to deploy one machine at a time, e.g. -l “ores1002.eqiad.wmnet”
[15:41:36] <akosiaris>	 wait... that works ?
[15:41:49] <awight>	 yes
[15:41:55] <awight>	 i no rite
[15:42:16] <akosiaris>	 hmm
[15:42:36] <akosiaris>	 awight: ok which branch can I deploy to ores1004 ?
[15:42:43] <akosiaris>	 STABLE ? CELERY_4 ? master ?
[15:42:46] <awight>	 akosiaris: Go ahead and deploy master
[15:42:49] <akosiaris>	 ah nice
[15:42:50] <akosiaris>	 thanks
[15:43:07] <awight>	 if you need to toggle between revisions to test stuff, STABLE is a good choice too.
[15:44:03] <halfak>	 Is ores1004 borked right now?
[15:44:08] <awight>	 halfak: fyi, master has significant editquality and ores submodule bumps in the penultimate commit, and STABLE was just a cherry-pick around that.  Feel free to deploy master to production today if you’re feeling bold.
[15:44:32] <akosiaris>	 halfak yeah, for all intents and purposes, assume that
[15:44:41] <awight>	 halfak: this is non-production fwiw
[15:44:58] <halfak>	 akosiaris, should I expect ores1004 to be unusable for the tests or will it be usable if I wait for a bit?
[15:45:05] <awight>	 halfak: akosiaris seems to be smoke-testing the ssh timeout
[15:45:11] <halfak>	 roger that
[15:45:20] <awight>	 This is probably not a good thing to do at the same time as the timeout tests...
[15:45:45] <akosiaris>	 halfak don't expect it to work
[15:45:53] <akosiaris>	 but I can wait
[15:46:10] <halfak>	 No worries.  Please continue.  I can drop it from stress testing or stress test tomorrow. 
[15:46:13] <akosiaris>	 but tbh... I don't think it has the same version of software as the others
[15:46:23] <akosiaris>	 it's stuck in some Jun 26 limbo
[15:46:36] <akosiaris>	 so it probably won't help you much to test against it
[15:46:50] <akosiaris>	 how on earth did that happen ....
[15:47:04] <awight>	 halfak: we’d have to turn off the celery worker on that machine
[15:47:16] <halfak>	 awight, Oh yeah.  Thank you for noting that. 
[15:47:20] <awight>	 :)
[15:47:23] <halfak>	 Can we safely do that akosiaris?
[15:47:24] <awight>	 I’ve made all the mistakes.
[15:47:36] <halfak>	 :) 
[15:47:42] <halfak>	 Sign of wisdom and experience
[15:48:03] <akosiaris>	 yeah that's easy
[15:48:26] <akosiaris>	 done
[15:50:06] <awight>	 akosiaris: One more point of information, I get timeouts on *all* the ores100* servers when using the parallel scap deploy, not just ores1004.
[15:51:19] <akosiaris>	 that's reassuring... at least it's not the box
[15:51:29] <akosiaris>	 thanks for the info
[15:51:55] <awight>	 Good luck, sir
[15:54:12] <halfak>	 FYI: https://en.wikipedia.org/wiki/User_talk:Risker/Risker%27s_checklist_for_content-creation_extensions#Following_the_checklist_in_JADE
[15:54:29] <halfak>	 awight, ^ 
[15:54:54] <awight>	 Great!
[15:56:06] <halfak>	 awight, see also https://etherpad.wikimedia.org/p/SPFY18Q3
[15:56:09] <halfak>	 Amir1, ^ 
[15:56:19] <halfak>	 My current proposal for next Quarter's goals. 
[15:56:23] <awight>	 Looks like we’re going ContentHandler all the way, eh?
[15:57:03] <awight>	 In which case, the MW integration would have to come before even the MVP deployment
[15:57:39] <halfak>	 awight, content handler?
[15:57:46] <halfak>	 I don't see why we'd be interacting with that. 
[15:58:03] <awight>	 That’s the main way to do a MW integration, AIUI
[15:58:16] <halfak>	 oh.  Well, I think that we don't have an option. 
[15:58:20] <halfak>	 With MW Integration
[15:58:27] <halfak>	 I never thought we had an option. 
[15:59:04] <halfak>	 And MVP is defined how it is defined.  I think we can deploy an MVP without expecting to release it for broader use. 
[15:59:11] <awight>	 Here are examples, https://www.mediawiki.org/wiki/Content_handlers
[15:59:20] <halfak>	 So that people can develop against it and engage in the design process. 
[15:59:38] <awight>	 halfak: Hmm.  An MVP that can be used to push spam and libel?
[15:59:52] <akosiaris>	 awight: as an FYI this fail for me
[15:59:52] <akosiaris>	 akosiaris@tin:/srv/deployment/ores/deploy$ scap deploy -v -l 'ores1004.eqiad.wmnet' T181661
[15:59:52] <stashbot>	 T181661: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661
[15:59:55] <halfak>	 Uh...  Sure.  Why so dramatic? 
[16:00:07] <awight>	 halfak: Hehe on that note, ’m a bit concerned that we have code written already—are you okay with throwing that out if we need to?
[16:00:18] <akosiaris>	 so at least this is somewhat more reproducible ?
[16:00:18] <halfak>	 awight, why would we need to do that?
[16:00:51] <wikibugs>	 10Scoring-platform-team, 10ORES, 10Operations, 10Release-Engineering-Team, 10Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3831411 (10thcipriani) >>! In T181661#3831316, @akosiaris wrote: > Now for the more interesting stuff. I 've tried to run `/usr/bi...
[16:00:55] <awight>	 halfak: Have you chatted with jkatz about Collections, or heard that story?
[16:01:09] <awight>	 akosiaris: I guess that’s good news!
[16:01:14] <halfak>	 awight, I'm very familiar with the "collections" story. 
[16:01:19] <halfak>	 We're nothing like "collections"
[16:02:12] <awight>	 The relevant part is that they were forced to close shop because they were missing curation
[16:02:40] <halfak>	 awight, right.  that's why curation is a key part of the plan and that we're following Risker's checklist. 
[16:02:53] <halfak>	 Why are you pushing on this when you know I'm deeply involved in thinking about it? 
[16:03:08] <awight>	 um
[16:03:27] <awight>	 OK we can do the MVP two ways: with or without curation, right?
[16:03:44] <awight>	 What I’m questioning is, can we actually do even an MVP without curation.  My understanding is, no.
[16:04:14] <halfak>	 Well, sure.  If we wait for curation to be fully implemented before 3rd party developers can play with what we're building, we'll push the whole timeline back substantially. 
[16:04:30] <halfak>	 We can have 3rd party devs experiment with the system!
[16:04:39] <halfak>	 It takes a lot to figure out how to develop against a thing. 
[16:15:33] <codezee>	 o/
[16:16:04] <halfak>	 o/ codezee 
[16:17:47] <codezee>	 halfak: the model turns out to be a beast for our resources, i let it run for almost 20hrs but still tune and cv_train both were stuck
[16:18:28] <codezee>	 so i killed them and ran a basic sklearn classifier, to time it, turns out sklearn one finished soon but somehow the printing of time didn't execute! as if it exited silently before that
[16:19:08] <halfak>	 codezee, do you think it has something to do with our Binarizer? 
[16:19:20] <halfak>	 Or maybe you set different hyperparameters?
[16:19:26] <halfak>	 Those can have a big effect on training time. 
[16:20:39] <codezee>	 halfak: but cv_train was just three folds, although i didn't think hyperparms could affect that, i chose roughly in between n_estimators:800, max_depth:7 features:log2
[16:21:02] <codezee>	 apparently i haven't killed the cv_train run, its still on
[16:21:04] <halfak>	 For both the direct sklearn and cv_train?
[16:21:14] <halfak>	 Same hyper parameters. 
[16:21:22] <codezee>	 and stuck on  "Scoring cross-validation for 1"
[16:21:45] <codezee>	 halfak: no the basic script with sklearn is fishy, its not even executing the block i was expecting to see
[16:21:48] <halfak>	 Oh... that's generating statistics about cross-validation. 
[16:22:22] <codezee>	 halfak: do you think small swap size~500mb could be an issue?
[16:22:38] <halfak>	 "swap size"?
[16:22:41] <halfak>	 What do you mean?
[16:22:55] <codezee>	 halfak: swap memory of lnux instance
[16:22:57] <codezee>	 *linux
[16:24:31] <halfak>	 codezee, is the instance running out of memory?
[16:25:01] <codezee>	 OOM error is not there, but swap size is full, as htop shows
[16:25:11] <halfak>	 Is memory full?
[16:25:19] <codezee>	 no
[16:25:25] <codezee>	 halfway
[16:27:27] <codezee>	 halfak: btw i was using this to test vanilla sklearn, do you think having 1.3GB of text data in memory could be an issue? - https://gist.github.com/codez266/e7d5c9ac6d7b9896b386615deb3c67db
[16:28:32] <halfak>	 codezee, then no I don't think swap is an issue. 
[16:28:46] <halfak>	 I don't think that having 1.3GB of text data in memory is an issue, no. 
[16:31:24] <codezee>	 its strange that the script did print "Preprocessing done, classifying..." but stopped right there
[16:33:29] <codezee>	 halfak: i take it that if cv_train is generating statistics, scoring is done, right?
[16:33:59] <codezee>	 bc these are the timestamps - https://dpaste.de/goYp
[16:41:07] * halfak is in meeting, FYI
[16:42:57] <codezee>	 oh, sorry
[16:51:29] <awight>	 halfak: This is helpful, to see what ContentHandlers exist already: https://www.mediawiki.org/wiki/Content_handlers
[16:53:45] <halfak>	 Brief non-meeting time.  Reading scrollback
[16:53:57] <halfak>	 codezee, no worries.  Just wanted you to know why I wasn't responding. 
[16:55:42] <halfak>	 codezee, it looks to me like it is hanging on generating the predictions themselves. 
[16:55:59] <wikibugs>	 10Scoring-platform-team, 10ORES, 10Operations, 10Release-Engineering-Team, 10Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3831602 (10akosiaris) `akosiaris@tin:/srv/deployment/ores/deploy$ scap deploy -v -l 'ores1004.eqiad.wmnet' T181661`  fails reprodu...
[16:56:09] <halfak>	 How long does it take to make a prediction based on a set of extracted features in your RAW sklearn work?
[16:56:51] <halfak>	 awight, it looks like content handlers are aimed at revisions.  Is that right? 
[16:57:02] <halfak>	 I think we only need to write rows to the logging table. 
[16:57:15] <awight>	 I don’t think so, they aren’t linked to article revisions.
[16:57:15] <halfak>	 Maybe there are content handlers for writing rows to the logging table. 
[16:57:26] <halfak>	 *page revisions
[16:57:27] <codezee>	 halfak: that is a mystery in itself, i wrote that script to find that out, turns out that script exited abruptly
[16:57:39] <codezee>	 without reaching the time statement
[16:57:59] <halfak>	 codezee, I think re-trying that will be useful.  
[16:58:04] <codezee>	 yes, doing rn
[16:58:22] <codezee>	 its solving the 93k observations currently since 15min
[16:58:26] <halfak>	 awight, see "content format" 
[16:58:52] <halfak>	 We could store the judgement content in MW or we could have MW request it from JADE as needed. 
[16:58:57] <awight>	 halfak: I’m not sure what your question means, but ContentHandler items are probably stored in the revision table, yes.  They don’t map to article revisions, of course.
[16:59:04] <awight>	 ah okay you meant the former.
[16:59:27] <codezee>	 i re-ran cv_train with more generous parameters, first fold scoring done in 14 min, now scoring fold 2, waiting to see if it again gets stuck
[17:00:02] <halfak>	 awight, I don't know if we should be storing JADE stuff in the revision table.  But it might make sense to store JADE judgement content in MW somewhere. 
[17:00:24] <halfak>	 I don't think judgements correspond to revisions nicely. 
[17:00:50] <awight>	 They might actually—editing a judgment creates a new “head” that has a history.
[17:00:51] <halfak>	 Unless we set the most recent "preferred" judgement as the most recent revision of a "wiki entity" page. 
[17:01:05] <halfak>	 There's no such thing as "editing a judgement"
[17:01:06] <awight>	 Most recent is distinct from preferred rank.
[17:01:21] <awight>	 halfak: There is changing your judgment, which deprecates the old one.
[17:01:33] <awight>	 That’s analogous to editing an article.
[17:01:35] <halfak>	 awight, changing your endorsement to a different judgement
[17:01:45] <halfak>	 But changing your endorsement doesn't change which judgement is preferred. 
[17:01:51] <awight>	 +1
[17:02:17] <halfak>	 Changing which judgement is preferred could be interpreted as a new revision. 
[17:02:27] <awight>	 If I make a judgment “damaging”, then I can come back and judge the same wiki entity as “non-damaging”.  There is no “preferred” in this case.
[17:02:44] <halfak>	 But it wouldn't make sense to store the content somewhere and it seems to hardly make sense to refer to a wiki-entity as a page. 
[17:02:45] <awight>	 Preferred is only set by consensus, IMO
[17:02:52] <halfak>	 awight, there is a preferred. 
[17:03:13] <halfak>	 See my notes re making the first judgement "preferred" and the ability to change that by followup. 
[17:03:16] <awight>	 Following wikidata, a newly created judgment has “normal” rank
[17:03:30] <halfak>	 awight, I don't think following wikidata makes sense here. 
[17:03:49] <awight>	 if the same editor judges the same wiki entity differently, old_judgment.rank=deprecated and the new one has “normal” rank
[17:04:07] <awight>	 halfak: I think wikidata rank is great, let’s bookmark that for more chatting
[17:04:24] <awight>	 There can be multiple “normal” rank judgments if multiple editors judge the same wiki entity.
[17:04:45] <halfak>	 awight, but that's bad and confusing.  It would surely break the revision model. 
[17:04:51] <awight>	 umwat
[17:05:02] <halfak>	 With a page, only one revision is the "current, good"
[17:05:27] <awight>	 using ContentHandler, ref=(editor, wiki entity) judgement={data} is its own “article”
[17:05:43] <awight>	 A new editor can create their own judgment-article with its own history
[17:06:14] <awight>	 When we query all judgments on a wiki entity, JADE gives us the MW ids of a bunch of “pages” which are all the judgments of that entity.
[17:06:16] <halfak>	 awight, that wouldn't make sense.  many editors can endorse the same judgement. 
[17:06:38] <awight>	 The normal API is to return just rank=best
[17:06:59] <awight>	 if there is a preferred judgment, that is returned; if there are multiple normal-rank judgments, those are returned.
[17:07:15] <awight>	 Another API endpoint can return the entire history of each judgment.
[17:07:38] * halfak --> meeting
[17:07:56] <awight>	 yep no worries
[17:09:10] <awight>	 It probably makes the most sense for endorsements to be their own ContentHandler type, rather than denormalizing into the judgments they refer to.  Although that wouldn’t be terrible either, and I don’t see how my suggested MW integration breaks the ability for multiple editors to edit the same page.
[17:13:42] <wikibugs>	 10Scoring-platform-team, 10ORES, 10Operations, 10Release-Engineering-Team, 10Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3831654 (10akosiaris) >>! In T181661#3831411, @thcipriani wrote: >>>! In T181661#3831316, @akosiaris wrote: >> Now for the more in...
[17:19:16] <wikibugs>	 10Scoring-platform-team, 10ORES, 10Operations, 10Release-Engineering-Team, 10Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3831667 (10akosiaris) Moving the `deploy-cache/cache` directory aside did solve the issue (partially?) and moved on until...  ```...
[17:20:06] * akosiaris has this feeling this scap thing is going to turn into a saga
[17:20:25] <akosiaris>	 either that or something very stupid and easily fixed
[17:23:00] <halfak>	 awight, coming back to this.  It seems to me that your formulation of pages-are-judgements doesn't correspond to a collaborative user-model.  
[17:23:28] <halfak>	 Wikipedians negotiate what's True(TM) on pages.  A revert of a revision is represents a disagreement of what's True(TM)
[17:23:58] <halfak>	 The preferred boolean represents something that different editors can negotiate. 
[17:24:47] <halfak>	 By making it a boolean, the most recent version can then be considered the most current good judgment. 
[17:24:54] <awight>	 halfak: editors need to be free to edit their own page or collaborate
[17:25:13] <halfak>	 awight, I don't see what you're saying. 
[17:25:24] <awight>	 There are multiple truths, at least until consensus is made
[17:25:30] <halfak>	 awight, I disagree. 
[17:25:35] <awight>	 rank isn’t a boolean, it’s tri-state
[17:25:42] <halfak>	 That's not how Wikipedia works.  
[17:25:46] <halfak>	 Right.  I'm advocating against rank
[17:25:54] <awight>	 ok, this is new but I’m listening
[17:26:30] <halfak>	 I don't think this is new. 
[17:26:51] <awight>	 The JADE API is the interface for judging and endorsing, so is responsible for going around setting or unsetting preferred
[17:27:04] <halfak>	 who is responsible?
[17:27:06] <awight>	 i.e. it doesn’t matter to JADE how we store judgments in MW
[17:27:18] <halfak>	 awight, it needs to be compatible, 
[17:27:24] <awight>	 any editor
[17:27:32] <halfak>	 sure.  Right. 
[17:27:38] <awight>	 any editor runs a tool that calls JADE, and assigns preferred
[17:27:51] <halfak>	 Just like any editor in good standing can change a page or revert an other's change. 
[17:28:00] <awight>	 I’m currently imagining, editors make a judgment in a JADE UI, that is send via API and becomes a ContentHandler revision.
[17:28:11] <awight>	 yes, anyone can change that judgment, leaving behind a revision history
[17:28:32] <awight>	 but normally, only the original author would change that statement, it’s similar to a post in a discussion thread.
[17:28:41] <halfak>	 What's a "statement"?
[17:28:45] <awight>	 I don’t agree that the normal workflow is to go editing a single judgment
[17:28:50] <halfak>	 What do you mean "original author"?
[17:29:03] <halfak>	 I don't think judgements should be edited. 
[17:29:17] <awight>	 “statement” in that sentence is a judgment on multiple score schemas, plus comments.
[17:29:22] <awight>	 “original author” is the creator of the judgment
[17:29:38] <halfak>	 awight, I don't think the "original author" of a judgement matters for anything. 
[17:29:47] <halfak>	 Like the original author of a judgement doesn't matter for a wiki page. 
[17:29:48] <awight>	 Judgments can be edited so that * people can change their mind, and * admins can curate
[17:30:00] <halfak>	 What would you edit in a judgement?
[17:30:02] <awight>	 but it does matter for a discussion post
[17:30:23] <halfak>	 Certainly.  Discussions are parallel to judgments. 
[17:30:39] <halfak>	 Endorsements are close to discussions but they are structured. 
[17:30:40] <awight>	 just… nvm the extended discussion thing for a moment, it’s complicating this
[17:30:47] <halfak>	 certainly
[17:30:55] <halfak>	 I didn't bring it up ;) 
[17:30:58] <awight>	 lol
[17:31:10] <awight>	 sorry IRL moment
[17:34:58] <halfak>	 FWIW, my argument against rank is KISS
[17:35:10] <halfak>	 Let's add complexity if it seems to be useful. 
[17:36:31] <awight>	 back.
[17:38:13] <awight>	 There are multiple editors making judgments, and IMO no reason they need to be reconciled in any way.  That should be voluntary.
[17:38:40] <awight>	 If 5 people each score a wiki entity, then there are 5 judgments that are returned if you query JADE for references to that entity.
[17:39:38] <awight>	 If one of those editors changes their mind (with or without being influenced by the other judgments), they can edit their judgment and change damging -> non-damaging
[17:39:52] <awight>	 In the backend, this creates a deprecated judgment and a new judgment.
[17:40:39] <awight>	 A normal query on the wiki entity will still return 5 judgments, which include the newly changed opinion.
[17:40:39] <awight>	 A historical query will return 6 judgments, and include the deprecated one.
[17:40:58] <awight>	 I happen to think that the old and new judgment would make a nice revision history under ContentHandler, but I’m not exclusively attached to that, there are lots of ways to represent this.
[17:41:56] * halfak gets out of meeting
[17:42:19] <halfak>	 awight, "There are multiple editors making judgments, and IMO no reason they need to be reconciled in any way." I totally disagree with this premise. 
[17:42:59] <halfak>	 Even in wikidata, information is reconciled. 
[17:43:02] <awight>	 This is the situation with a support-oppose vote, before consensus is reached.
[17:43:25] <awight>	 It *can* be reconciled but why or even how would you force it?
[17:43:35] <halfak>	 awight, these are in instances where discussion is necessary from the beginning. 
[17:43:41] <halfak>	 In many cases, no discussion is needed. 
[17:43:59] <awight>	 5 people may or may not agree.  That’s the information we want.
[17:44:10] <halfak>	 And when discussion *is* found to be needed, then there's still a version of fact that remains while a conversation takes place. 
[17:44:16] <halfak>	 awight, that will still be recorded. 
[17:44:18] <awight>	 If they choose as a group to reconcile their judgments, we have a much stronger signal.
[17:44:31] <halfak>	 This is why we need endorsements. 
[17:44:39] <awight>	 What do you mean by “these are in instances where discussion is necessary from the beginning”
[17:45:05] <halfak>	 ^ e.g. AFD
[17:45:16] <halfak>	 There's already an apparent disagreement. 
[17:45:29] <halfak>	 With BRD, consensus is implied in bold action. 
[17:45:36] <halfak>	 And challenged through reversion. 
[17:45:49] <halfak>	 Then discussion (support/oppose straw polls) take place. 
[17:45:53] <halfak>	 Bold-Revert-Discuss
[17:49:34] <halfak>	 awight, ^ 
[17:49:50] <awight>	 I like that your suggestion is honest about the fact that we’re extracting a subjective truth from multiple judgments.
[17:50:16] <awight>	 I don’t like that the second and following editors need to overrule someone else’s judgment, if that’s what you’re saying.
[17:50:42] <codezee>	 halfak: for hashed features, shouldn't we be using a sparse matrix?
[17:50:50] <halfak>	 awight, well they can agree with it. 
[17:50:50] <halfak>	 https://en.wikipedia.org/wiki/Wikipedia:BOLD,_revert,_discuss_cycle
[17:51:02] <awight>	 I’m also trying to stay close to the onwiki process for consensus.
[17:51:14] <halfak>	 codezee, yes.  I believe we are. 
[17:51:27] <halfak>	 A sparse matrix is really just an array of hashmaps. 
[17:51:29] <awight>	 In which, editors post their support/opposition/meh-ness, with a justifying comment.
[17:51:50] <halfak>	 awight, that's only in a very rare set of cases. 
[17:51:59] <halfak>	 In most cases, you should BOLD
[17:52:08] <halfak>	 https://en.wikipedia.org/wiki/Wikipedia:Be_bold
[17:52:16] <awight>	 got it.  But not on a Talk page.
[17:52:22] <halfak>	 Right.  Of course. 
[17:53:07] <halfak>	 Most judgments will not need discussion.  
[17:53:16] <halfak>	 Most will just have one endorsement and will remain unchallenged. 
[17:53:29] <halfak>	 They might get two endorsements that agree. 
[17:53:37] <halfak>	 Very few will be contentious. 
[17:53:44] <awight>	 The tension between the “fact” component, judgment scores, and the “talk” component of comments and discussion threads, is challenging.
[17:56:48] <halfak>	 is it?  Maybe I've been around wiki's and talk pages for a long time, but I've come to find it intuitive. 
[17:57:26] <awight>	 We don’t have any ContentHandler options on the Talk page for judgments, I’m assuming?
[17:57:31] <awight>	 And no guarantee of Flow.
[17:58:30] <halfak>	 Let's not talk about ContentHandler, OK?
[17:58:33] <awight>	 hehe
[17:58:35] <awight>	 sure
[17:58:39] <halfak>	 It seems like that's just making everything murkey
[17:58:56] <awight>	 We can set aside the representation entirely
[17:59:40] <awight>	 so please describe your workflow.  Two people want to score and comment on an article, what does that look like?
[18:00:46] <halfak>	 See the post I made about judgements and endorsements 
[18:01:14] <halfak>	 https://www.mediawiki.org/wiki/Topic:Tzw0uv2bucrdprm4
[18:02:56] <awight>	 OK and if I didn’t set the “preference” bit when I made my judgment?  If we judged at the same time and were unaware of another person’s judgment?  If I think your judgment is valid, but want to record my disagreement?
[18:03:05] <awight>	 Then it’s just set to whatever you had.
[18:03:25] <halfak>	 Right
[18:03:36] <awight>	 I don’t think it’s intuitive.  I think making a judgment, and debating about other people’s judgments are very different.
[18:03:54] <halfak>	 You might say, "I disagree, but I don't feel strongly enough about flipping the preference until we have discussed."
[18:04:09] <halfak>	 You're not debating other people's judgements. 
[18:04:09] <awight>	 So any UI tool to make a JADE judgment is showing you the current preferred judgment?
[18:04:15] <awight>	 It’s such a specific workflow...
[18:04:20] <halfak>	 You're debating what the Right/True judgment should be. 
[18:04:31] <codezee>	 halfak: oh i understand, i shouldn't convert hashes to full arrays, since revscoring directly passed it to sklearn and it should work
[18:04:37] <codezee>	 *passes
[18:04:48] <halfak>	 awight, it would show you the current judgement, and the set of alternatives with endorsements & comments. 
[18:04:56] <halfak>	 I suppose any discussion would be linked as well. 
[18:05:32] <awight>	 It has no flexibility for wikis that want to do things differently.
[18:05:36] <codezee>	 halfak: btw, sklearn vanilla with its own hashing took 192s for hashing, 30s for classifying 
[18:05:42] <halfak>	 awight, what?
[18:05:43] <codezee>	 n_estimators=10, max_depth=4
[18:06:27] <awight>	 halfak: If a wiki community is using this tool more as a false positive report, IMO it would be annoying to have this consensus thing pushed as the way it’s done.
[18:06:37] <halfak>	 codezee, 192s for how many records?
[18:06:50] <codezee>	 full-93k
[18:07:05] <halfak>	 awight, I have no idea what you mean. 
[18:07:39] <halfak>	 What is being pushed?  How would "consensus" get in the way of using this as a false-positive report?
[18:07:59] <awight>	 In my suggested workflow, the default is to submit a “normal” rank judgment.  We can both do that.  If we want to get all consensus about it, we discuss and arrive at a preferred judgment.
[18:08:13] <halfak>	 codezee, that seems OK to me. 
[18:08:32] <halfak>	 codezee, maybe you could try this without hashing at all -- just use the word2vec features. 
[18:08:34] <awight>	 I’m out in straw dog territory because I don’t know what editors will want, maybe they’ll overwhelmingly insist on a consensus for everything.
[18:08:47] <codezee>	 halfak: in revscoring i'm using word2vec only no hashing
[18:09:03] <codezee>	 sklearn experiment was just to make sure that multilabel in RF is OK
[18:09:07] <halfak>	 awight, please read my proposal again.  The "preference" bit is set on first judgement.  
[18:09:18] <awight>	 I got that
[18:09:35] <codezee>	 halfak: see here - https://dpaste.de/5x8T what i don't seem to deduce is with such low params, its stuck at the same place
[18:09:48] <codezee>	 so it does not look like a classification issue
[18:10:01] <codezee>	 all previous steps happen in 1 min gap
[18:10:02] <awight>	 In my proposal here, the first judgment can be set to normal or preferred, preferred would be unusual though.  Whatever is the “best” rank will be returned, which might be simply normal.
[18:10:50] <awight>	 If there are two normal-rank judgments, the UI tools will suggest a consensus process.  That can be BOLD, or involve endorsements.  I’d like to leave that flexibility.
[18:10:52] <halfak>	 awight, there should only be one preferred judgement.  I don't think "normal" makes sense. 
[18:11:07] <halfak>	 We should not have two judgements that are peers
[18:11:08] <awight>	 That’s how it’s done in wikidata, whether or not we’re using that as a model.
[18:11:12] <awight>	 why not?
[18:11:15] <halfak>	 Right.  This isn't lkike wikidata
[18:11:36] <halfak>	 It *is* how it is done in wikidata. 
[18:11:59] <halfak>	 Because there's consensus about having two "normal" level statements as the one true version of the item at some point in time. 
[18:12:01] <awight>	 I’d like to compromise that we design this to support any workflow, but you’ve said that rank is complicating things unnecessarily...
[18:12:12] <halfak>	 The "prefered" in wikidata is always the most recent revision. 
[18:12:34] <halfak>	 awight, designing to support "any workflow" is a bad idea no matter what the context. 
[18:12:52] <awight>	 That’s not how I understand it.  One moment please
[18:14:00] <awight>	 I know what you mean, but disagree.  If we couple to a specific workflow to a degree that it’s really hard to migrate to support others, we’ve painted ourselves into a corner.
[18:14:15] <awight>	 This should be something that we explore during MVP...
[18:14:27] <awight>	 All we have to do is add one column to judgments.
[18:14:28] <halfak>	 awight, we can't explore it in our MVP
[18:14:33] <awight>	 ?
[18:14:44] <halfak>	 The MVP isn't intended for users
[18:14:49] <halfak>	 We just discussed this. 
[18:14:57] <awight>	 k no worries
[18:15:03] <halfak>	 I've already explored it in my design analysis of how editors are working on false positive reports. 
[18:15:05] <awight>	 we need to resolve this before building the whole thing
[18:15:47] <halfak>	 I want to design around what users are already doing and implement extensions later. 
[18:15:50] <awight>	 https://www.mediawiki.org/wiki/Topic:Tzw5fix7hbs4ui8j ?
[18:16:01] <halfak>	 Rather than designing for things that users do not do and seeing if they would like to do them. 
[18:17:32] <awight>	 https://www.mediawiki.org/wiki/Wikibase/DataModel#Ranks_of_Statements
[18:17:33] <halfak>	 awight, what does that topic have to do with the current discussion. 
[18:17:49] <halfak>	 awight, yes I'm familiar with that. 
[18:18:01] <awight>	 > Note that there may be multiple preferred statements
[18:18:01] <awight>	 > This may imply a multi-valued property (e.g. a person's children), or a disagreement (diverging population figures given by different sources).
[18:18:13] <halfak>	 I was the one who originally proposed ranks but then quickly decided that didn't match the work pattern. 
[18:18:37] <halfak>	 awight, we have disagreements in the literature.  Not in consensus.  We have minority opinions in consensus. 
[18:18:43] <awight>	 I’m responding to what you said about wikidata’s ranks
[18:18:57] <halfak>	 Right.  And I have been backpedaling from that suggestion for a long time. 
[18:18:59] <awight>	 > Another useful concept can be constructed based on the ranks defined above: the "best rank" for the Statements about a given Property with respect to a given Item. If there is at least one Statement with preferred rank about the property (in the context of a given Item), the best rank for that property is preferred. Otherwise, the best rank is normal. Correspondingly, the "best Statements" about a given Property in the context of a given Item are th
[18:19:01] <awight>	 ones that have the best rank for that Property
[18:19:04] <halfak>	 I don't think wikidata's ranks make any sense. 
[18:19:06] <awight>	 sigh
[18:19:26] <halfak>	 There's no best or alternative because there's no external source of validity. 
[18:20:34] <halfak>	 awight, did you read the BRD essay?
[18:20:38] <awight>	 But there are ambiguous cases, in which forcing a preferred bit gives us incorrect data.
[18:20:42] <halfak>	 Or just look at the diagram?
[18:20:47] <awight>	 I get the BRD cycle, and that there are alternative workflows onwiki.
[18:20:52] <halfak>	 awight, all cases must be judged. 
[18:21:02] <halfak>	 There's never a state of non-consensus. 
[18:21:31] <halfak>	 These alternative workflows do not suit the judgement space.  It's not like we're breaking new ground here. 
[18:22:10] <awight>	 Where is your false positive design analysis?  Sorry I can’t find it.
[18:22:31] <awight>	 I’m looking at https://it.wikipedia.org/wiki/Progetto:Patrolling/ORES and it doesn’t mean much other than this “reason” thing that we don’t supprot.
[18:22:36] <halfak>	 I didn't produce a summary document.  I produced a design and reflection. 
[18:23:04] <awight>	 k
[18:23:40] <halfak>	 Man.  It's been a long time since I've had my design expertise so thoroughly undermined. 
[18:23:45] <awight>	 jesus
[18:23:49] <halfak>	 I agree. 
[18:51:11] <codezee>	 halfak: 1 fold didn't get stuck, got over in some 5-6 min
[18:55:41] <halfak>	 The cache should have all of the features extracted already.  It should only be the scoring that is so slow in cv_train
[18:55:45] <halfak>	 codezee, ^ 
[18:57:11] <codezee>	 yeah 
[18:58:10] <halfak|Lunch>	 I'm off to lunch. 
[18:58:13] <halfak|Lunch>	 Back in ~ an hour
[19:25:09] <wikibugs>	 10Scoring-platform-team (Current), 10ORES, 10Operations, 10Patch-For-Review: Investigate why ORES logs are being written to syslog despite explicit logging config.  Fix. - https://phabricator.wikimedia.org/T182614#3832012 (10awight) a:03awight
[20:26:24] <codezee>	 halfak: why does revscoring take each instance, normalize and score rather than take all labels -> normalize and score ?
[20:26:30] <codezee>	 wouldn't the latter be faster
[20:26:39] <codezee>	 *take all instances
[20:26:56] <halfak>	 Maybe slightly but it never happens in real use scenarios
[20:31:57] <codezee>	 halfak: i ran a profiling, its spending most of the time in random forest - https://dpaste.de/BPDS maybe bec of the 300dim dense word vectors
[20:32:00] <codezee>	 but not sure
[22:00:40] <wikibugs>	 (03PS1) 10Halfak: Adds iswiki and eswikiquote reverted models. [services/ores/deploy] - 10https://gerrit.wikimedia.org/r/397962
[22:00:54] <wikibugs>	 10Scoring-platform-team (Current), 10ORES: Deploy ORES mid December (2017) - https://phabricator.wikimedia.org/T182719#3832581 (10Halfak)
[22:01:34] <wikibugs>	 10Scoring-platform-team (Current), 10editquality-modeling, 10User-Ladsgroup, 10artificial-intelligence: Train/test reverted model for eswikiquote - https://phabricator.wikimedia.org/T182218#3832592 (10Halfak)
[22:01:36] <wikibugs>	 10Scoring-platform-team (Current), 10ORES: Deploy ORES mid December (2017) - https://phabricator.wikimedia.org/T182719#3832591 (10Halfak)
[22:01:38] <wikibugs>	 10Scoring-platform-team (Current), 10editquality-modeling, 10User-Ladsgroup, 10artificial-intelligence: Train/test reverted model for Icelandic - https://phabricator.wikimedia.org/T181099#3832593 (10Halfak)
[22:02:10] <wikibugs>	 (03PS2) 10Halfak: Adds iswiki and eswikiquote reverted models. [services/ores/deploy] - 10https://gerrit.wikimedia.org/r/397962 (https://phabricator.wikimedia.org/T182719)
[22:31:24] <wikibugs>	 10Scoring-platform-team, 10Release-Engineering-Team, 10Scap: Scap is unhappy about deploying from a branch other than master - https://phabricator.wikimedia.org/T182498#3832681 (10greg)