[00:04:42] Scoring-platform-team, ORES, Release-Engineering-Team (Kanban): Create gerrit mirrors for all github-based ORES repos - https://phabricator.wikimedia.org/T192042#4125561 (awight) p:Triage>Normal
[01:03:15] (CR) Dzahn: [C: 1] "it has been deleted on prod ores machines" [services/ores/deploy] - https://gerrit.wikimedia.org/r/425115 (https://phabricator.wikimedia.org/T181071) (owner: Awight)
[04:20:13] So, how hard can/should I hit ORES in requests/sec? Enwp is looking at using bots to auto-assess unassessed articles, and to check up on existing assessments.
[10:54:44] (CR) Alexandros Kosiaris: [C: 1] Remove transitional virtualenv copy [services/ores/deploy] - https://gerrit.wikimedia.org/r/425115 (https://phabricator.wikimedia.org/T181071) (owner: Awight)
[10:54:55] Scoring-platform-team, Scap, Patch-For-Review, Release-Engineering-Team (Kanban): [Blocked] Support git-lfs - https://phabricator.wikimedia.org/T180627#4126467 (demon) error: RPC failed; HTTP 504 curl 22 The requested URL returned error: 504 Gateway Time-out Herein lies the clue for this failu...
[11:17:48] Scoring-platform-team, Research, Wikilabels, Research-2017-18-Q3, Research-2017-18-Q4: WikiLabels how-to write-up - https://phabricator.wikimedia.org/T192069#4126517 (Miriam) p:Triage>High
[11:42:07] Scoring-platform-team, Research, Wikilabels, Research-2017-18-Q3, Research-2017-18-Q4: WikiLabels how-to write-up - https://phabricator.wikimedia.org/T192069#4126587 (bmansurov)
[12:11:28] (Abandoned) Alexandros Kosiaris: Remove the cluster server group and related stuff [services/ores/deploy] - https://gerrit.wikimedia.org/r/409932 (https://phabricator.wikimedia.org/T171851) (owner: Alexandros Kosiaris)
[12:12:04] Scoring-platform-team, Research, Wikilabels, Research-2017-18-Q3, Research-2017-18-Q4: WikiLabels how-to write-up - https://phabricator.wikimedia.org/T192069#4126634 (bmansurov) @Miriam, I've added technical bits. Let me know if you think something is missing. Thanks!
[14:13:30] Woops
[14:13:34] Forgot to change that :)
[14:57:40] halfak: I had included one query in the footnotes in ACL paper showing the number of page creations daily, maybe we can include that here directly as statistics of what we mention?
[14:57:47] since we don't have a page limit
[14:58:08] The query itself? I think the quarry URL should be good.
[14:58:15] Did you include draft namespace?
[14:58:27] halfak: ok, you mean 118 right?
[14:58:33] Right
[14:58:55] halfak: yes, its your query only, I found it from somewhere :D
[14:59:16] Can you share with me? I want to review it quick :)
[14:59:35] halfak: do you also happen to have any that gives the backlog?, yup the page creations one is - https://quarry.wmflabs.org/query/4386
[14:59:59] Wow this is lower than I thought.
[15:00:11] Oh this is just drafts
[15:00:14] No mainspace.
[15:01:01] The first query is the backlog size. It's the total number of pages in 118
[15:08:32] oh i see
[15:15:13] Hmm.
I think we'll really need to use recentchanges to get a sense for draft creation rates.
[15:20:58] I keep having this uncomfortable feeling that I'm not getting something done -- then I remind myself that I'm working on a query for a research paper that we intend to publish and that this is in fact work. :)
[15:21:08] I like querying DBs for stuff
[15:21:11] Doesn't feel like work.
[15:25:07] :D
[15:26:37] halfak: are you writing the query for recent chages?
[15:27:03] i was also looking at the table
[15:28:23] Right.
[15:28:41] I'm double-checking my estimates of how many new drafts are created every day.
[15:29:48] i suppose you must be making use of the field rc_new==1, let me know when you publish, I'll not duplicate work
[15:29:58] See my progress.
[15:29:59] https://quarry.wmflabs.org/query/26385
[15:30:11] Looking for a good way to find all of the unpatrolled mainspace creations.
[15:30:58] If you can find where "patrolled" status is stored, that'd be helpful.
[15:31:10] Amir1, maybe you know since you have been deleting old autopatrolled log rows.
[15:35:30] I'll have a look
[15:36:43] halfak: rc_partol=0? - https://www.mediawiki.org/wiki/Manual:Recentchanges_table#rc_patrolled
[15:36:57] *rc_patrolled
[15:36:59] We need it for pages that are older than recentchanges
[15:37:04] ok
[15:41:16] afaiui it (which doesn't necessarily have to be true), this data is not saved longer than recentchanges any more.
[15:42:22] eddiegp: you mean the info whether the page is patrolled or not right?
[15:45:16] halfak: i'm not sure but what about using the log table where log_type="patrol" ? - https://quarry.wmflabs.org/query/26388
[15:47:22] codezee: Right.
[16:07:35] halfak: the patrolled status won't be stored more than a month (rc table)
[16:07:55] The page patrol backlog is older than a month!
[16:08:30] there is one way to obtain it: Get logging table for manual patrol and use autopatrol rights of the creator to see if it's autopatrolled or not
[16:08:59] the problem is you need to get the date of getting the autopatrol right and compare it with the timestamp of the page creation too
[16:09:27] We should have this tracked in page_props or something
[16:09:28] :|
[16:57:55] Scoring-platform-team, MediaWiki-extensions-ORES, Patch-For-Review: Add eswikibooks and svwiki to the beta cluster, enable ORES filters - https://phabricator.wikimedia.org/T188349#4127590 (Etonkovidova)
[17:12:29] Arg meetings.
[17:12:43] OK. So codezee, let's ignore the size of the page patrollers backlog :)
[17:12:55] And just include stats about the draft backlog.
[17:15:22] I'm working on an updated query that has all of our information needs now. Should be done in a minute.
[17:15:42] https://quarry.wmflabs.org/query/26385
[17:23:38] halfak: thanks! also do you think including this short result around number of people doing NPP help? - https://meta.wikimedia.org/wiki/Research:New_page_reviewer_impact_analysis/Number_of_new_page_patrollers
[17:23:52] it was a small task i did some months back
[17:24:37] Hi channel :) based on our recent experience, baha, jmo and myself compiled a wikiLabels how-to, here: https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statements/WikiLabels:_How-to. halfak et al. feel free to give feedback, it's still work in progress, but i hope it's useful!
[17:37:50] whoa. surprise OS update
[17:48:44] and the durn deployment window is sliding off the clock
[18:10:39] Scoring-platform-team, ORES, Release-Engineering-Team (Kanban): Create gerrit mirrors for all github-based ORES repos - https://phabricator.wikimedia.org/T192042#4127824 (awight)
[18:25:56] o/
[18:25:58] Sorry was AFK for lunch. Forgot to say
[18:26:02] codezee, not sure I see a clear place for the NPP result in the paper.
[18:26:07] DO you have a spot in mind?
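Editor's sketch for the patrol-status idea raised at 16:08: a creation counts as patrolled if it has a manual patrol log entry, or if its creator already held the autopatrol right when the page was created. The function name, argument names, and data shapes below are hypothetical and not taken from the channel or from any MediaWiki schema, and the sketch ignores rights that were later removed.

```python
from datetime import datetime
from typing import Dict, Set


def creation_was_patrolled(
    page_id: int,
    creator: str,
    created_at: datetime,
    manually_patrolled: Set[int],
    autopatrol_granted: Dict[str, datetime],
) -> bool:
    """Guess whether a page creation counts as patrolled (hypothetical helper)."""
    # Case 1: the page shows up among the manual patrol log entries.
    if page_id in manually_patrolled:
        return True
    # Case 2: the creator was granted autopatrol on or before the creation time.
    # Assumption: the right was never revoked afterwards.
    granted_at = autopatrol_granted.get(creator)
    return granted_at is not None and granted_at <= created_at
```

The timestamp comparison is the step called out as "the problem" at 16:08:59: holding the right today says nothing about whether it was held when the page was created.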
[18:29:25] halfak: there was a part in your writeup points where you mentioned about the backlogs around npp and extreme measures being taken so I thought
[18:29:55] Right. Yeah, I cite ACTRIAL as an example of extreme measures.
[18:31:10] ok,yeah, we can directly reference the actrail page
[18:32:08] And the study that accompanies it.
[18:58:24] codezee, can you make a prediction for [[:en:Ann Bishop (biologist)]]?
[18:58:24] [1] https://meta.wikimedia.org/wiki/:en:Ann_Bishop_%28biologist%29
[18:58:47] Here's the stub: https://en.wikipedia.org/w/index.php?title=Ann_Bishop_(biologist)&oldid=518733419
[18:59:04] Could also use this early version: https://en.wikipedia.org/w/index.php?title=Ann_Bishop_(biologist)&oldid=518838758
[18:59:14] I'm working on an example.
[19:04:25] on it
[19:04:28] I can't get the model in drafttipic to work
[19:04:31] tons of errors :(
[19:04:36] kk
[19:05:29] lunch.
[19:28:38] halfak: its ['History_And_Society.History and society', 'Culture.Language and literature'] but I was expecting biology too
[19:29:10] looking at probabilities , 'History_And_Society.History and society' : 0.51 and 'STEM.Biology':0.49
[19:29:28] Oh shit. Did you do the first revision or the second one I sent?
[19:29:35] therefore its caching biology just missing by a small amount on the fraction
[19:29:37] first one
[19:29:41] Try the second :)
[19:31:24] halfak: oldid and revid are same right?
[19:31:49] 518838758
[19:31:51] Yes
[19:32:53] now its just language and literature :{
[19:34:02] Hmm
[19:34:08] Maybe this is a bad example :|
[19:34:17] Also maybe .5 is not the right threshold.
[19:34:28] Can you paste the full JSON blob for me?
[19:34:37] yes i was going to add ,maybe our thresholds are messed
[19:34:52] We can use our optimizations to address this later :)
[19:35:44] halfak: https://pastebin.com/2J8cmrwB
[19:36:22] its not insane tho medicine and bio have decent proabilities
[19:47:34] I can work with this.
I'll just report the probabilities above 10% and call that good.
[19:52:01] codezee, I don't see Culture.Biography in the list
[19:52:01] halfak: apparently when i search "when the levee breaks halfak" on google, I see some songs and the first useful link seems to be a chinese version of google scholar :D
[19:52:04] Must have gotten dropped?
[19:52:15] Try adding WIkipedia
[19:52:45] I guess Biography is under "Language and literature"
[19:53:55] OK that works. We can talk about it as part of "Wikipedians will want to fix their directory to make it reflect their understanding of subject space.
[19:53:57] "
[19:55:25] halfak: where did you find biography, I'm still missing it it seems :/
[19:55:56] Under Culture.Language and Literature
[19:56:13] https://commons.wikimedia.org/wiki/File:ORES.drafttopic.Ann_Bishop_example.svg
[19:56:38] I think this illustrates the labeling process and our prediction capacity well :)
[19:56:59] It also shows a gap in the wikiproject tags!
[19:57:08] STEM.Biology and STEM.Medicine
[19:57:09] !
[19:57:51] halfak: oh you're referring to the directory it seems, i got worried thinking why Culture.Biography is not in 2nd level headings
[19:58:04] now i see its not 2nd level in the directory itself
[19:58:28] halfak: I'm analysing eswiki, it's mostly reverts with negative revert time :/
[19:58:30] yes that SVG is nice!
[19:58:35] Yeah. I'm excited by the idea that Wikipedians will come and make adjustments to this directory in order to make these make more sense.
[19:58:40] Amir1, wat.
[19:58:41] Can you imagine how that can happen?
[19:58:47] Maybe they are from article merges?
[19:58:53] Examples would be where I'd start
[19:59:11] it's so bad that median is negative
[19:59:20] Amir1, something is wrong in your code probably
[19:59:27] More than 50% are negative?
[19:59:41] it happens in eswiki only
[19:59:41] Or maybe in the dump itself? Seems unlikely but possible, I guess.
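Editor's sketch for the drafttopic threshold discussion (19:29 to 19:47): with a 0.5 cutoff, STEM.Biology at 0.49 narrowly misses, while reporting everything above 10% keeps it. Only the two probabilities quoted in the channel are used; the rest of the JSON blob is not reproduced, and `labels_above` is a hypothetical helper, not part of ORES or drafttopic.

```python
from typing import Dict, List


def labels_above(probabilities: Dict[str, float], threshold: float) -> List[str]:
    """Return the labels whose probability meets the threshold, highest first."""
    kept = [label for label, p in probabilities.items() if p >= threshold]
    return sorted(kept, key=lambda label: -probabilities[label])


# The two values quoted at 19:29:10; other topics from the full output are omitted.
probs = {
    "History_And_Society.History and society": 0.51,
    "STEM.Biology": 0.49,
}

strict = labels_above(probs, 0.5)   # only "History and society" survives
lenient = labels_above(probs, 0.1)  # both topics are reported
```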
[19:59:45] Hmm
[19:59:47] wikidata and fawiki were fine
[19:59:49] Examples
[20:00:26] yeah, let me open it
[20:01:19] halfak: if we link this svg or the quarry link, will it not affect anonymity double blind review?
[20:01:27] *anonymity in
[20:02:19] halfak: ok no problem, we can always mention it in a third person languahe
[20:02:24] *language
[20:02:34] brb meeting
[20:03:59] codezee, let's not worry about that for now. I'll think more about it later.
[20:04:10] Also, I'll write up something to go with that SVG if you give me 30 mins.
[20:04:14] Scoring-platform-team, ORES, Release-Engineering-Team (Kanban): Create gerrit mirrors for all github-based ORES repos - https://phabricator.wikimedia.org/T192042#4128116 (mmodell)
[20:04:24] Scoring-platform-team, Scap, Patch-For-Review, Release-Engineering-Team (Kanban): [Blocked] Support git-lfs - https://phabricator.wikimedia.org/T180627#4128118 (mmodell)
[20:04:31] Scoring-platform-team, ORES, Release-Engineering-Team (Kanban): Create gerrit mirrors for all github-based ORES repos - https://phabricator.wikimedia.org/T192042#4125561 (mmodell) Open>Resolved
[20:09:24] ewhit: Hi!
[20:09:39] Scoring-platform-team, ORES, Release-Engineering-Team (Kanban): Create gerrit mirrors for all github-based ORES repos - https://phabricator.wikimedia.org/T192042#4128135 (mmodell) Sorry it took me so long. I just saw the update.
[20:25:10] halfak: This is output of the mwreverts https://phabricator.wikimedia.org/P6988
[20:25:33] an edit from 2010 is reverting an edit from 2015
[20:26:18] I still don't get why eswiki is like this and this didn't happen on fawiki
[20:28:52] I live but can be online later
[20:28:57] see you soon
[20:31:32] ok
[20:38:19] ok back
[20:38:33] I wonder if the dump process for eswiki is broken.
[20:38:41] * halfak digs into this example.
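Editor's sketch for the negative revert times reported on eswiki: if revisions are read in an order where timestamps run backwards, a revert detector that trusts that order will see an older edit "reverting" a newer one, and the revert time comes out negative. The `(rev_id, timestamp)` tuples and both helper names are hypothetical, the dates are made up to mirror the 2010/2015 case mentioned at 20:25:33, and nothing here uses the mwreverts API.

```python
from datetime import datetime
from typing import List, Tuple

Revision = Tuple[int, datetime]  # (rev_id, timestamp), an assumed shape


def backwards_pairs(revisions: List[Revision]) -> List[Tuple[Revision, Revision]]:
    """Return adjacent pairs, in processing order, whose timestamps go backwards."""
    return [(a, b) for a, b in zip(revisions, revisions[1:]) if b[1] < a[1]]


def revert_seconds(reverted_at: datetime, reverting_at: datetime) -> float:
    """Time from the reverted edit to the reverting edit; negative means misordered."""
    return (reverting_at - reverted_at).total_seconds()


# Illustrative data only: a 2015 revision processed before a 2010 one.
seen_order = [(2, datetime(2015, 6, 1)), (1, datetime(2010, 6, 1))]
suspicious = backwards_pairs(seen_order)
delta = revert_seconds(datetime(2015, 6, 1), datetime(2010, 6, 1))
```

Running a check like this over one affected page would show whether the ordering problem sits in the input or in the analysis code.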
[20:47:07] I see no reason why https://es.wikipedia.org/w/index.php?diff=37503420 would be treated as a reversion of https://es.wikipedia.org/w/index.php?diff=37503420
[20:47:24] Agreed. I think the revisions are out of order in the XML dump.
[20:47:33] hmm
[20:47:40] how are you comparing the revision age?
[20:47:46] timestamp or revision id?
[20:47:54] are revision ids compared as integers?
[20:48:08] if you compared them as strings, they would indeed sort wrong
[20:48:23] or not
[20:48:25] not here
[20:48:32] both are of the same length
[20:56:32] o/
[20:56:40] cool, I’m unblocked on more git-lfs testing
[21:00:27] where is that mwreverts tool?
[21:01:28] Scoring-platform-team, ORES, Release-Engineering-Team (Kanban): Create gerrit mirrors for all github-based ORES repos - https://phabricator.wikimedia.org/T192042#4128325 (awight) >>! In T192042#4128135, @mmodell wrote: > Sorry it took me so long. I just saw the update. <3 I'd say that resolving a ta...
[21:02:53] Scoring-platform-team, ORES, Release-Engineering-Team (Kanban): Create gerrit mirrors for all github-based ORES repos - https://phabricator.wikimedia.org/T192042#4128334 (mmodell) @awight: I'd like to unblock your project as much as possible and the week is nearly over so a few hours add up.
[21:35:39] halfak: Could you https://gerrit.wikimedia.org/r/#/c/425115/ at some point today? It’s important that it be included with any upcoming deployment, otherwise poor ops will have to rm -r a bunch of dirs.
[22:00:42] wiki-ai/editquality#298 (huwiki_merge - b4f3988 : Adam Wight): The build passed.
https://travis-ci.org/wiki-ai/editquality/builds/365836958
[22:09:30] Scoring-platform-team, Analytics, Analytics-Kanban, ORES, Patch-For-Review: Enable ores::base on stat1006 - https://phabricator.wikimedia.org/T181646#4128501 (Nuria) Open>Resolved
[22:11:23] Scoring-platform-team, ORES: Code generation should assert configuration basic sanity - https://phabricator.wikimedia.org/T192118#4128507 (awight)
[22:25:22] Scoring-platform-team (Current), editquality-modeling, User-Ladsgroup, User-Tgr, artificial-intelligence: Train/test damaging and goodfaith model for Hungarian Wikipedia - https://phabricator.wikimedia.org/T185903#4128539 (awight) Tweak to merge human and auto labels, https://github.com/wiki-...
[22:34:38] (CR) Halfak: [C: 2] Remove transitional virtualenv copy [services/ores/deploy] - https://gerrit.wikimedia.org/r/425115 (https://phabricator.wikimedia.org/T181071) (owner: Awight)
[22:34:47] ty!
[22:34:47] {{done}} awight
[22:34:48] How cool, halfak!
[22:34:50] :)
[22:35:15] Don’t get too excited yet, but I’m prepping to make a git-lfs ORES deployment to beta…
[22:35:29] There’s reason to believe this might work.,
[22:35:34] oooh
[22:35:40] “.”, not “,”.
[22:52:20] heh
[22:52:29] halfak, thanks for the tip on using multiple id's.
[22:52:51] :D
[22:52:55] o/ SQL
[22:53:04] Nice to see you here. Thanks for your work!
[22:53:13] (PS4) Awight: [DNM] Experimental git-lfs submodule [services/ores/deploy] - https://gerrit.wikimedia.org/r/419613 (https://phabricator.wikimedia.org/T180627)
[22:53:15] (PS2) Awight: Point submodules at gerrit [services/ores/deploy] - https://gerrit.wikimedia.org/r/425717 (https://phabricator.wikimedia.org/T180627)
[22:53:49] NP, I enjoy stats-y type stuff.
[22:54:32] I can walk CAT:FA (5000+ members) now in < 5 mins instead of 90 mins heh
[22:55:00] oh wow that’s a lifetime!
[22:55:51] I wish we had something as good / fast for detecting copyvios!
[22:57:54] That seems like a much harder problem, since we need to know the actual content of a huge subset of writing.
[22:58:04] aye
[22:58:21] But if it did exist, it would fit into ORES :-)
[22:58:40] Still - integrating ORES into AFC is helping already I think. https://tools.wmflabs.org/aivanalysis/afc.php
[23:01:54] Scoring-platform-team, Scap, Patch-For-Review, Release-Engineering-Team (Kanban): Support git-lfs - https://phabricator.wikimedia.org/T180627#4128610 (awight)
[23:02:58] SQL: That’s what turned the trend around in March?
[23:03:18] awight, it's really hard to say tbh
[23:03:37] halfak: What do you think I should do?
[23:04:10] Amir, can you try to find one of those pages in the XML and check to see if it is out of order?
[23:04:10] Amir1: If you’re looking for stuff, there’s the JADE connector for ORES…
[23:04:19] oops
[23:04:23] haha
[23:05:01] awight: I'm literally in bed, the thing is these data is a little bit urgent
[23:05:27] halfak: okay
[23:05:55] Amir1, don't strain yourself. You'll be an author on the paper regardless and this isn't essential.
[23:06:18] Please get sleep and only hack on this if it is interesting and you can do so healthfully.
[23:06:26] it's about getting the paper in right shape, not the authorship :D
[23:07:47] don't worry, I want to do something small and easy before sleep
[23:07:51] bedtime deployment
[23:07:58] lol
[23:09:58] Scoring-platform-team, Scap, Patch-For-Review, Release-Engineering-Team (Kanban): Support git-lfs - https://phabricator.wikimedia.org/T180627#4128623 (awight) With the gerrit-based submodule workaround, git-lfs is in business on the beta cluster! We have normal, working ORES install with a smal...
[23:15:31] halfak: I have one idea that I can test it on my work laptop only (which is not here) I think for eswiki, it's not just out of order, I think it's completely upside down, because all numbers were negative
[23:17:13] (PS3) Awight: Point submodules at gerrit [services/ores/deploy] - https://gerrit.wikimedia.org/r/425717 (https://phabricator.wikimedia.org/T180627)
[23:17:17] (PS5) Awight: Add the assets submodule, git-lfs enabled [services/ores/deploy] - https://gerrit.wikimedia.org/r/419613 (https://phabricator.wikimedia.org/T180627)
[23:23:37] (CR) Awight: [V: 2] Remove transitional virtualenv copy [services/ores/deploy] - https://gerrit.wikimedia.org/r/425115 (https://phabricator.wikimedia.org/T181071) (owner: Awight)
[23:24:18] Oh... Hmmm
[23:24:28] Interesting hypothesis Amir.
[23:24:36] I have some work-arounds in mind
[23:24:48] But plz sleep/enjoy your evening :D
[23:28:04] Scoring-platform-team, Scap, Patch-For-Review, Release-Engineering-Team (Kanban): Support git-lfs - https://phabricator.wikimedia.org/T180627#4128655 (awight) Production was unsuccessful, but we're really close! ``` ssh tin git fetch https://gerrit.wikimedia.org/r/mediawiki/services/ores/deploy...
[23:34:01] sure :D
[23:34:07] see you soon
[23:36:48] halfak: tl;dr, LFS worked on beta, but not quite on production yet. I can smell the goalpost
[23:40:27] oooh That sounds great :)
[23:40:32] I'll be AFK shortly FYI
[23:40:36] Time to ride bikes!
[23:43:07] have a good one!