[10:22:43] 10Revision-Scoring-As-A-Service-Backlog, 10wikilabels, 07JavaScript: When Chrome is preventing the pop-out window, the code is stuck! - https://phabricator.wikimedia.org/T131667#2269481 (10Ghassanmas) the same bug appears in edit types too [13:09:16] 10Revision-Scoring-As-A-Service-Backlog, 07Documentation, 07I18n: Translating the documentation page of Wikilabels to Arabic - https://phabricator.wikimedia.org/T134405#2269794 (10Aklapper) What page is this about? Could you please provide a link so the task description is clearer to anybody? Thanks! [13:18:09] 10Revision-Scoring-As-A-Service-Backlog, 07Documentation, 07I18n: Translating the documentation page of Wikilabels to Arabic - https://phabricator.wikimedia.org/T134405#2269811 (10Ghassanmas) [13:23:39] afk [16:28:54] o/ halfak [16:29:03] Hey! [16:29:09] Just cleaning up emails. [16:29:13] Anything for me to look at? [16:29:19] no [16:29:25] I just need stuff to do [16:29:56] Gotcha. Let me look. [16:30:26] https://phabricator.wikimedia.org/T131997 [16:30:30] How about this? [16:30:43] Since you have been digging in the wikilabels DB, it should be a shallow bug for you. [16:32:18] hmm [16:32:21] let me check [16:39:13] as far as I know, I told you that happens because of the way we designed the assignment system: we assign tasks that are not labeled [16:39:27] not tasks that are not assigned [16:39:33] so that's the reason [16:40:08] it has a huge upside, which is that we don't need anyone to finish their worksets in order to finish a campaign [16:40:36] but my guess is that this is also the downside [16:41:12] I thought we could reduce these numbers by choosing randomly, and I saw we do that already [17:23:13] Amir1, yeah, still, I think that we should assign tasks that are unassigned and then worry about reclaiming stale tasks later. [17:23:45] We could have a cron job for periodically cleaning up stale tasks. 
[17:29:46] hmm [17:29:56] let me think of a way to do that [17:30:26] halfak: we can start assigning tasks without a label when there is no unassigned task left [17:30:35] what do you think of this? [17:31:16] Amir1, maybe only re-assign tasks that are not yet done? [17:31:34] Sorry, I meant to say: only assign tasks that have become stale [17:31:52] It would be great if we could remove them from the stale workset too. [17:32:30] but defining stale is a little bit complicated [17:32:39] you need to define it per wiki [17:32:49] and it doesn't end up well [17:32:57] I finished a workset after two months [17:35:12] Why do you define it per wiki? [17:35:44] https://github.com/wiki-ai/wikilabels/blob/master/wikilabels/database/schema.sql#L61 [17:40:32] Amir1, my sense is that there's no good reason to hang onto a workset for any substantial period of time. [17:40:45] if they want to come back two months later to finish, they can request a new workset. [17:41:20] hmm [17:41:38] Let me think about it [18:10:41] halfak: so, just to be clear, you want a cron or daemon to reap through worksets and abandon expired ones [18:10:57] Yeah. I think so. [18:11:08] It could happen when someone requests a workset too [18:11:24] But I imagine it might be a bit slow. [18:11:37] it can trigger a job [18:46:48] 06Revision-Scoring-As-A-Service: ScoreRevisions only loads scores for the first few revisions. - https://phabricator.wikimedia.org/T134601#2270729 (10Halfak) [18:46:54] Amir1, working on https://phabricator.wikimedia.org/T134601 BTW [18:48:28] halfak: awesome [18:48:38] and I'm working on the wikilabels issue [19:29:26] Changing location [19:29:28] back online in 40 mins [20:30:07] it seems I don't have access to github right now [20:30:07] idiots blocked it again [20:38:08] and we are back [20:38:43] super slow [20:42:55] o/ [20:43:03] Sorry. 
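The assignment strategy discussed above (prefer unassigned tasks, fall back to re-assigning stale, still-unlabeled ones) can be sketched as a toy Python function. This is a minimal model with hypothetical names (`assign_tasks`, the `labeled`/`assigned_to` keys), not the actual Wikilabels implementation:

```python
import random

def assign_tasks(tasks, user, workset_size):
    """Sketch of the proposed assignment strategy (hypothetical data model):
    prefer tasks nobody holds; only fall back to assigned-but-unlabeled
    (stale) tasks when no unassigned task is left."""
    unlabeled = [t for t in tasks if not t["labeled"]]
    unassigned = [t for t in unlabeled if not t["assigned_to"]]
    # Prefer fully unassigned tasks; re-assign stale ones only as a fallback.
    if unassigned:
        pool = unassigned
    else:
        pool = [t for t in unlabeled if t["assigned_to"] != user]
    # Choose randomly, as the current system already does.
    picked = random.sample(pool, min(workset_size, len(pool)))
    for t in picked:
        t["assigned_to"] = user
    return picked
```

The upside mentioned in the chat is preserved: a campaign can still finish even if nobody completes their workset, because stale tasks eventually get handed to someone else.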
Been back for a while but forgot to reconnect [20:43:44] Still working out the issue with ScoredRevisions [20:44:25] halfak: hey, here's my plan [20:45:18] 1- we let everyone (not just the owner) abandon worksets that have expired [20:45:50] 2- then we run a cron job every day to search through expired worksets and abandon them [20:46:11] Amir1, why not have that cron job directly access the DB? [20:46:42] Are you thinking that we'll abandon worksets via the API? [20:46:43] that would be super complicated halfak [20:46:52] and we do have proper methods in place already [20:47:02] We have other scripts that connect directly to the database and perform operations. [20:47:06] You have been working on them. [20:47:31] Essentially, all that needs to happen is the execution of a single query that drops rows from workset_task. [20:51:20] hmm [20:51:23] okay [20:51:31] maybe I can get that select directly [20:51:35] let me try [20:55:43] http://stackoverflow.com/questions/21662726/delete-using-left-outer-join-in-postgres [20:55:46] I think you'll need this [20:56:13] thanks :) [21:13:15] halfak: basically, all worksets are expired, since we set the expiration time to just one day [21:13:37] but [21:13:39] yes. This has never been enforced before, so it might be surprising [21:14:18] it sounds really bad if we want to remove all worksets from people's wikilabels workspace [21:14:40] I kinda like it when I see these worksets completed [21:15:44] Oh! We shouldn't ever remove a task that has been labeled. [21:16:20] makes sense [21:17:02] This was what I was thinking: join workset to workset_task and workset_task to label. 
Anywhere a label can't be found (outer join) and NOW() > workset.expires, DELETE [21:17:34] We might want to delete empty worksets too [21:17:42] We could also just leave them empty :) [21:18:25] okay [21:18:32] that sounds good [21:20:39] halfak: I also thought about that PR about finding a good connection method, I think we can run a test in staging or an experiment to see whether we get a bad connection after a while or not [21:20:58] Amir1, strangely, we never saw the problem on the staging machine. [21:21:05] It sometimes took weeks to manifest :/ [21:21:16] I think we should just merge the PR and see if it happens again. [21:21:20] Not that high of a risk. [21:22:03] alex told me that lots of open and idle connections were in the logs [21:22:14] he told me that this might be the reason [21:22:33] not logs, in his checks [21:23:08] Amir1, the connection pool should have many connections. I think we can configure it. [21:24:37] yeah, but not connections that are idle for a while [21:24:49] connections that are not active for weeks [21:26:11] * halfak thought that's how connection pools worked [21:26:18] Definitely not sure about that [21:45:34] halfak: do you keep a backup of the test database? [21:45:50] Staging? [21:45:54] yeah [21:45:57] Nope [21:46:04] I might have just deleted all rows in task_workset [21:46:27] https://github.com/wiki-ai/wikilabels/blob/master/wikilabels/database/schema-testdata.sql [21:46:30] :) [21:47:11] You could grab one of the backups from the prod DB and load that too. Might be nice [21:48:11] hmm [21:48:43] can you send me one of them? [21:52:57] getting one now [21:53:06] halfak: Also, we can't delete this stuff, since it needs a primary key on workset_task [21:53:22] and workset_task doesn't have a primary key [21:53:30] http://stackoverflow.com/questions/21662726/delete-using-left-outer-join-in-postgres [21:53:47] Damn it. 
[21:53:52] There's a unique pair though [21:54:04] let me try that [21:54:10] PRIMARY KEY(workset_id, task_id), [21:54:14] It does! [21:55:49] https://www.irccloud.com/pastebin/XEXhl5oZ/ [21:55:56] but anyway [21:55:58] it worked [21:56:45] https://gist.github.com/Ladsgroup/55f0b5fa1736aa40200c77c4656573ac [21:56:54] halfak: this is the statement that deletes [21:57:04] it works properly [21:58:46] I need to polish it and put it somewhere in wikilabels [21:58:48] Formatted version while I reviewed it: https://gist.github.com/halfak/3d8f80800948031178b54b98df1db1cd [21:58:50] Looks good [21:59:10] probably in utilities [21:59:18] +1 [21:59:40] nice [22:00:24] Precaching is down [22:00:51] let me read the logs [22:02:58] oooh. I have a fun ORES task for you if you get bored after this. [22:03:07] I'll make a phab card and whoever gets there first wins :D [22:04:29] :D [22:04:53] It seems that because of DNS changes the daemon errored [22:05:02] and puppet couldn't bring it back up [22:05:14] so after a restart it should be fixed now [22:05:16] but [22:05:29] I need to find a better solution [22:06:01] halfak: I'm waiting :D [22:06:48] 10Revision-Scoring-As-A-Service-Backlog, 10ORES, 10revscoring: Score multiple models with the same cached dependencies - https://phabricator.wikimedia.org/T134606#2271181 (10Halfak) [22:06:50] https://phabricator.wikimedia.org/T134606 [22:09:38] OOOH [22:09:44] I'll do that asap [22:09:52] let me finish this wikilabels stuff [22:18:26] Amir1, I'll get a quick demo together for how to do this with a cache [22:21:57] awesome [22:28:01] 10Revision-Scoring-As-A-Service-Backlog, 10ORES, 10revscoring: Score multiple models with the same cached dependencies - https://phabricator.wikimedia.org/T134606#2271274 (10Halfak) ``` $ python Python 3.4.3 (default, Jul 28 2015, 18:20:59) [GCC 4.8.4] on linux Type "help", "copyright", "credits" or "licens... 
[22:28:13] Amir1, demo of re-using a cache: https://phabricator.wikimedia.org/T134606#2271274 [22:28:35] Note how the debug lines produced when executing a process() function don't appear for the second call with the provided cache. [22:29:07] yeah [22:29:23] I remember I used a cache when I was extracting features for wikidata [22:29:28] and writing tests [22:34:46] I just commented on your PR. Otherwise it looks good to me [22:36:39] halfak: I'm trying to understand the demo. It seems that when we pass "cache" as a variable, it changes the variable [22:36:43] is that correct? [22:36:45] yes [22:36:51] It is modified in-place. [22:36:53] that's crazy [22:37:02] how is that possible? [22:37:07] So this will work with extractor.extract(..., cache=cache) too [22:37:42] anyway [22:38:18] halfak: one other thing: how did you set up a dashboard in graphite? I tried manuals, nothing useful [22:38:31] if you can direct me to proper manuals [22:38:33] Oh god. I just started clicking on things until I figured out some patterns. [22:38:35] it would be great [22:38:48] oh [22:38:51] I had many false starts working on dashboards there because it made no sense. [22:39:01] But I could walk you through it. [22:39:22] oh no [22:39:30] It would take too much of your time [22:39:42] let's use it on something useful [22:41:20] halfak: Another thing: PEP 257 explicitly says to use the imperative mood [22:41:24] it returns an error [22:41:29] when you say "Removes incomplete tasks from expired worksets." [22:41:36] it should be "Remove incomplete tasks from expired worksets." [22:41:54] Not good English. One is descriptive, the other is a command. [22:42:12] Hmm... then again [22:42:23] I guess cat's docs say: cat - concatenate files and print on the standard output [22:42:41] and grep's docs: print lines matching a pattern [22:42:50] OK, I'm cool with your wording [22:43:08] I still think it is weird, but standards and consistency are important! 
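The in-place cache behavior being puzzled over above comes from Python's mutable-argument semantics: the callee receives a reference to the caller's dict, so entries it adds persist after the call. This toy dependency solver (hypothetical; not the real revscoring `Extractor` API) shows why the second call produces no debug output:

```python
def solve(dependency, cache):
    """Toy model of dependency solving with a shared cache. `cache` is
    mutated in place, so values computed while scoring one model are
    reused when scoring the next with the same cache."""
    if dependency in cache:
        return cache[dependency]
    name, func, deps = dependency
    value = func(*(solve(d, cache) for d in deps))
    cache[dependency] = value   # in-place update: the caller's dict grows
    return value

calls = []
def count_chars(text):
    calls.append("chars")       # stands in for an expensive extraction step
    return len(text)

# Dependencies as (name, function, sub-dependencies) tuples.
text = ("text", lambda: "some revision text", ())
chars = ("chars", count_chars, (text,))

cache = {}
solve(chars, cache)   # computes "text" and "chars", recording them in cache
solve(chars, cache)   # served straight from the cache: no second computation
```

After the first call, `cache` already holds both dependencies, so re-solving (or solving another model that shares them) does no extra work.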
[22:43:22] I'm definitely not very good at wording; that's just what PEP 257 commands [22:44:59] halfak: graphite.wikimedia.org requires a login [22:45:07] If I want to set up stuff [22:45:19] https://grafana-admin.wikimedia.org/dashboard/db/ores [22:45:20] is there a way to make an account for me there? [22:45:29] It uses your wikitech account [22:45:39] https://grafana-admin.wikimedia.org/ rather [22:46:32] Whoops. Looks like it *doesn't* work with the extractor [22:46:38] I'm looking into that [22:46:58] It returns an error for me [22:50:41] halfak: nothing worked as username/password [22:52:54] Hmm... not sure then. I'd ask in labs. [22:53:02] You might need a special right on your ldap [22:53:24] I asked [22:53:34] halfak: in #wikimedia-operations [22:53:43] it seems it requires an NDA [23:01:52] halfak: https://commons.wikimedia.org/wiki/File:The_2016_Wikimedia_Hackathon_in_Jerusalem.webm [23:01:56] you are in here too [23:02:04] you are everywhere ;) [23:02:13] Yeah. Saw that. There's a shot of me derpily looking at a presentation with my mouth open [23:02:45] :))) [23:02:51] I will watch it tomorrow [23:03:15] anyway [23:03:32] halfak: are you working on the multiple cache? [23:03:44] Working on making it work with Extractor [23:03:58] If you want to look into how to get ORES to do it, I'll get this out of the way for you. [23:04:20] nice [23:04:22] sure [23:15:36] halfak: I'm going through the function calls [23:15:41] It seems it uses solve [23:15:48] not extract [23:15:58] I'm probably wrong [23:16:01] let me check again [23:16:50] It'll use both. [23:18:53] the main function (scoring.context.score()) first extracts feature values [23:19:08] and then uses extract [23:20:32] *then scores using model.score [23:21:47] it does "self.extractor.solve" [23:21:50] so I'm dumb [23:22:55] Amir1, https://github.com/wiki-ai/revscoring/pull/272 [23:23:29] :) extractor.solve() is inherited from context.solve() BTW. [23:23:35] I've got to go. 
[23:23:43] I should be online for a bit tomorrow during the usual time. [23:23:45] o/ [23:26:23] wiki-ai/revscoring#712 (preserve_cache - 40cd02f : halfak): The build passed. https://travis-ci.org/wiki-ai/revscoring/builds/128431386