[10:03:55] ello
[14:25:40] that happened
[19:44:58] o/ halfak
[19:45:10] Hey Amir1
[19:45:38] Hey can you check this document? https://docs.google.com/document/d/1qdk4pMfmHg499z2F_9EaWp12lh2Htwoom8CEzX2kBq4
[19:45:43] add anything you want
[19:45:50] change anything you want
[19:50:36] fwiw, I self-merged almost all of the pending mw-ext-ORES work, due to not enuf maintainers.
[19:51:21] awight: I don't have access, I +1 most of them
[19:51:28] awight, probably a good call. We're in that state for code-review on other parts of the system too.
[19:51:49] E.g. wikilabels and ores. I'm the only person doing code review there, so I self-merge.
[19:51:50] If I can do anything please do tell
[19:52:04] Amir1: thank you! You gave some helpful feedback, too. I left the final patch for you to review again if you wish
[19:52:08] halfak: it's a little bit late but happy birthday
[19:52:17] :)
[19:52:18] Not too late. :) thanks!
[19:52:19] awight: sure
[19:52:28] halfak: on wikilabels? That seems sketchy--isn't it deployed on the cluster?
[19:53:04] halfak: I help in code review, ping me when I'm around. or email me and I'll be around ASAP
[19:54:30] awight: do you mean this? https://gerrit.wikimedia.org/r/#/c/259214/
[19:54:41] and there's one WIP patch
[19:55:01] Amir1: yes, thanks. It's just logging, but might be useful for debugging.
[19:55:35] sure
[19:56:50] awight: IMO self-merging in these cases is totally legit and ok
[19:57:56] YuviPanda: thanks! I did read the mw.org guidelines, seems like it's fine until we're on the cluster.
[19:58:25] * YuviPanda nods
[19:59:03] (CR) Ladsgroup: [C: 1] Log API requests [extensions/ORES] - https://gerrit.wikimedia.org/r/259214 (owner: Awight)
[19:59:25] awight: I can also cabal+2 all your changes if you want
[20:00:31] :p
[20:00:59] I really do like CR, but when it's unavailable I become despondent.
[20:01:09] (PS4) Awight: Log API requests [extensions/ORES] - https://gerrit.wikimedia.org/r/259214
[20:01:17] (CR) Awight: [C: 2] Log API requests [extensions/ORES] - https://gerrit.wikimedia.org/r/259214 (owner: Awight)
[20:01:20] Amir1: thanks again!
[20:02:10] awight: I like to do some coding too but I don't want to overlap with your work (I just want to make it faster)
[20:02:17] any ideas to start?
[20:03:08] Amir1: cool! I've taken my hands off of the extension for the moment, so anything you want to do in there would be great. Lemme look around for open tasks...
[20:03:29] (Merged) jenkins-bot: Log API requests [extensions/ORES] - https://gerrit.wikimedia.org/r/259214 (owner: Awight)
[20:05:00] thanks :)
[20:05:23] halfak: Do you have anything to assign me?
[20:05:30] I like to do some work
[20:06:04] How's the wikidata edit quality campaign going?
[20:07:10] halfak: https://labels.wmflabs.org/campaigns/wikidatawiki/?campaigns=stats
[20:07:15] it staled
[20:07:35] I think we can have some more contributions if we publish the blog post
[20:07:41] Unstalling that would be great. Do you know if people had issues?
[20:08:11] specially if we emphasize on this in "getting involved"
[20:08:31] oh I'm sorry by "staled" I meant no one is labeling
[20:08:44] I didn't encounter any issues
[20:09:28] and I thought you wrote "uninstalling"
[20:09:48] anyway. Do you think we can make that blog post published soon?
[20:10:52] halfak: I vaguely remember you commenting about all the places we might want to integrate ORES data in MediaWiki. Do you know where you posted that list?
[20:10:55] Amir1, yeah. Not feeling the muse today though. Will pick up the draft you sent tomorrow.
[20:11:07] halfak: If you review and change anything you want.
We can move on fast and publish it, it'll probably unstale that campaign
[20:11:33] awight, https://phabricator.wikimedia.org/T120923
[20:12:11] sure
[20:12:40] Amir1, I think it'll be good either way to reach out to those who have done a substantial amount of work on the campaign and ask if they need anything.
[20:12:55] awight: halfak https://phabricator.wikimedia.org/T114419#1876179 on CR (food for thought)
[20:13:14] sure :)
[20:13:25] Maybe I send another email in wikidata-l
[20:15:32] YuviPanda, reading the response to Ori's comment there makes me sad.
[20:15:39] https://phabricator.wikimedia.org/T121613
[20:15:45] Best. Bug. Ever.
[20:16:22] halfak: yup
[20:16:30] Amir1, lol
[20:16:48] halfak: it adds to my suspicion/pessimism that wikimedia as a movement is too closed off and conservative to survive long term.
[20:17:54] Amir1: I've created two tasks you might be interested in, https://phabricator.wikimedia.org/T122537 and https://phabricator.wikimedia.org/T122535
[20:19:43] awight: thanks, I claimed the easy one
[20:19:45] :D
[20:20:51] Amir1: fantastic! Let me know if I can do anything to support your work there...
[20:23:56] YuviPanda: seems like a very polarizing argument. It's also sort of shocking that operations/puppet is help up as an example to follow, cos my experience there was that my totally reasonable change was beaten over a puppet-lint barrel for weeks, by a person advocating for less CR. Strange!
[20:24:02] *held up
[20:25:27] awight: yes, ops/puppet has no culture of CR at all which means that if you don't have +2 nor a friend with +2 you are shit outta luck, mostly.
[20:25:43] however, we have managed to do *massive* cleanups of it
[20:25:52] most of what's there now is unrecognizable from 2-3y ago
[20:26:16] when legoktm came as EditPage.php to the Halloween party I realized I couldn't find anything that scary in our puppet repo left, mostly because we cleaned up all of it
[20:26:34] legoktm: that is so awesome
[20:26:36] while I admit it isn't 'example to follow' (it is on one extreme, total cathedral), it does have its advantages
[20:30:10] I'm seeing different stuff in that discussion tho, trying to reconcile with your concern that Wikimedia is an old boys' club. I think the replies to ori's comment are pretty great actually, people are building on each other's ideas, addressing just the argument and not getting personal, a discussion is scheduled...
[20:30:54] * YuviPanda nods
[20:31:04] I probably will too if I take off my pessimism glasses
[20:31:09] which I should try to at some point soon
[20:31:27] hehe. Why you don't vacation?
[20:31:36] I am. in February
[20:31:59] in the meantime have offered to help with https://sfconservancy.org/supporter/
[20:32:00] :D I'm working on a side project for that entire month, I'm pre-thrilled.
[20:32:04] that's their donation interface...
[20:32:15] took me many many seconds to find it
[20:32:35] awight: nice!
I'm taking many weeks off in february and my gf is in SF and we'll probably roam around and not owrk
[20:32:37] *work
[20:33:11] <3
[20:34:32] If you're working on their donation stuff proper, please do ping fr-tech, we have a lot of collective l'esprit de l'escalier
[20:35:03] awight: +1, so their requirement is that the flow can be completed without the donator running any non-free JS
[20:35:15] awight: but their current UX sucks enough that I can probably help fix bits of it
[20:37:38] * awight harrrs over backwards
[20:38:29] awight: :D
[20:38:34] awight: it's also running an ancient version of django
[20:38:36] like 3y old
[20:38:51] * awight clouds over
[20:39:23] We have one of those, too. It's only being used to run a script, none of the UI stuff is active. Which makes me sad.
[20:39:51] Pretty much like writing a Drupal module just to run lovely drush.
[20:41:04] * awight gets sadder that YuviPanda is not in -office or -staff
[20:54:43] Amir1, have you seen https://phabricator.wikimedia.org/T121005?
[20:55:04] This discussion is preface for a big PR I'll have posted today or tomorrow.
[20:55:18] It's a major cleanup (no EditPage.php for us)
[20:55:31] It'll be good to get your thoughts about it now.
[20:56:03] yeah I saw it halfak but I couldn't understand the text, I want to see the code
[20:56:14] thta way I can be way more useful
[20:56:18] *that
[20:56:39] Start here: https://github.com/wiki-ai/revscoring/blob/features_commons/revscoring/datasources/revision_oriented.py
[20:56:57] This ensures that we wind up with the right structure: https://github.com/wiki-ai/revscoring/blob/features_commons/revscoring/datasources/tests/test_revision_oriented.py
[20:57:04] Gotta run. I'll be back in 30-40 minutes.
[20:59:53] sure
[20:59:55] thanks
[21:14:50] c'mon github is blocked again
[21:15:09] I'll download the code through labs
[21:39:39] halAFK: tell me when you're back
[21:43:53] o/ Amir1
[21:44:30] halfak: I read both codes
[21:44:36] and then I read the task
[21:45:01] seeing foo.bar.a.b in a code is always a good sign, it shows heavy OOP
[21:45:24] I was worried about deep nesting, but it seems we're OK.
[21:45:33] I've made the features match this datasource structure.
[21:45:51] So let's say you want some temporal features and wikitext features, you might do:
[21:46:00] from revscoring.features import temporal, wikitext
[21:46:09] heavy OOP codes are easier to read, use, maintain and develop. I love them
[21:46:27] features = [..., temporal.revision.parent.seconds_since, wikitext.revision.diff.words_added, ...]
[21:46:45] So the different types of features have separate namespaces.
[21:46:57] I can see
[21:47:00] wikibase and wikitext are peers in this structure
[21:47:17] yeah
[21:47:19] I know
[21:47:30] our codes with your refactoring is way more readable
[21:47:39] we can gather more developer
[21:47:41] :) OK. So good that it makes more sense.
[21:47:43] +1
[21:47:48] make them on track fast
[21:47:57] but one thing and only one thing
[21:48:07] My thought here was that we were building a monolith with one type of "datasources" and "features".
[21:48:25] Now, it's much easier for someone to build up a new type of features and then merge that into its own namespace.
[21:48:47] We could even split those modules within features into separate python packages if we wanted to.
[21:48:54] Not sure that's necessary, but it could work.
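The per-family namespaces described above can be pictured with a small stand-alone sketch. This is not revscoring's code: the `Feature` class is a hypothetical placeholder, and the only names taken from the chat are the dotted paths `temporal.revision.parent.seconds_since` and `wikitext.revision.diff.words_added`.

```python
from types import SimpleNamespace


class Feature:
    """Hypothetical placeholder for a feature; it only remembers its dotted name."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "<feature {0}>".format(self.name)


# Each feature family is its own namespace with a revision-oriented shape.
temporal = SimpleNamespace(
    revision=SimpleNamespace(
        parent=SimpleNamespace(
            seconds_since=Feature("temporal.revision.parent.seconds_since"),
        ),
    ),
)
wikitext = SimpleNamespace(
    revision=SimpleNamespace(
        diff=SimpleNamespace(
            words_added=Feature("wikitext.revision.diff.words_added"),
        ),
    ),
)

# A feature list can mix families freely, as in the chat above.
features = [
    temporal.revision.parent.seconds_since,
    wikitext.revision.diff.words_added,
]
print(features)
```

Because the families are peers, a new one (say, a wikibase family) could presumably be added without touching the existing two.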
[21:49:00] some heavy OOP codes, if they are built way deeply can be under-performer
[21:49:14] http://programmers.stackexchange.com/questions/125753/does-object-orientation-really-affect-algorithm-performance
[21:49:23] I'm a little bit worried about this
[21:49:28] Yeah, I have been running tests, but I think the real test will be against RCStream
[21:49:37] exactly
[21:49:52] The fun thing about this pattern (dependency injection) is that it's really just a tree of function calls in the end.
[21:49:57] I want to ask you to run some performance test on some edits and check the numbers
[21:50:05] So, hopefully, this hasn't added too many nodes to the tree.
[21:50:11] Amir1, +1
[21:50:33] One more thing for you to look at.
[21:50:33] even if it takes longer IMO it's ok because of trade-off we have for readability, etc. but not too much
[21:50:40] sure
[21:50:59] So, for language-specific feature sets, I've been implementing feature_sets as a class.
[21:51:09] check out /revscoring/languages/features/
[21:51:10] "The fun thing about this pattern (dependency injection) is that it's really just a tree of function calls in the end" awesome
[21:51:22] can you give me a link?
[21:51:27] I download it from labs
[21:51:35] https://github.com/wiki-ai/revscoring/tree/features_commons/revscoring/languages/features
[21:51:46] Same repo. Same branch
[21:51:50] which file?
[21:51:58] One sec.
[21:52:39] * halfak pushes some changes he's been working on this morning.
[21:53:31] ok
[21:54:56] https://github.com/wiki-ai/revscoring/tree/features_commons/revscoring/languages/features/dictionary
[21:54:59] This dir.
[21:55:07] Start at https://github.com/wiki-ai/revscoring/blob/features_commons/revscoring/languages/features/dictionary/dictionary.py
[21:57:20] Note how we construct a "features.Revision" and pass to it a "datasources.Revision"
[21:58:08] We apply the same code (and features) to both Revision and the parent of Revision.
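The remark that dependency injection is "really just a tree of function calls in the end" can be made concrete with a toy solver. Everything here is an assumption made for illustration: `Dependent`, `solve` and `words_of` are hypothetical names, not revscoring's API, and "words added" is simplified to a difference of word counts.

```python
class Dependent:
    """A named value computed by calling `process` on the values of its dependencies."""

    def __init__(self, name, process=None, depends_on=()):
        self.name = name
        self.process = process
        self.depends_on = tuple(depends_on)


def solve(dependent, cache):
    """Resolve `dependent` recursively; `cache` holds injected and computed values."""
    if dependent in cache:
        return cache[dependent]
    if dependent.process is None:
        raise RuntimeError("No value injected for " + dependent.name)
    # Each dependency is one child node in the tree of function calls.
    args = [solve(d, cache) for d in dependent.depends_on]
    cache[dependent] = dependent.process(*args)
    return cache[dependent]


# Datasources: raw text has to be injected, words are derived from it.
revision_text = Dependent("revision.text")
parent_text = Dependent("revision.parent.text")


def words_of(text_datasource, name):
    """The same code is applied to the revision and to its parent."""
    return Dependent(name, lambda text: text.split(), [text_datasource])


revision_words = words_of(revision_text, "revision.words")
parent_words = words_of(parent_text, "revision.parent.words")

# A feature that depends on two datasources (simplified word-count difference).
words_added = Dependent(
    "revision.diff.words_added",
    lambda new, old: max(len(new) - len(old), 0),
    [revision_words, parent_words],
)

cache = {revision_text: "a b c d", parent_text: "a b"}
print(solve(words_added, cache))  # prints 2
```

The cache doubles as the injection point: any node whose value is already present is never recomputed, which is one way the number of calls in the tree stays bounded.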
[21:58:25] We know how deep to go by inspecting the shame of Datasources
[21:58:27] *shape
[21:58:31] lol @ shame
[21:58:32] SHAME!
[21:58:38] :)))))
[21:59:11] okay
[21:59:37] be careful about over-instantiating
[21:59:55] So, the last thing I need to do for this refactor before I'm ready to submit the PR is (1) finish up the other feature sets for languages (regexes, stemmed and stopwords) and then (2) apply the new structure to all languages in the dir.
[21:59:58] but in this case it's great
[22:00:00] over-instantiating?
[22:00:23] instantiating an object too much
[22:00:39] in my case "datasources.Revision"
[22:01:15] maybe I'm getting something wrong
[22:01:20] it's not important
[22:01:27] Maybe, but I'm not sure I understand the concern.
[22:01:37] I am severely abusing namespaces.
[22:01:54] "datasources" and "revision" means something different depending on *where* you are and that's important to keep track of.
[22:02:39] let's say you have a = A(a,b,c) and you have f = F(g,h,i)
[22:03:00] but in __init__
[22:03:19] in __init__ of the F you have some like this
[22:03:34] self.data = A(g,h,i)
[22:03:41] but you already have that object
[22:03:45] a
[22:03:58] Aha! Gotcha.
[22:04:13] so f.data and a are identical
[22:04:25] So, I have some rules we'll want to be explicit.
[22:04:51] 1. No constructing Datasources classes inside a Features class.
[22:04:56] They must always be passed.
[22:05:12] 2. A datasources will construct all of its children and reference them.
[22:05:23] This is the inspection pattern I was talking about.
[22:05:39] great
[22:05:44] I like that
[22:06:06] 3. A Features class should always reference its Datasources with "self.datasources".
[22:06:22] So, we'll have references at the appropriate depth in the datasources tree.
[22:06:38] This works nicely in practice when I want to do some extensions.
[22:07:43] +1
[22:08:21] I just remembered. There is one more bit.
[22:08:25] The extractor.
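The over-instantiation worry and the three rules listed above can be sketched side by side. The classes below are hypothetical stand-ins written for this illustration (they are not the revscoring classes), with a counter added only to make duplicate construction visible.

```python
class Datasources:
    """Hypothetical datasource set; it constructs its own children (rule 2)."""

    instances = 0  # counts constructions so duplicates are visible

    def __init__(self, name, include_parent=True):
        Datasources.instances += 1
        self.name = name
        if include_parent:
            self.parent = Datasources(name + ".parent", include_parent=False)


class WastefulFeatures:
    """Anti-pattern: rebuilds a Datasources that the caller already has."""

    def __init__(self, name):
        self.datasources = Datasources(name)


class Features:
    """Rules 1 and 3: datasources are passed in and kept as self.datasources."""

    def __init__(self, name, datasources):
        self.name = name
        self.datasources = datasources
        # Inspect the shape of the datasources to decide how deep to go.
        if hasattr(datasources, "parent"):
            self.parent = Features(name + ".parent", datasources.parent)


revision = Datasources("revision")
features = Features("dictionary.revision", revision)
print(Datasources.instances)  # prints 2: the revision and its parent

wasteful = WastefulFeatures("revision")
print(Datasources.instances)  # prints 4: both were built a second time
```

With the datasources passed in, `features.datasources` and `features.parent.datasources` point at the existing objects, so each Features node holds a reference at the matching depth of the datasources tree.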
[22:08:57] See https://github.com/wiki-ai/revscoring/blob/features_commons/revscoring/extractors/api/revision_oriented.py
[22:09:13] The extractor roughly replicates the tree structure of datasources.
[22:10:04] Because this set of datasources implements DependentSet, it doubles as a set(Datasource) that can be passed as "context" for dependency solving.
[22:10:13] See line 29 here: https://github.com/wiki-ai/revscoring/blob/features_commons/revscoring/extractors/api/extractor.py#L29
[22:12:23] downloading
[22:14:24] a note about "get_rev_doc_by_id"
[22:14:27] and similar ones
[22:15:42] is it a getter halfak ?
[22:15:51] It doesn't look like a getter
[22:15:53] Not really.
[22:16:00] drop the "get_"
[22:16:06] But "get_" necessarily implies "getter"?
[22:16:09] I don't think so.
[22:16:43] But I could see it.
[22:16:53] it's misleading IMO
[22:17:00] but it's not important too
[22:17:03] just a name
[22:17:04] Seems like we can "get_" a lot of things that aren't supposed to behave like member variables.
[22:17:16] E.g. "get_" a resource.
[22:17:30] Python has "__getitem__" for a method that takes a key and returns a value.
[22:18:17] Do you think we should try "query_rev_doc_by_rev_id"?
[22:18:18] I'm more concerned with getters in php
[22:18:38] * halfak hasn't seriously implemented getters since Java
[22:19:11] that would be better
[22:19:23] much better
[22:19:26] http://stackoverflow.com/questions/1554546/when-and-how-to-use-the-builtin-function-property-in-python
[22:19:39] Looks like the consensus is "do not use getters/setters in python"
[22:20:34] of course we should not implement getters because unlike php we don't have public, protected, and private variables
[22:20:39] get_user_info_doc
[22:20:44] Oh yeah. We have those. By convention. :)
[22:20:58] "var", "_protected_var", "__private_var"
[22:21:21] in python?
[22:21:25] it's like java
[22:21:26] Yup
[22:21:40] oh boy
[22:21:47] I've never seen things like this
[22:21:49] Single underscore implies "protected" and you are supposed to interpret them that way.
[22:22:03] http://stackoverflow.com/questions/1641219/does-python-have-private-variables-in-classes
[22:23:47] pep8 under descriptive naming styles: https://www.python.org/dev/peps/pep-0008/#id29
[22:23:48] I see
[22:24:02] I saw _var before
[22:24:28] but it's not technically preventing you to access something
[22:24:49] you can always do foo.__var and get/change that var
[22:25:22] but with __ developer says you shouldn't do
[22:25:23] Am I wrong?
[22:25:56] but in php you can't do Foo::thing (and thing is private)
[22:26:19] Yup that's right.
[22:26:29] there's a nice writeup by Guido on why to make this decision.
[22:26:44] so basically we don't need that
[22:26:58] by that I meant getters and setters
[22:27:28] "Python drops that pretence of security and encourages programmers to be responsible. In practice, this works very nicely."
[22:27:42] ^ that
[22:28:29] halfak: lol, I was setting up new redises for tools, and I was like 'wtf why is this not working'. turns out I had named them 'ores-redis-1002' instead of 'tools-redis-1002'
[22:28:46] anyway, let's not be too distracted
[22:28:56] YuviPanda, lool. ORES ON THE BRAIN
[22:29:09] Amir1, +1. Hoping to have this PR submitted today before I check out.
[22:29:24] Will be relying on you to push back and help me find inconsistencies.
[22:29:34] We'll need to run tests too.
[22:29:38] sure
[22:29:49] I want to rebuild all of the editquality models before we consider doing a new release.
[22:30:22] * halfak just finishes up the regexes language features.
[22:30:44] great
[22:31:00] I'll check that tomorrow (in 6-8 hours)
[22:31:30] Sounds good.
[22:31:35] I'll make sure it is ready for your review.
[22:31:41] And actually passes tests :)
[22:31:50] Our test coverage is going to go up substantially!
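One detail the underscore exchange above glosses over: a double-underscore attribute is name-mangled by Python, so `foo.__var` does not work from outside the class, although the value stays reachable under the mangled name. A minimal sketch with a made-up `Revision` class (not taken from revscoring):

```python
class Revision:
    def __init__(self):
        self.text = "public"        # part of the public interface
        self._cache = "protected"   # single underscore: internal by convention only
        self.__secret = "private"   # double underscore: stored as _Revision__secret


rev = Revision()
print(rev.text)
print(rev._cache)  # works; the underscore is only a request to stay away

try:
    rev.__secret   # fails: the attribute is not stored under this name
except AttributeError:
    print("no attribute named __secret from outside the class")

print(rev._Revision__secret)  # the mangled name is still reachable
```

So nothing is enforced in the PHP sense; the double underscore mainly avoids accidental clashes in subclasses, and the single underscore remains a convention.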
[22:32:51] awesome
[22:32:54] \o/
[22:33:20] when you have time, read the blog post thing
[22:33:25] thanks :)
[22:33:51] Will do. I'll regain broader competencies once I get this refactor out of my head.
[22:34:06] Been putting off a lot of stuff to keep marching.
[22:34:34] Sure :)
[22:34:50] I'll help you to finish this ASAP
[22:35:06] btw. I've got exams in the next two weeks
[22:35:19] I need to study
[22:35:42] I may not be at my full capacity but I promise I won't be less than 70-80%
[22:35:55] Na. Let's make sure that the work is focused today.
[22:36:02] Maybe we can plan for an hour/day
[22:36:13] And I'll have things organized for that hour of study-break.
[22:36:36] If I know when the hour is, I'll make sure to address all of your refactor notes before the next hour.
[22:36:41] I've cut some volunteer work, social stuff, etc.
[22:36:51] Same with the blog post & paper writing.
[22:37:02] so I've some time for work
[22:37:32] tomorrow 5 hours sooner
[22:37:52] would that work for you halfak ?
[22:38:13] 1800 UTC?
[22:39:23] I can be from 1700 UTC until 1900 UTC
[22:39:55] when is more convenient for you?
[22:40:42] That'll work great for me. I plan to have a status update for you at 1700 UTC tomorrow.
[22:40:53] great
[22:41:59] I'll be back in 30 min.