[03:20:13] eisenhaus335/wikilabels#1 (master - a2258e6 : Ricky Setiawan): The build failed. https://travis-ci.org/eisenhaus335/wikilabels/builds/324439385
[07:13:35] (CR) jenkins-bot: Fully deprecate Cache.php [extensions/ORES] - https://gerrit.wikimedia.org/r/401608 (https://phabricator.wikimedia.org/T181334) (owner: Ladsgroup)
[07:14:39] Amir1: https://doc.wikimedia.org/cover/extensions/ORES/
[13:45:58] o/ awight
[13:46:04] heya
[13:46:05] * halfak is early today
[13:46:29] I got excited about responding to an email about standardized metrics for newcomer success.
[13:47:53] I also have a confession to make :D
[13:48:25] matt_flaschen dropped some great feedback into the https://www.mediawiki.org/w/index.php?title=Topic:Tzw4ebq17wbdog74 thread, I’m just replying now.
[13:54:54] me too
[14:00:05] :)
[14:24:18] awight: I'd like to start the reimaging of ores100* this week to stretch, any reason to delay it? Keep in mind anything on those boxes will be deleted
[14:24:22] halfak ^
[14:25:06] Thinking out loud:
[14:25:26] nice.
[14:25:29] If we image to stretch right now, we'll need to do some work on our enchant dictionaries because some are not supported on stretch.
[14:25:39] If we don't, then it will be more painful to do this work later.
[14:25:48] There are other package issues too, I think I commented on the task.
[14:26:13] Ah yeah—python-pip stuff.
[14:26:32] python-pip stuff?
[14:26:40] got the link handy?
[14:26:41] Lemme link.
[14:26:43] :)
[14:26:45] kk :)
[14:27:05] https://phabricator.wikimedia.org/T182799
[14:27:56] medium length;dr, we need to rebuild wheels so that they can run on both py3.5 and py3.4
[14:28:58] awight, once we switch to stretch, we'll need to rebuild the wheels on stretch. So I don't think this will be an issue.
[14:29:17] We'll want to rebuild everything on the target system config. Models too.
[14:29:50] The thing to keep track of is that we make sure we can run on both platforms as the migration proceeds.
[14:30:17] Otherwise, it gets all branchy and annoying to maintain.
[14:30:42] how many non-Python-only wheels are there?
[14:31:06] akosiaris: It’s worse than that: even the py-only wheels have version dependencies
[14:31:39] awight, why not just have a branch for wheels for python3.5/stretch in our wheels repo?
[14:31:48] that’s an option, but one I’d love to avoid.
[14:31:55] Why is that?
[14:31:58] I’m pretty sure we can make backward-compatible wheels
[14:32:02] awight: not sure I follow. Even if they do, what's the problem?
[14:32:02] cos it’s freaking annoying.
[14:32:17] akosiaris: Problem is just that, our current wheels cannot run on py3.5
[14:32:50] halfak: Will the pickled models be binary-incompatible too, or you want to rebuild those for some other reason?
[14:33:28] Mostly there may/will be peculiarities of the system that affect feature extraction that might/will affect predictions.
[14:33:43] that pickled models is a more interesting question for me than the wheels tbh
[14:33:48] So we get weird behavior if we train the model in one feature extraction environment and then use the model in the other.
[14:34:13] We re-train all the models on a somewhat regular basis so this shouldn't be too much of an issue.
[14:34:30] Once we get git-lfs working, it'll be a total non-issue.
[14:34:54] :D
[14:35:31] akosiaris: yah sorry if I misrepresented as an interesting question :p I just wanted to be clear that ORES won’t run on stretch yet. I’m looking forward to making the necessary compat happen, though.
[14:35:46] I also put it on my plate to make sure our mw-vagrant roles are compatible
[14:36:03] awight: yeah I got that, I am interested technically as to the why
[14:36:23] OK if we want to make the stretch switch, I think the primary work of rebuilding the models is straightforward.
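On the question raised here of whether pickled models break across Python versions: one cheap safeguard, sketched below in Python and not taken from ORES itself, is to pin the pickle protocol instead of relying on the interpreter default, and to be able to check which protocol an existing model file uses. Protocol 4 was added in Python 3.4, so a protocol-4 pickle should load on both the jessie (3.4) and stretch (3.5) interpreters. The constant and helper names are hypothetical.

```python
import io
import pickle
import pickletools

# Hypothetical constant: protocol 4 exists on Python 3.4 and later, so pickles
# written with it can be read by both the 3.4 and 3.5 interpreters.
PORTABLE_PROTOCOL = 4


def dump_portable(obj):
    """Serialize obj with a pinned protocol instead of the interpreter default."""
    return pickle.dumps(obj, protocol=PORTABLE_PROTOCOL)


def pickle_protocol(data):
    """Return the protocol recorded in a pickle's leading PROTO opcode.

    Protocols 0 and 1 do not start with PROTO, so None is returned for them."""
    opcode, arg, _pos = next(pickletools.genops(io.BytesIO(data)))
    return arg if opcode.name == "PROTO" else None
```

Pinning the protocol only covers the container format. Whether an unpickled scikit-learn estimator behaves the same still depends on the library versions installed, which fits the plan here of retraining on the target system rather than reusing old model files.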
[14:36:26] one thing for example is that the pickled objects format might have changed from python 3.4 to python 3.5
[14:36:48] Migrating the enchant dict packages is something I don't understand.
[14:37:12] akosiaris, regardless we need to rebuild the models which involves re-pickling, so I'm not too concerned.
[14:37:25] FWIW, pickle formats don't change much and are backwards compatible.
[14:37:31] akosiaris: check this out, https://gist.github.com/adamwight/e2ba5ef370ef69420b69bc9730c7e8f5
[14:37:33] But still, I don't think it matters.
[14:38:25] halfak: Any way we can run tests on the models to see if they give identical outputs? Or is that already known?
[14:38:36] awight, we don't have such tests.
[14:38:40] But we could.
[14:39:15] I’m fine with us rebuilding everyone in a one-off sweep, even if it’s just reasonable superstition.
[14:40:02] hmm so I see PyYaml, numpy, scipy, mwparserfromhell, scikit_learn, MarkupSafe, nltk, mmh3, more_itertools, textstat, mysqltsv and pyenchant tagged as py34 specific
[14:40:18] yep.
[14:40:31] * halfak wonders why mysqltsv got that.
[14:40:44] or mwparserfromhell
[14:40:44] it’s how we built them.
[14:40:56] Wheels turn out to be a bit annoying, they’re not as compatible as we’d like to think.
[14:41:06] They’re not even consistently forward- or backward- compatible.
[14:41:44] awight, I never thought of them that way.
[14:41:59] I just treat them as system-specific binaries.
[14:42:07] Anyway, pretty sure that all we have to do is create a new labs box based on Stretch, rebuild the wheels, and I think we’re good.
[14:42:14] +1
[14:42:32] We have a stretch instance in the analytics cluster but we don't have the ORES base dependencies (puppet) installed
[14:42:36] they are not that system specific binaries.. for example most wheels are tagged py3-none-any
[14:42:37] Due to enchant dicts.
[14:42:52] which means they are installable on anything that has any python 3 version
[14:42:54] akosiaris, right. I just treat them that way because many are not that way.
[14:43:41] yeah that's true, mostly for native C extensions but some do use specific language constructs only later python versions support
[14:44:40] Sure. Python 3.3 added some useful stuff.
[14:45:08] e.g. New "yield from" expression for generator delegation.
[14:46:34] So wrt. the models, should we plan to build a Stretch box and retrain, or should we see if the old models are safe to use on a new py?
[14:46:37] yeah yield from is something I miss in python 2 every now and then. I thankfully never cared for python < 3.3 so at least I didn't have that pain
[14:47:01] awight, much faster to re-train.
[14:47:03] IMO
[14:47:15] cool, you have my +1 fwiw
[14:47:24] Awesome.
[14:47:31] So about those enchant dictionaries.
[14:47:44] that's one part I have 0 knowledge of
[14:48:21] Scoring-platform-team, ORES: Rebuild ORES models on Stretch - https://phabricator.wikimedia.org/T184072#3871494 (awight)
[14:49:32] akosiaris, any resident deb expert we could talk to about this?
[14:49:56] wait... what does this have to do with debian?
[14:50:21] when I say 0 knowledge, it's 0.. I am not even sure what those enchant packages are for
[14:50:34] halfak: FYI we already maintain a few dictionary pkg ports for jessie.
[14:50:43] Oh. they are language dictionaries. They are installed as debian packages via apt.
[14:51:23] See https://github.com/wikimedia/puppet/blob/production/modules/ores/manifests/base.pp#L22
[14:51:49] Oooh look at this: https://github.com/wikimedia/puppet/blob/production/modules/ores/manifests/base.pp#L46
[14:51:49] those are aspell packages. The enchant one is at line 17
[14:52:02] akosiaris, they are dictionaries that enchant uses.
[14:52:22] aspell, myspell, hunspell, etc.
[14:52:26] Enchant is a generic spell checking library which uses existing
[14:52:26] spell checker engines such as ispell, aspell and myspell as its backends.
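The py34-specific versus py3-none-any distinction discussed above comes down to the last three tags in a wheel's filename, which PEP 427 defines as `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. Below is a rough, hypothetical compatibility check along those lines. It is not a reproduction of the gist linked above, and it intentionally ignores platform tags and abi3 (stable ABI) wheels.

```python
import re


def wheel_tags(filename):
    """Split a wheel filename (PEP 427) into python, abi and platform tag sets."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %s" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when the optional build tag is present
        raise ValueError("unexpected wheel filename: %s" % filename)
    python, abi, platform = parts[-3:]
    # Compressed tag sets such as "py2.py3" are separated by dots.
    return set(python.split(".")), set(abi.split(".")), set(platform.split("."))


def runs_on(filename, major, minor):
    """Rough check of whether a wheel's python/abi tags fit CPython major.minor.

    Simplified on purpose: platform tags and abi3 wheels are not handled."""
    python_tags, abi_tags, _platform = wheel_tags(filename)
    wanted = {"py%d" % major, "py%d%d" % (major, minor), "cp%d%d" % (major, minor)}
    if not python_tags & wanted:
        return False
    # "none" means no ABI dependency; cpXY plus optional flags (d, m, u) pins one.
    cpython_abi = re.compile(r"cp%d%d[dmu]*$" % (major, minor))
    return "none" in abi_tags or any(cpython_abi.match(tag) for tag in abi_tags)
```

With made-up filenames, `runs_on("example-1.0-py34-none-any.whl", 3, 5)` is False while the `py3-none-any` variant is True. A `cp34-cp34m-linux_x86_64` build (a native extension) only passes for 3.4.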
[14:52:29] ok gotcha
[14:52:42] so ores is not relying on aspell, ispell etc directly but rather enchant
[14:52:47] ok... so what's the issue?
[14:53:14] akosiaris: We’re just unsure whether all of the dictionary pkgs are ported to stretch. Some were missing in jessie.
[14:53:14] I guess some packages do not exist on stretch? or changed names?
[14:53:15] many of these dictionary packages are not available on stretch. We have some specifically ported to Jessie.
[14:53:49] ok that's solvable
[14:53:59] great.
[14:54:09] depending on the number of packages it may take a while more, but definitely solvable
[14:54:13] brb need food before meeting in 5 mins.
[14:54:51] I was trying to find an example package, but coming up empty handed. I think we have 1-3 dictionary packages copied into our apt repo, maybe from jessie-backports.
[14:54:58] akosiaris: ^ know where I would look for something like that?
[14:55:27] https://apt.wikimedia.org/wikimedia/pool/main/a/aspell-id/
[14:55:34] that's one we had to backport from ubuntu ^
[14:55:47] I see nothing for myspell though
[14:56:03] ty.
[14:56:17] That could be the only one.
[14:56:34] yeah I am checking the list on ores/manifests/base.pp right now
[14:56:40] :)
[14:58:17] How does that apt cache work with multiple distros, btw?
[14:58:48] I guess I don’t need to know, but I’m curious why there’s no distro name in the path.
[14:58:52] it just supports having multiple distro lines in /etc/apt/sources.list
[14:58:57] oh you mean reprepro?
[14:58:58] oho ty
[14:59:06] /o\
[14:59:15] https://wikitech.wikimedia.org/wiki/Reprepro
[14:59:29] that's what we use to manage apt.wikimedia.org
[14:59:46] simply put, it has a database
[15:00:04] Well holler if there’s any grunt work I can do to help with copy-pasting missing dictionary metadata.
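On why there is no distro name in the path: the aspell-id URL pasted above follows the standard Debian pool layout, which reprepro also uses. Each package is stored once under `pool/<component>/<prefix>/<source>/`, and every distribution's index under `dists/` points into that shared pool. The prefix rule can be sketched like this. It is a general convention of Debian-style archives, not something specific to apt.wikimedia.org, and the function name is hypothetical.

```python
def pool_dir(source, component="main"):
    """Directory of a source package in a Debian-style pool/ tree.

    Source names starting with "lib" are bucketed by their first four
    characters (e.g. libf/libfoo); all others by their first character."""
    if not source:
        raise ValueError("empty source package name")
    if source.startswith("lib") and len(source) > 3:
        prefix = source[:4]
    else:
        prefix = source[0]
    return "pool/%s/%s/%s/" % (component, prefix, source)
```

`pool_dir("aspell-id")` gives `pool/main/a/aspell-id/`, matching the URL above. The `lib` special case keeps the very common `lib*` packages from piling up in a single `l/` directory.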
[15:00:05] and just keeps pointers effectively to the package
[15:00:49] it uses the /pool/ directory structure for all the packages and then under the /dists/ each distro has a Packages list that defines which are in it https://apt.wikimedia.org/wikimedia/dists/j
[15:01:39] hi guys, also i found this ticket about myspell packages on jessie https://phabricator.wikimedia.org/T150003
[15:02:37] mutante: nice find, thanks!
[15:03:22] https://phabricator.wikimedia.org/T150003#2777166
[15:04:06] it's only aspell-id that's missing indeed
[15:04:37] i think i remember this too. yes. only aspell-id
[15:05:00] yeah that's easy to fix I guess
[15:05:18] I'll backport it from xenial or yakkety
[15:05:50] heh
[15:05:56] https://packages.ubuntu.com/search?keywords=aspell-id&searchon=names&suite=all&section=all
[15:06:03] exact same version since trusty
[15:07:57] https://anonscm.debian.org/gitweb/?p=collab-maint/aspell-id.git
[15:08:06] yeah unmaintained.. that's why it's not in debian I guess
[15:09:19] then again... the entire thing is unmaintained https://ftp.gnu.org/gnu/aspell/dict/id/ last release in 2004?
[15:09:42] (sorry, we’re waylaid by a meeting)
[15:24:48] aspell-id exists in stretch-wikimedia. That part is solved
[15:24:50] akosiaris@ores2001:~$ apt-cache policy aspell-id
[15:24:50] aspell-id:
[15:24:50]   Installed: (none)
[15:24:50]   Candidate: 1.2-0-0ubuntu1+wmf1
[15:42:30] Scoring-platform-team, MediaWiki-Vagrant, MediaWiki-extensions-ORES, ORES: ORES MediaWiki-Vagrant roles should be ported to Stretch - https://phabricator.wikimedia.org/T184077#3871618 (awight)
[15:43:47] akosiaris: Were you saying that you’ve checked all our dictionary pkgs and they exist in stretch or at least stretch-wikimedia?
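Checking that every dictionary package in ores/manifests/base.pp has a candidate on stretch can be scripted by reading `apt-cache policy` output like the aspell-id excerpt above. A minimal sketch, assuming the layout shown there: a `name:` header line at column 0 followed by indented `Installed:`/`Candidate:` lines. The helper names are hypothetical.

```python
def candidates(policy_text):
    """Map package name -> candidate version from `apt-cache policy` output.

    A candidate of "(none)" is recorded as None."""
    result = {}
    package = None
    for line in policy_text.splitlines():
        if line and not line[0].isspace() and line.endswith(":"):
            # Unindented "name:" lines start a new package block.
            package = line[:-1]
            result[package] = None
        elif package is not None and line.strip().startswith("Candidate:"):
            version = line.split(":", 1)[1].strip()
            result[package] = None if version == "(none)" else version
    return result


def missing(policy_text, wanted):
    """Sorted names from `wanted` that have no installable candidate."""
    found = candidates(policy_text)
    return sorted(name for name in wanted if found.get(name) is None)
```

On a stretch host the text could come from running `apt-cache policy` with the package list from base.pp. Names apt does not recognise may simply be absent from the output, so `missing` treats an absent package the same as a `(none)` candidate.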
[15:50:33] meeting complete
[16:02:25] awight: I've checked all debian packages listed in https://github.com/wikimedia/puppet/blob/production/modules/ores/manifests/base.pp#L22 and they do indeed exist in stretch or stretch-wikimedia
[16:02:41] akosiaris: ty that’s great news. I’ll close the task.
[16:03:36] akosiaris, can we enable the ores base role on stat1005.eqiad.wmnet then?
[16:03:53] Scoring-platform-team, ORES: Make sure ORES is compatible with stretch - https://phabricator.wikimedia.org/T182799#3871712 (awight)
[16:03:55] Scoring-platform-team (Current), ORES: Verify that all enchant/spelling dictionaries are available on Stretch. Port if needed. - https://phabricator.wikimedia.org/T184074#3871708 (awight) Open>Resolved a:akosiaris
[16:05:10] halfak: you mean include the ores::base class (there is no role::ores::base - nor should there be)? If it makes sense yeah, but how does it make sense?
[16:06:13] yes.
[16:06:32] akosiaris, I want to build models and wheels on stat1005
[16:06:42] Right now, I'm building them on stat1006 (Jessie)
[16:08:31] RoanKattouw: Thanks for the valid_tag table, Daniel has been entertained for a while now
[16:08:36] ah, via profile::hadoop::common
[16:08:41] yeah that's also applied already on stat1005
[16:08:47] what you want is already done :-)
[16:08:57] Amir1: Is he also going down the rabbit hole of figuring out why it exists?
[16:09:10] I found that it controls which tags can be added/removed by privileged users
[16:09:18] not yet, but I'm convincing him :D
[16:09:24] And the code was enough of a mess that I decided to ignore that part and instead just work on the normalization stuff
[16:09:49] That sounds good to me, we should kill that table IMO
[16:10:10] Which is tricky enough on its own
[16:10:23] Yeah, Annual goal: 2019-2020
[16:10:32] akosiaris, hmm. Let me check on that.
[16:11:02] Yes, that table should probably die
[16:11:05] but yeah, later
[16:11:14] akosiaris, oh! It looks like I meant to ask for stat1006
[16:11:50] They are both stretch
[16:12:02] So.... we've been deploying models built on stretch for a little while now O_O
[16:12:19] I hadn't built wheels here yet
[16:12:19] LOL
[16:12:22] love it
[16:12:30] moo
[16:12:32] halfak: ooh try the new makefiles for that, lmk how it goes
[16:13:13] halfak: maybe run my wheel probe script on the built files before committing though
[16:13:30] halfak: https://gist.github.com/adamwight/e2ba5ef370ef69420b69bc9730c7e8f5
[16:27:38] awight: I have this patch in vagrant: https://gerrit.wikimedia.org/r/#/c/401595/
[16:29:23] Great work!
[16:47:49] awight: And this: https://gerrit.wikimedia.org/r/#/c/401611/1 (based on your review)
[16:47:51] Thank you for +2
[16:47:53] :)
[16:48:17] legoktm made this: https://doc.wikimedia.org/cover/extensions/ORES/ \o/
[16:48:35] I need to add covers tag so it shows a realistic number but still so coooool
[16:48:53] (CR) Awight: [C: 2] Follow up to I4246706 [extensions/ORES] - https://gerrit.wikimedia.org/r/401611 (owner: Ladsgroup)
[16:49:01] Yesss
[16:49:40] Amir1: I’m happy to take on some test coverage if you get bored hacking on your own
[16:50:11] I will let you know, Thanks for the offer <3
[16:52:11] lol they’re just tests, how badly could I screw that up?
[16:53:16] halfak: fyi I’m scraping together some of our integration notes into https://etherpad.wikimedia.org/p/JADE-MW_integration so that RoanKattouw has something to bite into when we chat tomorrow.
[16:54:17] (Merged) jenkins-bot: Follow up to I4246706 [extensions/ORES] - https://gerrit.wikimedia.org/r/401611 (owner: Ladsgroup)
[16:57:21] (CR) jenkins-bot: Follow up to I4246706 [extensions/ORES] - https://gerrit.wikimedia.org/r/401611 (owner: Ladsgroup)
[17:11:28] awight, nice
[17:11:32] Just got the calendar set up
[17:11:39] You should be able to access it and add stuff to it.
[17:18:08] * halfak reads https://commonists.wordpress.com/2018/01/03/there-is-no-deadline-so-every-second-is-one-on-anxiety-perfectionism-and-wikimedia-projects/
[17:20:06] halfak: awesome.
[17:20:26] halfak: Remind me, we authorize wiki users on mediawiki.org?
[17:20:47] awight, I think that's right. It used to be meta.
[17:21:05] https://github.com/wiki-ai/wikilabels-wmflabs-deploy/blob/master/config/00-main.yaml#L4
[17:21:10] Looks like it is still meta?
[17:21:14] Then we just check CentralAuth stuff to determine if they’re blocked on the wiki containing an entity.
[17:21:15] kk
[17:22:07] awight, na. i looked into that. We'll have to load up the apt record from centralauth and then we can look up the status on the apt wiki.
[17:22:23] ?apt
[17:22:24] Centralauth will allow us to match dbname to the hostname for the API to hit.
[17:22:32] apt as in appropriate.
[17:22:35] kk
[17:22:49] apt is shorter so I like it :D
[17:23:07] Ah good to hear that the list of APIs is finally centralized. I was noticing that all tools maintained their own wiki list, a few years ago.
[17:23:14] suppose it conflates with aptitude.
[17:23:18] halfak: Want to look over https://etherpad.wikimedia.org/p/JADE-MW_integration ?
[17:23:23] I’m sure I’m missing plenty
[17:24:26] (PS1) Ladsgroup: Add missing covers tags [extensions/ORES] - https://gerrit.wikimedia.org/r/401756 (https://phabricator.wikimedia.org/T71685)
[17:24:58] awight, line 19 makes it hard to make it all fit
[17:25:05] Will we allow direct editing of the Jade: namespace? Can we disallow?
[17:25:12] Not sure how to think about that yet.
[17:25:13] awight: https://gerrit.wikimedia.org/r/#/c/401756/1 :D
[17:25:27] (CR) Awight: [C: 2] Add missing covers tags [extensions/ORES] - https://gerrit.wikimedia.org/r/401756 (https://phabricator.wikimedia.org/T71685) (owner: Ladsgroup)
[17:25:38] Thank you!
[17:25:59] o.c.
[17:26:27] halfak: “make it all fit” into a 30-min meeting?
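The lookup described here (find the user's attached account on the appropriate wiki via CentralAuth, then check their status there) could start from the MediaWiki API's `meta=globaluserinfo` query with `guiprop=merged`, which lists each attached account's wiki dbname and site URL. A minimal sketch over an already-parsed JSON response. It assumes that a locally blocked account carries a `blocked` key in its merged entry, which should be confirmed against the live API before relying on it. The function names are hypothetical.

```python
def merged_accounts(response):
    """Map wiki dbname -> (site URL, blocked?) from a parsed globaluserinfo
    response requested with guiprop=merged.

    Assumption: a locally blocked account has a "blocked" key in its entry."""
    info = response["query"]["globaluserinfo"]
    return {
        account["wiki"]: (account["url"], "blocked" in account)
        for account in info.get("merged", [])
    }


def blocked_on(response, dbname):
    """True/False for the user's block status on dbname, or None if the user
    has no attached account on that wiki."""
    accounts = merged_accounts(response)
    if dbname not in accounts:
        return None
    return accounts[dbname][1]
```

The URL half of each tuple is what provides "the hostname for the API to hit" once the dbname of the wiki holding the entity is known.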
[17:27:45] (Merged) jenkins-bot: Add missing covers tags [extensions/ORES] - https://gerrit.wikimedia.org/r/401756 (https://phabricator.wikimedia.org/T71685) (owner: Ladsgroup)
[17:30:00] halfak: I just checked and it’s not possible to “manually” edit or save invalid JSON in the Schema: namespace
[17:30:41] (CR) jenkins-bot: Add missing covers tags [extensions/ORES] - https://gerrit.wikimedia.org/r/401756 (https://phabricator.wikimedia.org/T71685) (owner: Ladsgroup)
[17:31:46] awight, I wonder if we can use a schema there too.
[17:32:30] I don’t think our schema has to fit into json or anything else, pretty sure we can just enforce validity using our custom contenthandler.
[17:32:50] e.g. the topic bitfield would be atrocious to write as EventLogging JSON, IMO
[17:33:38] {
[17:33:38]   "type": "string",
[17:33:38]   "enum": ["stop", "go"]
[17:33:38] }
[17:33:41] ^
[17:34:19] Oh crap. we don't want an enum.
[17:34:20] yeah but we want multi-classification
[17:34:22] We want a set
[17:34:53] https://stackoverflow.com/questions/30924271/correct-way-to-define-array-of-enums-in-json-schema
[17:34:54] Got it
[17:35:02] Array of enum :)
[17:35:27] oh I didn’t realize that “JSON schema” was a standard
[17:35:40] yeah that looks fine
[17:36:36] relocating for 10min...
[17:36:48] 15min
[17:42:34] Heading out to lunch
[17:42:39] Back in ~an hour
[17:55:13] halfak: namespace Jade_talk?
[17:55:41] Seems to make sense, cos the discussion is specifically about Jade:
[17:55:48] That also solves the permalink issue.
[17:56:04] All discussion at that page is about the corresponding Jade: page
[17:58:00] If editors want to get pervy, they can even burrow down to Jade_talk:Edit/12345/damaging
[17:58:03] mayne.
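The fix settled on above, an array of enums rather than a single enum, is expressed in JSON Schema with `"type": "array"`, an `items` schema holding the enum, and `"uniqueItems": true` so the array behaves like a set. The schema below is written as a Python dict; the label values are hypothetical placeholders. The checker is a hand-rolled sketch that covers only these keywords. A full validator, such as `jsonschema.validate(instance, schema)` from the `jsonschema` package, would be the realistic choice.

```python
# JSON Schema for a *set* of labels: an array whose items come from an enum,
# with duplicates forbidden. The label values are hypothetical placeholders.
TOPICS_SCHEMA = {
    "type": "array",
    "items": {"type": "string", "enum": ["history", "science", "sports"]},
    "uniqueItems": True,
}


def check_label_set(value, schema=TOPICS_SCHEMA):
    """Minimal check for just the keywords used above (not a full validator)."""
    if not isinstance(value, list):
        return False
    allowed = schema["items"]["enum"]
    if not all(isinstance(item, str) and item in allowed for item in value):
        return False
    if schema.get("uniqueItems") and len(set(value)) != len(value):
        return False
    return True
```

An empty list passes, which may or may not be wanted. Adding `"minItems": 1` to the schema is the standard way to require at least one label.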
[17:58:04] *b
[18:04:04] RoanKattouw: Just to surface my cryptoping earlier, I made this for us to discuss tomorrow: https://etherpad.wikimedia.org/p/JADE-MW_integration
[18:20:42] Scoring-platform-team (Current), ORES, Operations, Graphite, User-fgiunchedi: Regularly purge old ores graphite metrics - https://phabricator.wikimedia.org/T169969#3872358 (fgiunchedi) >>! In T169969#3867919, @Halfak wrote: > @fgiunchedi, can you help me figure out what our next step should b...
[18:32:29] Amir1: My connection is too weak for SoS, any chance you can step in?
[18:32:59] Someone else in town is having a video call ;-)
[19:01:31] sorry, I'm in CoC meeting
[19:06:23] Amir1: no worries, audio-only was the ticket!
[19:11:35] Amir1: https://gerrit.wikimedia.org/r/401775
[19:14:05] So I was going to lunch but there was a "surprise" meeting I forgot about
[19:14:09] Now am actually going to lunch
[19:15:33] harr
[19:42:49] Scoring-platform-team (Current), MediaWiki-Vagrant, MediaWiki-extensions-ORES, ORES, Wikilabels: ORES MediaWiki-Vagrant roles should be ported to Stretch - https://phabricator.wikimedia.org/T184077#3872725 (awight) Remove /vagrant/srv/ores before attempting the migration.
[19:54:05] Scoring-platform-team (Current), MediaWiki-Vagrant, MediaWiki-extensions-ORES, ORES, Wikilabels: ORES MediaWiki-Vagrant roles should be ported to Stretch - https://phabricator.wikimedia.org/T184077#3872792 (awight) Small glitch, probably a missing ordering: {P6520}
[19:56:12] Scoring-platform-team (Current), MediaWiki-Vagrant, MediaWiki-extensions-ORES, ORES, Wikilabels: ORES MediaWiki-Vagrant roles should be ported to Stretch - https://phabricator.wikimedia.org/T184077#3872798 (awight) The ordering looks good, maybe this was a race condition? Goes away with repr...
[19:58:14] Scoring-platform-team (Current), MediaWiki-Vagrant, MediaWiki-extensions-ORES, ORES, Wikilabels: ORES MediaWiki-Vagrant roles should be ported to Stretch - https://phabricator.wikimedia.org/T184077#3872807 (awight) Also worked fine when I dropped the wikilabels database and reprovisioned. Ig...
[20:08:02] Scoring-platform-team (Current), MediaWiki-Vagrant, MediaWiki-extensions-ORES, ORES, Wikilabels: ORES MediaWiki-Vagrant roles should be ported to Stretch - https://phabricator.wikimedia.org/T184077#3872833 (awight) lol, ``` Message from syslogd@mediawikivagrant at Jan 3 20:07:27 ... kernel:...
[20:10:38] Scoring-platform-team, MediaWiki-Vagrant, Wikilabels: wikilabels vagrant role errors out on DB permissions - https://phabricator.wikimedia.org/T183605#3858433 (awight) I ran into this, too. It works the second provisioning through, and it will also reprovision correctly even after deleting the wikil...
[20:13:48] Scoring-platform-team (Current), MediaWiki-Vagrant, MediaWiki-extensions-ORES, ORES, Wikilabels: ORES MediaWiki-Vagrant roles should be ported to Stretch - https://phabricator.wikimedia.org/T184077#3872866 (awight)
[20:16:10] Scoring-platform-team (Current), MediaWiki-Vagrant, MediaWiki-extensions-ORES, ORES, and 2 others: ORES MediaWiki-Vagrant roles should be ported to Stretch - https://phabricator.wikimedia.org/T184077#3872874 (awight) Another glitch, {P6521}
[20:27:04] Scoring-platform-team, editquality-modeling, artificial-intelligence: Investigate code generation for model makefile maintenance - https://phabricator.wikimedia.org/T168455#3872894 (Ladsgroup) I'm a big fan getting the whole thing more streamlined. I can probably pick this up starting two weeks from...
[20:30:06] Scoring-platform-team, editquality-modeling, artificial-intelligence: Investigate code generation for model makefile maintenance - https://phabricator.wikimedia.org/T168455#3872899 (awight) @Ladsgroup Cool—I'd love to be involved, see the `editquality#templating` branch. I think it's worth talking t...
[20:31:03] Scoring-platform-team (Current), MediaWiki-Vagrant, MediaWiki-extensions-ORES, ORES, and 2 others: ORES MediaWiki-Vagrant roles should be ported to Stretch - https://phabricator.wikimedia.org/T184077#3872901 (awight) Similar crash from the PopulateDatabase script, {P6523}
[20:33:39] I'm back
[20:35:08] Scoring-platform-team (Current), MediaWiki-extensions-ORES, MW-1.31-release-notes (WMF-deploy-2018-01-09 (1.31.0-wmf.16)), Patch-For-Review, and 2 others: Deprecate CheckModelVersions and integrate it with the extension workflow - https://phabricator.wikimedia.org/T183468#3872921 (awight) @Ladsgr...
[20:42:34] Scoring-platform-team (Current), MediaWiki-extensions-ORES, MW-1.31-release-notes (WMF-deploy-2018-01-09 (1.31.0-wmf.16)), Patch-For-Review, and 2 others: Deprecate CheckModelVersions and integrate it with the extension workflow - https://phabricator.wikimedia.org/T183468#3872935 (Ladsgroup) Yes,...
[20:43:21] Amir1 is mind-reading
[20:48:37] awight: haha, I was just testing it in a new place and I was like, oh shit
[20:49:53] awesome—the new (patched) logic is great for this edge case too, IMO your change makes this thing much more stable.
[20:50:55] I think it was a big bottleneck
[20:53:53] One that we’d neglected, even! At least, I hadn’t ever run the CheckModelVersions maintenance script.
[20:54:03] a corked bottleneck
[20:58:16] re-training fiwiki
[20:58:41] I'm going to try the best non-crazy set of hyperparameters and see what we get
[20:58:43] :D
[20:58:55] brb
[21:06:14] Scoring-platform-team, MediaWiki-Vagrant, Wikilabels: wikilabels vagrant role errors out on DB permissions - https://phabricator.wikimedia.org/T183605#3873017 (Tgr) `Exec[initialize wikilabels database]` has to depend on `File[$cfg_file]`, I'd guess.
[21:07:32] halfak: I’m drinking another flagon of Ben Stopford kool-aid, and he’s almost got me thinking that the “internal” state stores might be a reasonable way to do action validation.
[21:08:21] Also, there are two APIs to transform messages, so using Postgres CDC to generate the Kafka messages could actually result in nicely-formatted events.
[21:19:01] awight i like kool-aid too :)
[21:19:17] i like the red one.
[21:19:53] I have a harrowing tale about Kool-Aid… Do you happen to know about the Jonestown massacre?
[21:20:12] nope
[21:20:24] back
[21:21:17] K I will can it.
[21:21:24] awight, \o/ I think we agree on that point. I'm finding myself worried about our event strategy when we're going to try to do so much in MW.
[21:21:31] But I still think this will be very fruitful.
[21:21:43] I'll add what I think is the critical usecase to your doc.
[21:21:46] I think events might actually be a big help in that regard
[21:21:57] cos they allow us to decouple from MediaWiki’s vagaries
[21:22:40] Only thing I'm really worried about is actions that originate from MW.
[21:22:43] halfak: Which point were you agreeing on, btw? internal state stores or message transformations?
[21:22:45] E.g. suppression events. Revert events.
[21:22:53] Both.
[21:22:58] ah, nice.
[21:23:02] What’s worrisome about MW-originated events?
[21:23:09] Internal state store validation of actions makes sense to me (as I think I understand it)
[21:23:24] awight, I'm asking myself "Why aren't we just doing everything in MW?"
[21:23:28] lolol
[21:23:29] yeah
[21:23:32] that happens
[21:24:01] How will ORES return JADE content in a performant way?
[21:26:03] Oh good question. I don't want to wait for MW there. We'll want something that looks like ORES' cache.
[21:26:39] I think we'll want an event-consuming thin, fast, memory-based layer.
[21:26:54] I’m loving the idea of creating a service outside of MediaWiki, so the rest of the world could potentially use this.
[21:26:59] Right
[21:27:02] +1 your last point.
[21:27:07] OK you're making me feel better about it again :)
[21:27:09] Extremely thin
[21:27:26] The benefits come in when we want to do things like create interesting views into the data
[21:27:27] Maybe we should look into some C even.
[21:27:37] uh
[21:27:40] LOLOL
[21:27:48] * halfak runs out of the room laughing
[21:27:52] I love C, but...
[21:27:53] wat
[21:28:00] well, I love C++11
[21:28:18] C can go to pasture
[21:28:43] Kafka will give us opportunities to write some Java or Scala if you’re itching to perf out
[21:29:57] Hum… there’s one important validity thing that I don’t see an obvious way to maintain. The database of valid wiki entities.
[21:30:07] We could have incoming change-propagation… but gross.
[21:30:34] There’s also a race condition between wiki entity creation and judgment creation
[21:30:41] I think we'll want to hard-code that, awight.
[21:31:13] sorry, I didn’t quite catch that. About the race condition?
[21:31:15] Oh! I think we should allow judgements to be applied to anything -- even if it doesn't seem to exist yet.
[21:31:23] I thought you meant entity-type
[21:31:23] OK that’s much easier.
[21:31:32] awight, solve the problem if we seem to have it.
[21:31:50] naw just ID. So isn’t that allowing potentially invalid events into our idyll?
[21:32:24] I think they are still valid events. Theoretically, all revision IDs, page Ids, user IDs, etc. will exist eventually.
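The "event-consuming thin, fast, memory-based layer" could be as small as a dict keyed by entity that keeps the latest judgment seen. A sketch under several assumptions: the event fields are hypothetical, events arrive in the order they were produced, and IDs are not checked for existence (per the suggestion to allow judgments on entities that may not exist yet). Only obviously invalid IDs, such as non-positive integers, are rejected. A `None` judgment clears the entry, which is one possible reading of reverting a judgment to a NULL state.

```python
from typing import Any, Dict, Optional, Tuple

# (wiki dbname, entity type, entity id); the key layout is a hypothetical choice.
Key = Tuple[str, str, int]


class JudgmentCache:
    """Thin in-memory view that keeps the latest judgment per entity.

    Hypothetical event shape:
    {"wiki": str, "entity_type": str, "entity_id": int, "judgment": Any or None}
    Events are assumed to be consumed in the order they were produced."""

    def __init__(self) -> None:
        self._latest: Dict[Key, Any] = {}

    def consume(self, event: Dict[str, Any]) -> None:
        entity_id = event["entity_id"]
        # No existence check (the entity may appear later), only a sanity check.
        if isinstance(entity_id, bool) or not isinstance(entity_id, int) or entity_id <= 0:
            raise ValueError("invalid entity id: %r" % (entity_id,))
        key = (event["wiki"], event["entity_type"], entity_id)
        if event.get("judgment") is None:
            # A null judgment clears whatever was stored for this entity.
            self._latest.pop(key, None)
        else:
            self._latest[key] = event["judgment"]

    def get(self, wiki: str, entity_type: str, entity_id: int) -> Optional[Any]:
        """Latest judgment for the entity, or None if there is none."""
        return self._latest.get((wiki, entity_type, entity_id))
```

Because the cache only ever applies events, it can be rebuilt from scratch by replaying the event stream. That is what keeps such a layer thin.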
[21:32:51] We'll need to have functionality to let patrollers remove judgements that point to absurd IDs
[21:33:02] Like maybe you can revert a judgement to a NULL state.
[21:34:22] interesting. But rev_id=-1 would be invalid, so we might have at least that check
[21:36:17] Back to your point about MW-emitted events, though
[21:36:49] I’m not sure we can put a transaction around {admin suppresses something, hit JADE API indicating suppression}
[21:37:00] That seems bad.
[21:37:26] * awight adds to questions for catrope oracle
[21:37:48] awight, I was thinking about that.
[21:37:57] I think we can do something in the UI safely.
[21:38:10] Hitting JADE's API to suppress something will return the event data itself.
[21:38:34] So anything that hits that API will know that the action completed and will be able to update the user-presentation.
[21:38:52] I think refreshing the page might be a problem. Could take time if there's lag on MW's side.
[21:39:31] Does that address the problem you think?
[21:39:37] or are you talking about something different?
[21:40:24] Fun story: The gradient boosting models are substantially more sane but they suffer from the same problem as random forest WRT ROC-AUC
[21:40:32] Interpolation is just a little broken.
[21:40:50] o/ awight
[21:41:11] * awight_ reads logs
[21:41:32] awight, https://phabricator.wikimedia.org/P6525
[21:41:38] I think you might have missed that specifically :)
[21:42:08] yep I’m in http://wm-bot.wmflabs.org/logs/%23wikimedia-ai/ :)
[21:42:29] I’m not worried about a JADE UI
[21:42:33] It’s the MW UI that scares me
[21:42:44] that’s where an admin would do the suppressing
[21:43:16] Interpolation sounds like a fun problem, actually :)
[21:44:23] Is that a sufficiently chilling concern, though? Hopefully catrope has some thoughts. We could always synthesize something like Kafka’s cursor
[21:44:53] Where we explicitly keep track of the last MW suppression log we read for each wiki
[21:44:53] ugh
[21:45:02] I gotta run, people need this chair & battery dead
[22:21:14] Scoring-platform-team, ORES: Provision a Stretch box we can use to build ORES models - https://phabricator.wikimedia.org/T184073#3871504 (Halfak) Looks like we can use stat1005.eqiad.wmnet for now. We'll want to make a new ores-misc VM in Cloud VPS for those without analytics cluster access.
[22:22:35] I have been fighting with this thing for a very long time and now I realize I can't make it happen when request is GET
[22:22:40] wasted hours
[22:26:17] Bah. Missed awight's response.
[22:26:27] BUT I finally finished off this fiwiki nonsense.
[22:31:42] Scoring-platform-team (Current), ORES, Operations, Graphite, User-fgiunchedi: Regularly purge old ores graphite metrics - https://phabricator.wikimedia.org/T169969#3873249 (Halfak) 1 day aggregation for 5 years is practically indefinite to me. That's OK. I'm a fan of removing anything tha...
[22:34:16] Scoring-platform-team (Current), MediaWiki-extensions-ORES, User-Ladsgroup: Add models when initializing the table - https://phabricator.wikimedia.org/T184127#3873267 (Ladsgroup)
[22:37:21] OKAY! So it's time to start rebuilding wheels on stretch :)
[22:37:31] (PS1) Ladsgroup: Add models when ores_model is empty [extensions/ORES] - https://gerrit.wikimedia.org/r/401815 (https://phabricator.wikimedia.org/T184127)
[22:38:18] halfak: have you seen this? https://github.com/wiki-ai/editquality/pull/111#issuecomment-354505721
[22:39:19] Amir1, not quite enough. Sorry to not respond sooner.
[22:39:30] I want the model_info to report the right version of revscoring
[22:39:42] I'd actually be OK with you manually editing that and re-pickling if you want.
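The fallback floated above, keeping track of the last MediaWiki suppression log entry read for each wiki much like a Kafka consumer offset, can be sketched as a pure function over a batch of log entries. The entry shape is hypothetical, and the sketch assumes log IDs only increase within a wiki.

```python
from typing import Dict, Iterable, List, Tuple


def read_new_entries(
    cursors: Dict[str, int], entries: Iterable[dict]
) -> Tuple[List[dict], Dict[str, int]]:
    """Return entries newer than each wiki's cursor, plus the advanced cursors.

    Hypothetical entry shape: {"wiki": dbname, "log_id": int, ...}. Assumes log
    IDs only increase within a wiki, like offsets within a Kafka partition.
    The input dict is not modified."""
    advanced = dict(cursors)
    fresh = []
    for entry in sorted(entries, key=lambda e: (e["wiki"], e["log_id"])):
        if entry["log_id"] > advanced.get(entry["wiki"], 0):
            fresh.append(entry)
            advanced[entry["wiki"]] = entry["log_id"]
    return fresh, advanced
```

Returning new cursors instead of mutating the old ones lets the caller persist them only after the fresh entries have been handled. A crash then leads to re-reading some entries rather than silently skipping them, so whatever consumes them should tolerate duplicates.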
[22:40:01] I just want it all to match :/
[22:40:21] I see
[22:40:23] okay
[22:41:08] I go lie down, will be back soon
[22:42:53] kk
[22:43:01] * halfak gets to work on wheels.
[22:46:24] * halfak sings "The wheels on the bus go round and round.."
[23:20:21] https://www.preprints.org/manuscript/201801.0017/v1 for anyone who missed it
[23:23:07] Oooh fun. I wonder if they'll give us their dataset :)
[23:25:32] Scoring-platform-team, ORES: Rebuild ORES wheels on Stretch - https://phabricator.wikimedia.org/T184135#3873517 (Halfak)
[23:27:37] Scoring-platform-team, ORES: Rebuild ORES wheels on Stretch - https://phabricator.wikimedia.org/T184135#3873534 (Halfak) Looks like I'll need to update revscoring for some work that @Ladsgroup is doing for editquality. See https://github.com/wiki-ai/editquality/pull/111
[23:32:30] wiki-ai/ores#911 (revscoring_2.1 - b4d769c : halfak): The build has errored. https://travis-ci.org/wiki-ai/ores/builds/324824221
[23:55:28] halfak: I'm rebuilding models again, quick question, should I delete the datasets/arwiki.autolabeled_revisions.w_cache.20k_2016.json data and re-extract them or just delete the models and rebuild
[23:55:35] obviously the latter is way faster
[23:55:56] (keep in mind, I re-extracted them using the master of revscoring)
[23:56:21] I think you should just rebuild them after confirming that you have revscoring 2.1.0 installed.
[23:59:43] I installed 2.1.0
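You can confirm the installed revscoring version from Python before rebuilding. The sketch below uses `importlib.metadata`, which requires Python 3.8 or later. On the 3.4/3.5 interpreters discussed here, `pkg_resources.get_distribution("revscoring").version` from setuptools, or `pip show revscoring`, would do the same job. The helper name is hypothetical.

```python
from importlib import metadata


def has_version(distribution, expected):
    """True if `distribution` is installed at exactly version `expected`."""
    try:
        return metadata.version(distribution) == expected
    except metadata.PackageNotFoundError:
        # Not installed at all counts as "not the expected version".
        return False
```

For example, `has_version("revscoring", "2.1.0")` could gate a model-rebuild script. This also helps the model_info mentioned above report the revscoring version the models were actually built with.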