[05:22:05] PROBLEM - check disk on ORES-web02.Experimental is WARNING: DISK WARNING - free space: / 1080 MB (5% inode=92%);
[06:16:05] RECOVERY - check disk on ORES-web02.Experimental is OK: DISK OK
[06:20:05] PROBLEM - check disk on ORES-web02.Experimental is WARNING: DISK WARNING - free space: / 1057 MB (5% inode=92%);
[06:26:05] RECOVERY - check disk on ORES-web02.Experimental is OK: DISK OK
[13:38:26] 10Scoring-platform-team (Current), 10Edit-Review-Improvements-Integrated-Filters, 10editquality-modeling, 10Growth-Team (Current Sprint), and 2 others: Enable ORES filters on RC for Italian Wikipedia - https://phabricator.wikimedia.org/T211032 (10SBisson) a:03SBisson
[14:07:50] o/ halfak I just saw on the scrum email for 2/20 that Adam is leaving :( that's sad
[14:08:52] Hey Zppix. Yeah. It is sad. :(
[14:09:08] halfak: is he switching teams?
[14:10:25] Nope. He's moving on to a new gig as far as I know. I'm sure we haven't seen the last of him, though. He's a Wikipedian in a lot of ways :)
[14:10:59] yep :)
[14:35:51] * halfak digs through strategy documents
[17:00:58] 10ORES, 10Scoring-platform-team, 10Analytics, 10Analytics-Cluster, 10artificial-intelligence: Package dictionaries better for ORES models - https://phabricator.wikimedia.org/T217343 (10Halfak)
[17:04:30] o/ Amir1
[17:04:44] I didn't realize you were back today until that last meeting.
[17:05:46] I've been working on the modular makefile stuff in my free time between meetings, and I have the conversion nearly complete. It was actually pretty easy. I made a jinja2 template to do the conversion and then just took a pass over the configs to clean up some minor issues.
[17:06:11] Right now, I'm rebuilding all of the models to make sure there are no substantial changes.
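The jinja2-template conversion halfak describes above can be sketched roughly as follows. This is a minimal illustration only: the template text, config keys, and model/file naming are hypothetical stand-ins, not the actual modular-makefile code.

```python
# Minimal sketch: render a Makefile fragment from a config dict using
# jinja2, in the spirit of the conversion described above. The template
# and config keys here are hypothetical, not the real ORES/editquality code.
from jinja2 import Template

MAKEFILE_TEMPLATE = Template("""\
{% for model in models %}
models/{{ model.wiki }}.{{ model.kind }}.gradient_boosting.model: \
datasets/{{ model.wiki }}.labeled_revisions.json
\trevscoring cv_train ... > $@
{% endfor %}
""")

config = {
    "models": [
        {"wiki": "fiwiki", "kind": "damaging"},
        {"wiki": "itwiki", "kind": "goodfaith"},
    ]
}

print(MAKEFILE_TEMPLATE.render(**config))
```

The point of this approach is that the Makefile becomes a generated artifact: review effort shifts to the config and template, and per-wiki cruft can't accumulate in hand-edited rules.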
[17:06:45] 10ORES, 10Scoring-platform-team, 10Analytics, 10Analytics-Cluster, 10artificial-intelligence: Package dictionaries better for ORES models - https://phabricator.wikimedia.org/T217343 (10EBernhardson) To be clear, we are talking about https://github.com/AbiWord/enchant ?
[17:06:45] We don't need to merge the rebuilt models. I just want to check the model_info.
[17:07:05] halfak: yeah, I'm not fully around. Just meetings and light work
[17:07:27] nice,
[17:07:36] let me know when you're done and I'll take a look
[17:07:49] OK, gotcha.
[17:07:55] Now I need to be afk to eat something
[17:08:07] I am working on the assumption that you didn't have any notes other than the global.yaml one.
[17:08:34] I want to get rid of our manual makefile in a follow-up commit :D
[17:15:54] 10ORES, 10Scoring-platform-team, 10Analytics, 10Analytics-Cluster, 10artificial-intelligence: Package dictionaries better for ORES models - https://phabricator.wikimedia.org/T217343 (10Halfak) Yes. And the pyenchant python library.
[18:56:31] halfak: What I mostly care about is that the Makefile itself stays untouched. The last time I checked your PR, that file had 2000 lines changed
[18:56:57] Amir1, help me understand why that matters if we are achieving the same outcome?
[18:57:29] There was a lot of cruft added to the makefile last time, and this is an opportunity to remove it.
[18:58:10] I think it should stay a pure refactor, and then we can decide on what needs to be done (= don't do too many things in one PR)
[18:59:02] This is just one component of the larger system.
[18:59:26] The config and template should be the focus of review.
[18:59:35] The Makefile is secondary to that.
[19:02:43] I disagree; I want this to be a refactor. If you want to get it merged as something else, though, I wouldn't push too hard
[19:04:07] I think it is a refactor. It achieves the same outcomes as the old code.
[19:04:18] No other code needs to be changed in order to interact with it.
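For context on the pyenchant usage mentioned in the T217343 thread above: the library wraps the AbiWord enchant C library and exposes dictionary lookups, which is the kind of primitive ORES-style spelling features build on. The function below is an illustrative sketch, not actual ORES/revscoring code; any object with a `check(word)` method (such as a real `enchant.Dict`) can serve as the checker.

```python
# Hypothetical sketch of a dictionary-based feature: the fraction of
# alphabetic tokens that a language dictionary does not recognize.
# "checker" is anything with a check(word) -> bool method.

def misspelled_ratio(tokens, checker):
    """Fraction of alphabetic tokens the dictionary does not recognize."""
    words = [t for t in tokens if t.isalpha()]
    if not words:
        return 0.0
    return sum(not checker.check(w) for w in words) / len(words)

# With a real dictionary (requires the system enchant library plus an
# installed hunspell/aspell dictionary such as en_US):
#   import enchant
#   misspelled_ratio("This sentense has a typo".split(), enchant.Dict("en_US"))
```

This is also why the packaging question on the ticket matters: the feature's output depends on exactly which system dictionary files are installed at training and scoring time.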
[19:04:34] The output interface is unchanged. The input interface is changed. No other dependencies remain.
[19:05:12] Historically, we tried to match the structure of the old makefile because it was easier to review changes. I think that was a mistake, and we should instead have ensured that we built the same models.
[19:05:18] Because we had problems with that afterward.
[19:08:48] Looking at our model building process... we spend more time waiting on IO than waiting on CPU.
[19:09:00] So I'm not sure how much of an impact hadoop will have in the short term.
[19:09:05] I guess tuning is a different story.
[19:21:59] * halfak waits for fiwiki to rebuild.
[19:22:01] *sigh*
[19:22:09] 250k observations is a lot.
[19:27:10] Amir1, ^
[19:27:30] noted
[21:51:23] 10ORES, 10Scoring-platform-team, 10Analytics, 10Analytics-Cluster, 10artificial-intelligence: Package dictionaries better for ORES models - https://phabricator.wikimedia.org/T217343 (10EBernhardson) So, the difficulty here is going to be that this isn't just dictionaries that feed into python deps, there...
[21:54:12] For reference, regarding 250k observations being a lot: we train 35M-observation models in hadoop (with 50 features, although I've run 250-feature models before). Indeed, if you are looking at relatively small datasets, you might not get much benefit from hadoop. I ported your hyperparameter tuning of some model to spark once to test, and it could run the entire thing (all parameters of all models) in
[21:54:18] parallel, meaning it finished in ~2 minutes, but that's not necessarily important for you
[22:23:06] 10ORES, 10Scoring-platform-team, 10Analytics, 10Analytics-Cluster, 10artificial-intelligence: Package dictionaries better for ORES models - https://phabricator.wikimedia.org/T217343 (10Halfak) My sense is that we can have a pretty good guarantee that the dictionaries are the dictionaries, but you're righ...
[23:48:43] Meeting marathon done. I'm heading out for the day. Have a good one, folks. :)
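The speedup EBernhardson describes comes from the fact that each (parameter-set) combination in a hyperparameter sweep is an independent task, so the whole grid can be fanned out at once. A minimal sketch of that idea, using stdlib `concurrent.futures` as a stand-in for the Spark job he mentions; the grid and scoring function are made up for illustration, not the actual tuning code.

```python
# Sketch: every point in a hyperparameter grid is an independent job,
# so the full sweep can run in parallel. ProcessPoolExecutor stands in
# for Spark here; evaluate() is a fake placeholder for cross-validated
# model scoring.
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def evaluate(task):
    """Placeholder for cross-validated scoring of one parameter combo."""
    learning_rate, max_depth = task
    return task, 1.0 / (1.0 + learning_rate * max_depth)  # fake score

# 3 x 3 = 9 independent tasks; a real sweep would be much larger.
grid = list(product([0.01, 0.1, 0.5], [3, 5, 7]))

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(evaluate, grid))
    best_params, best_score = max(results, key=lambda r: r[1])
    print(best_params, best_score)
```

Wall-clock time then approaches the cost of the single slowest task rather than the sum of all of them, which is why the full sweep could finish in minutes on a cluster even when the sequential version takes much longer.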