[00:04:44] Scoring-platform-team, Discovery-Search, Growth-Team (Current Sprint): Newcomer tasks: set initial thresholds for ORES articletopic - https://phabricator.wikimedia.org/T244297 (EBernhardson) How do we think these thresholds should be applied, It sounds like we need to inject them prior to the indexin...
[04:31:04] Scoring-platform-team, Discovery-Search, Growth-Team (Current Sprint): Newcomer tasks: UX changes for ORES topics - https://phabricator.wikimedia.org/T244421 (Tgr) What exactly should be shown before the user clicks Show more? Or does that feature go away entirely? > Under each header, the relevant...
[07:11:45] (PS1) DannyS712: Remove use of ApiTestCase::doLogin [extensions/ORES] - https://gerrit.wikimedia.org/r/570526 (https://phabricator.wikimedia.org/T244039)
[07:16:44] (PS2) DannyS712: Remove use of ApiTestCase::doLogin [extensions/ORES] - https://gerrit.wikimedia.org/r/570526 (https://phabricator.wikimedia.org/T244039)
[07:58:16] ORES, Scoring-platform-team, Growth-Team, MassMessage, and 13 others: Api tests: Hard deprecate $this->doLogin, remove calls in favor of passing a user where needed - https://phabricator.wikimedia.org/T244039 (Usmanino)
[08:04:16] ORES, Scoring-platform-team, Growth-Team, MassMessage, and 13 others: Api tests: Hard deprecate $this->doLogin, remove calls in favor of passing a user where needed - https://phabricator.wikimedia.org/T244039 (DannyS712)
[08:47:46] ORES, Scoring-platform-team (Current), Operations: Ores celery OOM event in codfw - https://phabricator.wikimedia.org/T242705 (akosiaris) I've tried to reproduce this. It's easily reproducible after all. Just do what logrotate does and issue `systemctl reload uwsgi-ores`. CPU usage spikes and reache...
[11:18:33] Scoring-platform-team, Discovery-Search: Consume ORES articletopic data from Kafka and store it in HDFS - https://phabricator.wikimedia.org/T240553 (kostajh)
[11:19:16] Scoring-platform-team, Discovery-Search, Epic, Growth-Team (Current Sprint): [EPIC] Growth: Newcomer tasks 1.1.1 (ORES topics) - https://phabricator.wikimedia.org/T240517 (kostajh)
[11:19:59] Scoring-platform-team, Discovery-Search, Epic, Growth-Team (Current Sprint): [EPIC] Growth: Newcomer tasks 1.1.1 (ORES topics) - https://phabricator.wikimedia.org/T240517 (kostajh)
[11:28:58] Scoring-platform-team, Discovery-Search, Growth-Team (Current Sprint): Newcomer tasks: ORES ontology mapping and score thresholds - https://phabricator.wikimedia.org/T244192 (kostajh) Looks good to me. @MMiller_WMF can you please make this sheet publicly viewable unless there's reason not to yet?
[11:48:15] Scoring-platform-team (Research), MachineVision, Structured-Data-Backlog, artificial-intelligence: Implement NSFW image classifier using Open NSFW - https://phabricator.wikimedia.org/T214201 (JEumerus)
[14:49:17] Scoring-platform-team, Discovery-Search, Growth-Team (Current Sprint): Newcomer tasks: UX changes for ORES topics - https://phabricator.wikimedia.org/T244421 (RHo) >>! In T244421#5854832, @Tgr wrote: > What exactly should be shown before the user clicks Show more? Or does that feature go away entirel...
[16:10:02] ORES, Scoring-platform-team (Current), Operations: Ores celery OOM event in codfw - https://phabricator.wikimedia.org/T242705 (Halfak) OK rolled back. Looking at what happened, it seems like memory pressure was *way worse* in Eqiad than in Codfw * Eqiad: https://grafana.wikimedia.org/d/HIRrxQ6mk/o...
[16:17:55] ORES, Scoring-platform-team (Current), Operations: Ores celery OOM event in codfw - https://phabricator.wikimedia.org/T242705 (Halfak) When I start up the deployment ORES config locally with 4 workers, I can see that we are using ~2516000 bytes of RES for two processes. It looks like my available RA...
[16:51:34] kevinbazira, looks like we have a monthly staff meeting today. Let's do async standup.
[16:53:45] o/ accraze
[16:53:52] staff meeting today so I'm async'ing early.
[16:53:54] Y: Mostly meetings. Made some progress on topic docs. I should be able to start uploading some figures later today.
[16:53:54] T: There's a memory pressure issue in production ORES. I did a rollback to bring things back to sanity and sent some emails to the Growth team. I'm digging into it now. I think we have an easy win at our disposal. Our strategy for dropping the heavy modeling bits from the wsgi workers (because they don't need them) isn't working, so I'm figuring out why now. If I make good progress on that, I'll continue with topic docs.
[16:54:22] sounds good halfak
[16:58:31] Y: finished moving all hardcoded strings in dialog popups/forms/placeholder text to use i18n format. everything in the entity UI should now be wired up for translation.
[17:01:19] T:
[17:01:20] TL;DR: Fixed a line exceeding 100 characters in EntityContent.php. Since the main culprit has been fixed, please advise on how you would squash other commits into this update.
[17:01:20] Long version: I tried to fix commits that have "Verified -1". In the /Jade root, I ran `composer test` and got: "Script phpcs -s -p handling the test event returned with error code 3". Then I ran `composer fix` and got: "Script minus-x fix . handling the fix event returned with error code 127". Then I ran `grunt` and got: "Fatal error: Unable to find local grunt."
[17:01:22] on gerrit.
I decided to review the details displayed at the bottom of one of the commits: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Jade/+/570266 This led me to this page: https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php72-docker/44758/console where I found: "WARNING | Line exceeds 100 characters; contains 110". I then went to EntityContent.php and fixed this issue with this commit: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Jade/+/570630, which now has "Verified +2".
[17:01:31] Just like Andy said via email, it looks like #151 EntityContent.php is the one causing some commits to fail on Jenkins. Now that #151 has been fixed, is there a way I can squash all the failing commits into the one that is passing? Kindly advise on how you would go about this. Thanks!
[17:03:05] T: Need to start reviewing Kevin's patchsets. I'd also like to get some review on my entity ui patchset and merge so we can get it to beta -- this will also make merging Kevin's changes much easier. Need to look into icon color stuff a bit too. Also going to start on documentation stuff today (readme/install docs).
[17:05:10] kevinbazira: yeah I think some of the confusion is due to the incorrect info I gave you to create a dependent patchset. I'm taking a look at your code, will send you some more info shortly
[17:05:37] Thanks accraze, that will be great :)
[17:27:53] ORES, Scoring-platform-team, Growth-Team, MassMessage, and 13 others: Api tests: Hard deprecate $this->doLogin, remove calls in favor of passing a user where needed - https://phabricator.wikimedia.org/T244039 (DannyS712)
[18:03:37] accraze, I could use some help thinking through something. I don't know how crazy it is. Got a few minutes to chat?
[18:04:02] This is re. the prod memory issue.
[18:06:41] * halfak thinks out loud for a moment.
[18:07:15] OK so. We have two main processes from which all other uwsgi processes spawn.
There's a lot of memory sharing in those forks, so I'm really primarily concerned with these main two.
[18:07:41] When I started working today, I saw 2.5GB of RES for each process. ~5GB total.
[18:07:52] This matched a loss in my available RAM as reported by top.
[18:09:09] I've figured out a strategy of loading models using a multiprocessing process (fork mode), trimming the bits we don't need, and then returning (via pickle serialization) just the remaining bits to the main process. This has resulted in a substantial drop in memory usage.
[18:09:23] ah good you are in that loop
[18:09:32] I meant to mention it here yesterday but got sucked into code
[18:10:20] heh. Yeah. This has been my morning :)
[18:10:52] err.. 'enjoy'? :-D
[18:10:52] Oh wait. I'm not sure I'm seeing that substantial drop anymore.
[18:11:16] hmmm i was going to say that sounded like a reasonable strategy halfak
[18:11:59] long term though, maybe we look into leveraging asyncio
[18:12:05] I might have spoken too soon. Upon review, I'm not sure there was much recovery of memory.
[18:12:09] Oh? How much might that help us?
[18:13:22] Oh wait. I was reading it right. We dropped memory per worker down to 1.7GB
[18:13:33] So ~3.4GB total from 5GB.
[18:14:03] In theory, this is all stupid and should have happened anyway. But it looks like Python garbage collection wasn't doing its job.
[18:14:58] Another issue we have is that model_info takes up a lot of space. We don't use that on the celery side, so I'd like to trim that as well.
[18:16:28] ahh yeah, asyncio might not help all that much but it would allow for concurrency in a single thread which might reduce memory there
[18:17:35] For uwsgi, I'd be really interested in switching to thread-based concurrency to save on memory, but yuvipanda -- our old SRE sage -- advised us against using threads in uwsgi.
[18:17:38] uwsgi is IO bound only.
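A minimal sketch of the loading strategy described at 18:09: load a model in a forked child process, drop the attributes the worker does not need, and send only the trimmed object back to the parent over a pipe (`Connection.send` pickles it). `HypotheticalModel`, `load_model`, `load_trimmed`, and the `training_cache` attribute are made up for illustration; `model_info` is borrowed from the 18:14 message, but its contents here are assumed. This is not taken from the ORES code.

```python
import multiprocessing as mp


class HypotheticalModel:
    """Stand-in for a scoring model; attribute names are assumptions."""

    def __init__(self):
        # Heavy bits a worker may not need (assumed contents).
        self.model_info = {"statistics": list(range(100_000))}
        self.training_cache = bytearray(10_000_000)
        # Light bits every worker needs (assumed).
        self.name = "articletopic"
        self.version = "1.0.0"


def load_model():
    # Hypothetical loader; a real deployment would deserialize a model file.
    return HypotheticalModel()


def _load_and_trim(conn, drop_attrs):
    """Runs in the child: load, delete unneeded attributes, send the rest back."""
    model = load_model()
    for attr in drop_attrs:
        if hasattr(model, attr):
            delattr(model, attr)
    conn.send(model)  # pickled here; only the trimmed object crosses over
    conn.close()


def load_trimmed(drop_attrs=("model_info", "training_cache")):
    """Load a model in a forked child and return an unpickled, trimmed copy."""
    ctx = mp.get_context("fork")  # fork mode, POSIX only
    recv_conn, send_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_load_and_trim, args=(send_conn, drop_attrs))
    proc.start()
    send_conn.close()  # parent keeps only the receiving end
    model = recv_conn.recv()  # receive before join to avoid a full-pipe deadlock
    proc.join()
    return model
```

The heavy attributes are allocated and freed entirely inside the short-lived child, so the parent only ever holds the trimmed object. How much this saves in practice depends on the real model layout.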
[18:19:46] makes sense, maybe we could look into moving to asgi
[18:20:06] i've heard good things about starlette being an async flask: https://github.com/encode/starlette
[18:28:02] Hmm. Interesting.
[18:28:13] It does seem that flask is weird.
[18:33:22] Just jumped into a meeting but I think I can do some more work before I need a hand.
[18:33:39] This strategy of loading things in with a parallel process provides a nice, clean way to look at memory usage.
[18:34:00] I can load something in, slice parts off, and then drop the process to see what happens.
[19:50:41] Ho. Lee. Crapola. I got our uwsgi workers down to 400M
[19:50:43] \o/
[19:51:27] Now to finish off the celery worker bits and see if this thing still actually works!
[19:54:22] nice!
[19:54:44] Looks like celery still needs a lot of memory.
[19:54:48] But I can trim a bit.
[19:57:07] It works!
[20:00:08] I can fit both celery and uwsgi in 3.5GB when they would take 9GB together before.
[20:00:20] Not sure how massive forking will work out. We'll see!
[20:06:40] Scoring-platform-team, Discovery-Search, Growth-Team (Current Sprint): Newcomer tasks: ORES ontology mapping and score thresholds - https://phabricator.wikimedia.org/T244192 (MMiller_WMF) @kostajh -- thanks. I made the sheet public.
[20:12:28] Scoring-platform-team, Discovery-Search, Growth-Team (Current Sprint): Newcomer tasks: UX changes for ORES topics - https://phabricator.wikimedia.org/T244421 (MMiller_WMF) @Tgr -- if we don't implement alphabetical sorting, what will the sorting be governed by?
[20:33:58] Scoring-platform-team, Discovery-Search, Growth-Team (Current Sprint): Newcomer tasks: UX changes for ORES topics - https://phabricator.wikimedia.org/T244421 (Tgr) Currently we use the same order the topics are listed on the configuration page; we could always keep doing that. That gives the communit...
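The 18:33 to 18:34 messages describe using a throwaway process as a memory probe: load something, slice parts off, then let the process exit. Below is a minimal Linux-only sketch of that idea, assuming the current RSS is read from the `VmRSS` line of `/proc/self/status`. `probe_memory`, `build_fake_model`, and `drop_model_info` are hypothetical names.

```python
import gc
import multiprocessing as mp


def rss_kib():
    """Current resident set size of this process in KiB (Linux /proc only)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # value is reported in kB
    raise RuntimeError("VmRSS not found in /proc/self/status")


def _probe(conn, build, trim):
    """Runs in the child: record RSS before loading, after loading, after trimming."""
    before = rss_kib()
    obj = build()
    loaded = rss_kib()
    trim(obj)
    gc.collect()
    trimmed = rss_kib()
    conn.send({"before": before, "loaded": loaded, "trimmed": trimmed})
    conn.close()


def probe_memory(build, trim):
    """Take one measurement in a throwaway forked process and return its readings."""
    ctx = mp.get_context("fork")
    recv_conn, send_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_probe, args=(send_conn, build, trim))
    proc.start()
    send_conn.close()  # so recv() raises EOFError if the child dies early
    readings = recv_conn.recv()
    proc.join()  # everything the child allocated is released when it exits
    return readings


# Hypothetical build/trim pair: a "model" whose model_info is large.
def build_fake_model():
    return {"model_info": bytearray(50_000_000), "weights": bytearray(1_000_000)}


def drop_model_info(model):
    model.pop("model_info")
```

Each probe runs in its own child, so earlier measurements do not inflate later ones. Freed memory is not always handed back to the OS, so a small drop between `loaded` and `trimmed` does not by itself prove a trim was ineffective.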
[20:39:29] https://github.com/wikimedia/ores/pull/337
[20:39:30] BAM
[20:40:24] wikimedia/ores#1392 (memory_sensitive_models - f97731b : Aaron Halfaker): The build failed. https://travis-ci.org/wikimedia/ores/builds/647044639
[20:40:30] Blah
[20:41:25] :(
[20:41:51] brb makin food
[20:42:13] wikimedia/ores#1393 (memory_sensitive_models - f3b1ca6 : halfak): The build failed. https://travis-ci.org/wikimedia/ores/builds/647045229
[20:43:57] Scoring-platform-team, Discovery-Search (Current work), Growth-Team (Current Sprint): Newcomer tasks: set initial thresholds for ORES articletopic - https://phabricator.wikimedia.org/T244297 (EBernhardson)
[20:44:01] Scoring-platform-team, Discovery-Search (Current work), Growth-Team (Current Sprint): Newcomer tasks: ORES ontology mapping and score thresholds - https://phabricator.wikimedia.org/T244192 (EBernhardson)
[20:57:03] Weird. Tests pass on my machine.
[20:57:06] * halfak digs
[21:22:44] Aha! Werkzeug is being weird.
[21:30:28] Looks like they went to 1.0.0 and one of our libraries didn't gate against pulling that down and assuming it would work :|
[21:30:31] So we get to gate it now.
[21:30:50] wikimedia/ores#1395 (memory_sensitive_models - f69989c : halfak): The build was fixed. https://travis-ci.org/wikimedia/ores/builds/647064562
[21:30:53] Ta da!
[21:33:51] wikimedia/ores#1397 (master - b2180fa : Aaron Halfaker): The build was broken. https://travis-ci.org/wikimedia/ores/builds/647065506
[21:33:56] wat.
[21:34:07] lol
[21:34:20] I merged someone else's PR and it broke master for the same reason that my branch was broken.
[21:34:27] Fun story, merging my PR will also fix master
[21:41:46] accraze|lunch, for when you're back: https://github.com/wikimedia/ores/pull/337
[22:21:35] takin a look now halfak
[22:22:21] Awesome! Thanks
[22:30:28] merged it halfak, looks nice & modular
[22:33:35] wikimedia/ores#1400 (master - 677f165 : Andy Craze): The build was fixed. https://travis-ci.org/wikimedia/ores/builds/647089587
[22:37:48] WOO!
[22:37:49] Thanks
[22:38:02] I'll see if I can get this to beta.
[22:40:35] * halfak waits for gerrit to get it together
[22:40:47] I think I might have to wait until tomorrow morning.
[22:40:50] have a good one!
[22:40:54] thanks again accraze :)
[22:40:55] o/
[22:41:22] later halfak!
[22:53:04] Scoring-platform-team, Discovery-Search (Current work), Growth-Team (Current Sprint): Newcomer tasks: set initial thresholds for ORES articletopic - https://phabricator.wikimedia.org/T244297 (Tgr) If there are a lot of articles, precision won't really matter since the articles will be sorted by score...
[22:59:13] Scoring-platform-team, Discovery-Search, Growth Design, Growth-Team (Current Sprint): Newcomer tasks: UX changes for ORES topics - https://phabricator.wikimedia.org/T244421 (RHo)
[23:08:49] Scoring-platform-team, Discovery-Search (Current work), Growth-Team (Current Sprint): Newcomer tasks: set initial thresholds for ORES articletopic - https://phabricator.wikimedia.org/T244297 (Tgr) What are the thresholds for, exactly? The `articletopic:` search keyword in general, or just the suggest...
[23:48:15] Scoring-platform-team, Discovery-Search (Current work), Growth-Team (Current Sprint): Newcomer tasks: ORES ontology mapping and score thresholds - https://phabricator.wikimedia.org/T244192 (Tgr) In theory there are two mappings: # map the ORES taxonomy to topic search keywords # map topic search keyw...
[23:49:43] Scoring-platform-team, Discovery-Search (Current work), Growth-Team (Current Sprint): Newcomer tasks: ORES ontology mapping and score thresholds - https://phabricator.wikimedia.org/T244192 (Tgr) Also, just to confirm, the mapping (and ORES taxonomy) will be identical across all wikis, right?
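T244297 raises two related questions: where per-topic score thresholds apply (the log leaves open whether at index time or query time), and how much precision matters once results are sorted by score. The sketch below shows both steps under stated assumptions. It drops topic predictions below a per-topic threshold, then ranks the surviving articles for one topic by score. The topic labels, threshold values, default threshold, and function names are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-topic thresholds; the real values were still being set in T244297.
THRESHOLDS: Dict[str, float] = {
    "STEM.Physics": 0.7,
    "Culture.Biography": 0.5,
}
DEFAULT_THRESHOLD = 0.5  # assumption: fallback for topics without an explicit entry


def filter_predictions(
    scores: Dict[str, float],
    thresholds: Optional[Dict[str, float]] = None,
    default: float = DEFAULT_THRESHOLD,
) -> Dict[str, float]:
    """Keep only the topics whose score meets that topic's threshold."""
    thresholds = THRESHOLDS if thresholds is None else thresholds
    return {topic: score for topic, score in scores.items()
            if score >= thresholds.get(topic, default)}


def rank_for_topic(articles: Dict[str, Dict[str, float]],
                   topic: str) -> List[Tuple[str, float]]:
    """Articles that pass the threshold for `topic`, highest score first."""
    hits = []
    for title, scores in articles.items():
        kept = filter_predictions(scores)
        if topic in kept:
            hits.append((title, kept[topic]))
    # Sorting by score puts the most confident matches first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

With these made-up numbers, an article scoring 0.65 for `STEM.Physics` is filtered out, and articles scoring 0.9 and 0.75 are returned in that order. This illustrates Tgr's point that with many candidates, the top of a score-sorted list is high-confidence whatever the exact threshold.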