[06:50:05] hey halfak just so you know, the ores 'oom on log rotation' issue is still around, happened again today
[07:30:17] (CR) DannyS712: [C: +1] "Not sure why L10n-bot isn't giving a +2, looks fine" [extensions/ORES] - https://gerrit.wikimedia.org/r/571656 (owner: L10n-bot)
[08:38:25] ORES, Scoring-platform-team (Current), Operations: Ores celery OOM events in all hosts - https://phabricator.wikimedia.org/T242705 (akosiaris) Seems like the deploy did not fix it after all. Most (if not all) hosts alerted this morning. It's evident in the graphs as well https://grafana.wikimedia.or...
[13:58:35] Woah. The OOM thing seems to be unrelated to ORES code then. We freed up *a lot* of memory
[13:59:48] akosiaris, I'm wondering what your thoughts are here? Did we get bit by some software update?
[14:17:40] E.g. uwsgi changed its behavior.
[15:02:40] Indeed a lot of memory was freed per https://grafana.wikimedia.org/d/000000607/cluster-overview?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=ores&var-instance=All&from=now-30d&to=now&fullscreen&panelId=86. However, on stopping, uwsgi seems to consume all that freed memory
[15:03:02] akosiaris, were you able to determine when that started?
[15:03:20] I wonder if we have a bunk version of uwsgi or something? I'm just trying to figure out what changed.
[15:03:40] last time uwsgi was upgraded was on Dec 10 from what I see.
[15:03:45] so close to 2 months
[15:04:09] it would have triggered way before Feb 2 (that was the first such occurrence I think?)
[15:04:18] Gotcha.
[15:04:41] why is the uwsgi worker consuming memory in the first place anyway, I'd ask?
[15:05:00] Something really weird goes on there during the stop phase (i.e. receiving the TERM signal)
[15:05:30] also it's consuming 100% for ~20s
[15:05:41] what is it doing during all that time?
[15:06:04] No idea.
[15:06:10] Shouldn't be doing anything.
[15:06:18] is it reproducible in labs as well? or locally?
[15:06:25] Are you sure it is the stop phase?
[15:06:42] it definitely happens when I issue systemctl stop uwsgi-ores
[15:06:44] The only time CPU and mem should jump is if it is loading models/assets again.
[15:06:50] Oh!
[15:06:56] So it doesn't even need to be a reload.
[15:07:09] let me try locally.
[15:07:37] if you run top or htop on one terminal while running systemctl stop uwsgi-ores on another you will definitely see it
[15:08:03] unless it's some weird byproduct of something else local to all those boxes, which is why I am asking if it's reproducible in labs or elsewhere
[15:08:21] I'll check out labs today.
[15:34:03] akosiaris, can replicate in labs.
[15:34:10] WTF
[15:36:22] So I know it's not the listen queue filling up.
[15:36:37] https://github.com/unbit/uwsgi/issues/296
[15:37:21] "My temporary workaround is to use the --ignore-sigpipe argument, but perhaps there is a better way."
[15:44:37] It worked!
[15:44:45] akosiaris, ^ if you're still around.
[15:45:22] Wait.
[15:45:23] No.
[15:45:26] Arg.
[15:45:29] Nevermind. :(
[15:45:33] I got one clean reboot.
[15:45:33] * akosiaris around however :-)
[15:45:35] Something happened.
[15:58:57] I can't figure out an ORES deploy that might be relevant.
[15:59:13] Looking at the timeline of ORES deploys, nothing jumps out.
[15:59:38] I am suspicious of the libraries that are part of Flask -- as they provide the wsgi interface.
[16:35:27] Scoring-platform-team, Discovery-Search, Epic, Growth-Team (Current Sprint): [EPIC] Growth: Newcomer tasks 1.1.1 (ORES topics) - https://phabricator.wikimedia.org/T240517 (TJones)
[19:15:20] This is the most relevant bit of info I can seem to find.
[19:15:31] Regretfully the work-around described here doesn't work for us.
[19:15:52] But I wonder what it would take to get a backtrace like the one in the issue.
[19:22:47] what's the link halfak?
[19:22:59] https://github.com/unbit/uwsgi/issues/296
[19:23:01] Woops.
[19:23:07] Forgot to paste.
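Editorial note: the manual reproduction described above (watching top or htop while running systemctl stop uwsgi-ores) could also be captured as numbers. The following is a minimal sketch, not something used in this conversation. It assumes a Linux host with /proc, that the stop amounts to a plain SIGTERM (the actual systemd unit may do something different), and that you already know the pid of the worker you want to watch. The function names are hypothetical.

```python
import os
import re
import signal
import time


def parse_vmrss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the resident set size of pid, or None if the process is gone."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return parse_vmrss_kib(f.read())
    except (FileNotFoundError, ProcessLookupError):
        return None


def watch_shutdown(pid, interval=0.5, timeout=60.0):
    """Send SIGTERM to pid and sample its RSS until it exits or timeout.

    Assumption: SIGTERM stands in for what `systemctl stop` sends.
    Returns a list of (seconds_since_signal, rss_kib) samples.
    """
    samples = []
    os.kill(pid, signal.SIGTERM)
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        rss = read_rss_kib(pid)
        if rss is None:  # exited, or a zombie with no VmRSS line left
            break
        samples.append((round(time.monotonic() - start, 2), rss))
        time.sleep(interval)
    return samples
```

Calling watch_shutdown on one worker pid and comparing the first and largest rss_kib values would show whether memory grows after the signal. It samples a single process only, so each worker would need its own call.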
[19:23:20] Scoring-platform-team, Discovery-Search, Growth Design, Growth-Team (Current Sprint): Newcomer tasks: UX changes for ORES topics - https://phabricator.wikimedia.org/T244421 (MMiller_WMF)
[19:26:49] Scoring-platform-team, Discovery-Search (Current work), Growth-Team (Current Sprint): Newcomer tasks: set initial thresholds for ORES articletopic - https://phabricator.wikimedia.org/T244297 (Tgr) >>! In T244297#5858563, @EBernhardson wrote: > Another concern i just realized with respect to threshold...
[19:35:20] halfak, what's the version for uwsgi?
[19:35:28] maybe we could use something like https://uwsgi-docs.readthedocs.io/en/latest/Tracebacker.html
[19:35:36] 2.0.14-debian
[19:36:14] Ooh interesting
[19:37:03] I confirmed that code going back to Aug 2nd displays the same behavior.
[19:37:59] ORES, Scoring-platform-team (Current), Operations: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart) - https://phabricator.wikimedia.org/T242705 (Halfak)
[19:43:20] ORES, Scoring-platform-team (Current), Operations: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart) - https://phabricator.wikimedia.org/T242705 (Halfak) OK so I've done some tests. It's clear that we can see this CPU/memory spike when shutting down uwsgi...
[19:52:32] ORES, Scoring-platform-team (Current), Operations: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart) - https://phabricator.wikimedia.org/T242705 (Halfak) From https://uwsgi-docs.readthedocs.io/en/latest/ThingsToKnow.html > To shutdown uWSGI use SIGINT or S...
[19:56:46] ORES, Scoring-platform-team (Current), Operations: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart) - https://phabricator.wikimedia.org/T242705 (Halfak) I tried removing the `--die-on-term` option and I get the same behavior.
[20:22:21] Arg. I can connect to the dang sockets but I get no output at all.
[20:35:28] ORES, Scoring-platform-team (Current), Operations: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart) - https://phabricator.wikimedia.org/T242705 (Halfak) Here's an `strace` of one of the child processes that goes berserk: ` strace: Process 25405 attached...
[21:28:40] Jade, Scoring-platform-team (Current), Patch-For-Review: Jade local dev setup / README docs - https://phabricator.wikimedia.org/T244152 (ACraze) @kevinbazira @Halfak, I just pushed up a patchset with a README that has local dev install docs. Let me know if you notice anything missing!
[21:33:16] hello @bal
[21:33:35] hello @halfak
[21:34:14] I asked some questions yesterday regarding the tags and titles.
[21:34:58] i know the tags are things like and titles are things like [[File:]]
[21:34:59] [1] https://meta.wikimedia.org/wiki/File:
[21:35:32] but I don't know what {{ qualifies as. it's used for the info boxes.
[21:35:37] ORES, Scoring-platform-team, Growth-Team, MediaWiki-Special-pages, and 2 others: SpecialRecentChanges::doMainQuery needs tunning - https://phabricator.wikimedia.org/T244569 (Halfak)
[21:36:18] ORES, Scoring-platform-team, Growth-Team, MediaWiki-Special-pages, and 2 others: SpecialRecentChanges::doMainQuery needs tunning - https://phabricator.wikimedia.org/T244569 (Halfak) Adding #Growth-Team because this is related to their RecentChanges Filters.
[21:37:15] Scoring-platform-team (Current), Discovery-Search (Current work), Growth-Team (Current Sprint), Patch-For-Review: Newcomer tasks: set initial thresholds for ORES articletopic - https://phabricator.wikimedia.org/T244297 (Halfak)
[21:37:44] Scoring-platform-team (Current), Discovery-Search (Current work), Growth-Team (Current Sprint): Newcomer tasks: ORES ontology mapping and score thresholds - https://phabricator.wikimedia.org/T244192 (Halfak)
[21:38:26] Scoring-platform-team (Current), Discovery-Search (Current work), Growth-Team (Current Sprint): Newcomer tasks: ORES ontology mapping and score thresholds - https://phabricator.wikimedia.org/T244192 (Halfak) Looks like this is done from the #scoring-platform-team side. Let me know if you need anythi...
[21:40:12] Scoring-platform-team, NewcomerTasks 1.1, Growth-Team (Current Sprint): Define configuration for ORES drafttopic search - https://phabricator.wikimedia.org/T243359 (Halfak) The models use the exact same names in other wikis. There will need to be a localized mapping at the UI level.
[21:42:10] Scoring-platform-team, revscoring, artificial-intelligence: Implement native NN model in revscoring - https://phabricator.wikimedia.org/T242013 (Halfak)
[21:42:34] Scoring-platform-team, revscoring, artificial-intelligence: Implement native NN model in revscoring - https://phabricator.wikimedia.org/T242013 (Halfak)
[21:43:13] Scoring-platform-team (Research), revscoring, artificial-intelligence: Implement native NN model in revscoring - https://phabricator.wikimedia.org/T242013 (Halfak)
[21:44:33] Scoring-platform-team, editquality-modeling, artificial-intelligence: Move fiwiki from custom to config-based Makfile - https://phabricator.wikimedia.org/T221640 (Halfak) p:Medium→Low
[21:44:48] Scoring-platform-team (Current), articlequality-modeling, artificial-intelligence: Improve ORES articlequality feature extraction for images - https://phabricator.wikimedia.org/T180822 (Halfak)
[21:45:01] o/ haksoat!
[21:45:09] Sorry I missed you yesterday.
[21:45:18] {{ is how a template starts.
[21:45:42] This is a good reference: https://www.mediawiki.org/wiki/Help:Formatting
[21:46:46] Thanks
[21:47:40] Scoring-platform-team, WMF-Legal, Wikilabels: Add legal language to wikilabels - https://phabricator.wikimedia.org/T223313 (Halfak) Ping.
[21:47:52] Scoring-platform-team, WMF-Legal, Wikilabels: Add legal language to wikilabels - https://phabricator.wikimedia.org/T223313 (Halfak) p:High→Low
[21:50:01] ORES, Scoring-platform-team, Patch-For-Review: Review prometheus ORES rules for completeness - https://phabricator.wikimedia.org/T233448 (Halfak) Should we do the cleanup that I proposed above?
[21:51:40] Scoring-platform-team, drafttopic-modeling, Patch-For-Review: Key-value extraction misses on Wikipedia:WikiProject Council/Directory/WikiProject template invocations - https://phabricator.wikimedia.org/T229401 (Halfak) Open→Declined Thanks for your work on this @dr0ptp4kt, since we transition...
[21:55:23] Scoring-platform-team, drafttopic-modeling: Add topic information to the ores-support-checklist - https://phabricator.wikimedia.org/T245068 (Halfak)
[21:55:31] Scoring-platform-team, drafttopic-modeling: Add topic information to the ores-support-checklist - https://phabricator.wikimedia.org/T245068 (Halfak)
[21:55:33] Scoring-platform-team, ORES-Support-Checklist: Document and share operational details of ores-support-checklist - https://phabricator.wikimedia.org/T222271 (Halfak)
[21:56:17] Scoring-platform-team (Current), drafttopic-modeling, revscoring, artificial-intelligence: Experiment with different vector lengths for ar, cs, en, and kowiki topic models. - https://phabricator.wikimedia.org/T235183 (Halfak)
[21:56:28] Scoring-platform-team (Current), drafttopic-modeling, revscoring, artificial-intelligence: Build WikiProject directory topic models for ar, cs, and kowiki - https://phabricator.wikimedia.org/T235181 (Halfak)
[21:56:30] Scoring-platform-team (Current), drafttopic-modeling, revscoring, artificial-intelligence: Experiment with different vector lengths for ar, cs, en, and kowiki topic models. - https://phabricator.wikimedia.org/T235183 (Halfak) Open→Resolved a:Halfak
[21:57:48] Jade, Scoring-platform-team (Current), Regression: Regression: Judgment validation allows for multiple judgments with the same value e.g. 2x {damaging, badfaith} - https://phabricator.wikimedia.org/T210804 (Halfak)
[21:58:13] Jade, Scoring-platform-team (Current), Regression: Regression: Judgment validation allows for multiple judgments with the same value e.g. 2x {damaging, badfaith} - https://phabricator.wikimedia.org/T210804 (Halfak) Open→Resolved
[21:59:22] Jade, Scoring-platform-team, I18n: Content quality scale translatable strings might not work as implemented - https://phabricator.wikimedia.org/T209884 (Halfak) Enwiki has a 6 item scale Wikidatawiki has a 5 item scale. How do we make sure this renders correctly when they are both using the `conte...
[22:08:06] Jade, Scoring-platform-team (Current), Documentation, Patch-For-Review: Jade local dev setup / README docs - https://phabricator.wikimedia.org/T244152 (ACraze)