[11:23:06] Scoring-platform-team, MediaWiki-extensions-ORES, WMSE-Development-Support-2018 (Support-for-fighting-vandalism): The filters menu is closed when a filter colour is selected - https://phabricator.wikimedia.org/T188903#4023295 (Sebastian_Berlin-WMSE)
[11:27:05] Scoring-platform-team, MediaWiki-extensions-ORES, WMSE-Bug-Reporting-and-Translation-2018: The filters menu is closed when a filter colour is selected - https://phabricator.wikimedia.org/T188903#4023308 (Sebastian_Berlin-WMSE)
[15:13:26] Scoring-platform-team, monitoring: Create Iginca alerts for dead celery service (pool of workers) - https://phabricator.wikimedia.org/T188282#4023861 (Halfak)
[15:14:08] Scoring-platform-team, ORES, Documentation: Write an ORES manual - https://phabricator.wikimedia.org/T188280#4023864 (Halfak) See notes from ongoing meeting here: https://etherpad.wikimedia.org/p/ores_docs_refactor
[15:47:47] Scoring-platform-team (Current), Packaging, Epic: [Epic] Support word2vec for production ORES models - https://phabricator.wikimedia.org/T187217#4023904 (awight) a:awight>None
[15:49:02] Scoring-platform-team (Current), JADE, Patch-For-Review: [Blocked] Deploy JADE prototype in Beta Cluster - https://phabricator.wikimedia.org/T176333#4023909 (awight)
[15:49:09] Scoring-platform-team (Current): Scoring platform team FY18 Q2 - https://phabricator.wikimedia.org/T176324#4023911 (awight)
[15:49:13] Scoring-platform-team (Current), JADE, Patch-For-Review: [Blocked] Deploy JADE prototype in Beta Cluster - https://phabricator.wikimedia.org/T176333#3621812 (awight) Open>stalled
[15:56:36] halfak: i have not caught dinner yet and its late here, so i'll be hopping off :P
[15:56:45] i'll be back in an hour or so
[16:01:06] Scoring-platform-team (Current), Packaging: Package word2vec binaries - https://phabricator.wikimedia.org/T188446#4023948 (awight)
[16:01:26] Scoring-platform-team (Current), Packaging: Package word2vec binaries - https://phabricator.wikimedia.org/T188446#4007994 (awight) @akosiaris I started outlining our options here, please chime in when you can.
[16:30:15] Swinging past that bike store again… back in 20-30
[16:46:56] Amir1, FYI there's a reboot coming for stat1005/6 on Wednesday.
[16:56:49] I'm going to make a quick trip to the store. There's a crazy blizzard coming.
[16:56:52] And I'm
[16:56:59] going to stock up a bit.
[16:57:16] Everything is closed today :| Hopefully the grocery store is open.
[17:34:26] Scoring-platform-team (Current), Packaging: Package word2vec binaries - https://phabricator.wikimedia.org/T188446#4024295 (akosiaris) So basically we are talking about 3 choices, all of them undesirable by at least a person, if not more. Technically speaking and assuming a bug free world, the best one s...
[17:52:28] Scoring-platform-team (Current), Packaging: Package word2vec binaries - https://phabricator.wikimedia.org/T188446#4024354 (mmodell) >>! In T188446#4024295, @akosiaris wrote: > So basically we are talking about 3 choices, all of them undesirable by at least a person, if not more. > > Technically speaking...
[18:07:30] Scoring-platform-team (Current), Packaging: Package word2vec binaries - https://phabricator.wikimedia.org/T188446#4024453 (awight) Well... @akosiaris what would you think about using the word2vec.deb provisionally, with a medium-term commitment to move to git-lfs/fat once we can prove that the deployment...
[18:17:31] o/
[18:17:38] Back from the store but in a meeting.
[18:25:29] And now maybe lunch.
[18:25:34] Anything for me to look at over lunch?
[18:26:18] halfak|Lunch: the ram after starting extraction continuously moves up reaching max until OS kills process
[18:26:28] i saw it reaching ~ 16/16GB
[18:26:41] codezee, gotcha. I'll look at how we have vectors implemented and consider.
[18:26:45] We probably have a leak.
[18:27:00] Just to confirm, is there any output before it crashes?
[18:27:02] codezee, ^
[18:27:18] halfak|Lunch: isn't it bec we're storing vectors in the dependent cache? the problem
[18:27:26] codezee, depends
[18:27:39] It's one consideration.
[18:27:49] i mean we do the average in a separate dependency so basically for every document's every word there's a vector there
[18:28:43] right.
[18:28:48] 300 floats is pretty small.
[18:28:54] is there any output before it crashes?
[18:29:02] ^ this is the key question
[18:29:37] 93,000*300*1024*(100words/doc) lets say, we get 10GB already
[18:29:54] sorry 93,000*300*4*100
[18:30:03] is there any output before it crashes?
[18:30:04] if we assume 100 words /doc
[18:30:07] no output
[18:30:13] OK thanks. Will consider
[18:30:58] this is the script - https://github.com/wiki-ai/drafttopic/blob/extract-from-text/drafttopic/utilities/extract_from_text.py
[18:42:03] if OOM is the case, i can write a new feature that does not store caches like the first version
[19:24:36] I just ran a test where I loaded 100 * 300 value floats lists into memory and my RES usage went up by less than 1k.
[19:24:47] 11692 --> 12480
[19:25:27] I loaded 1000 and it went up by 10k. (22064)
[19:25:57] I loaded 10,000 vectors and it went up by ~100k (111388)
[19:26:05] So I don't think that is the issue.
[19:26:12] 10,000 non-stopwords is a pretty big article.
[19:26:28] codezee, I think there's something else going on.
[19:34:33] Those are MB not KB?
[19:35:06] Oh could be.
[19:35:55] Hm... I don't think that is right.
[19:36:13] * halfak tries to check.
[19:36:23] Not that 100MB is 32GB
[19:36:29] err 16GB
[19:36:31] * awight rubs eyes
[19:38:35] If this is MB, then python requires 9.4MB just to start up the interpreter.
[19:40:13] Sounds right ;-)
[19:40:25] what tool are you using, `top`?
[19:40:34] ps
[19:40:42] Now I'm using sys.getsizeof
[19:40:45] To double-check
[19:41:34] the man page is saying KiB
[19:41:39] Gotcha
[19:42:20] Based on python's sys.getsizeof(), we should expect a list of 10,000 300-length vectors of floats to consume 25.3MB of memory.
[19:42:38] whoa
[19:43:01] yeah I was just seeing that 300 x 100 x anything is >> 1k
[19:43:14] so the increase is probably not the vectors themselves
[19:43:19] getsizeof() isn't recursive
[19:43:22] So you need to be clever
[19:43:50] yuck
[19:44:06] OK I don't have time to profile now because I need to work on the "killer deck"
[19:44:23] But my next step would have been to manually run tests with the feature vector itself.
[19:44:32] Using revscoring's solve() method.
[19:44:42] This is a sledgehammer, for another day, but: https://phabricator.wikimedia.org/T182350
[19:45:03] Links to a tool, https://pypi.python.org/pypi/memory_profiler
[20:00:33] well, per document its fine, but if its maintaining a collection of documents in memory during solve, we have a problem there
[20:01:07] although thats not whats happening in the script i think
[21:46:42] Heading to the bike shop again...
[21:46:45] back in 30
[21:47:03] (CR) jenkins-bot: Localisation updates from https://translatewiki.net. [extensions/JADE] - https://gerrit.wikimedia.org/r/416550 (owner: L10n-bot)
[21:51:08] (CR) jenkins-bot: Localisation updates from https://translatewiki.net. [extensions/ORES] - https://gerrit.wikimedia.org/r/416561 (owner: L10n-bot)
[22:58:24] OK I'm out of here for the evening.
[22:58:30] Have a good one, folks. :)
[22:58:53] same
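
Note on the 19:43 "getsizeof() isn't recursive" point: a minimal sketch of one way to "be clever" about it, walking nested containers so the float objects inside each vector are counted too. The helper names deep_sizeof, make_vectors and estimate_bytes are hypothetical and are not part of revscoring or drafttopic; the sizes reported are CPython implementation details and will vary by version and platform. estimate_bytes just restates the back-of-envelope arithmetic from 18:29 (documents * words per document * dimensions * bytes per value).

```python
import sys


def deep_sizeof(obj, seen=None):
    """Approximate bytes used by obj plus everything it contains.

    sys.getsizeof() only counts the container itself, so nested lists,
    tuples, sets and dicts are walked here. Objects are tracked by id()
    so that shared references are counted once.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_sizeof(key, seen) + deep_sizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_sizeof(item, seen)
    return size


def make_vectors(n_vectors, dim=300):
    """Build n_vectors lists of dim distinct Python floats (stand-in data)."""
    return [[float(i * dim + j) + 0.5 for j in range(dim)]
            for i in range(n_vectors)]


def estimate_bytes(n_docs, words_per_doc, dim, bytes_per_value):
    """Back-of-envelope estimate, as in the 18:29 message."""
    return n_docs * words_per_doc * dim * bytes_per_value


if __name__ == "__main__":
    vectors = make_vectors(100)
    print("outer list only:", sys.getsizeof(vectors), "bytes")
    print("recursive total:", deep_sizeof(vectors), "bytes")
    # 93,000 docs * 100 words * 300 dims * 4 bytes per value
    print("estimate:", estimate_bytes(93000, 100, 300, 4), "bytes")
```

The recursive total only covers plain Python containers; it does not follow attributes of arbitrary objects or the buffers of array types, so it is a lower bound for anything more elaborate than lists of floats.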