[05:44:02] Jade, Scoring-platform-team, Regression: Regression: Judgment validation allows for multiple judgments with the same value e.g. 2x {damaging, badfaith} - https://phabricator.wikimedia.org/T210804 (DannyS712)
[16:09:43] ORES, Scoring-platform-team, Growth-Team, MassMessage, and 12 others: Api tests: Hard deprecate $this->doLogin, remove calls in favor of passing a user where needed - https://phabricator.wikimedia.org/T244039 (DannyS712)
[17:43:41] Scoring-platform-team, drafttopic-modeling, revscoring, artificial-intelligence: Implement native NN topic model in revscoring - https://phabricator.wikimedia.org/T242013 (Halfak) a:kevinbazira→None
[17:44:37] Scoring-platform-team (Current), Wikilabels: Wikilabels docs -- Make install docs better - https://phabricator.wikimedia.org/T244151 (Halfak)
[17:45:50] Jade, Scoring-platform-team (Current): Jade local dev/ README - https://phabricator.wikimedia.org/T244152 (ACraze)
[17:47:09] Jade, Scoring-platform-team (Current): Jade local dev/ README - https://phabricator.wikimedia.org/T244152 (ACraze)
[17:47:11] Jade, Scoring-platform-team, Epic, Goal: Complete Jade documentation - https://phabricator.wikimedia.org/T229967 (ACraze)
[17:50:07] Jade, Scoring-platform-team (Current): Jade local dev/ README - https://phabricator.wikimedia.org/T244152 (ACraze)
[17:50:42] Jade, Scoring-platform-team (Current): Jade local dev setup / README docs - https://phabricator.wikimedia.org/T244152 (ACraze) p:Triage→Low a:ACraze
[17:51:51] Jade, Scoring-platform-team (Current): Jade local dev setup / README docs - https://phabricator.wikimedia.org/T244152 (ACraze)
[19:41:35] question: does LogEvents have a dump? https://www.mediawiki.org/wiki/API:Logevents
[19:41:39] for download
[19:44:55] xinbenlv, yes, but it's SQL, I think
[19:44:57] * halfak digs
[19:45:41] Aha! It is XML: https://dumps.wikimedia.org/enwiki/20200201/enwiki-20200201-pages-logging.xml.gz
[19:46:05] And MWXML supports it!
[19:46:21] https://pythonhosted.org/mwxml/iteration.html#mwxml.LogItem
[19:49:18] that's awesome~
[20:34:54] halfak: I've been trying to write a utility that extracts data using the API and multiprocesses it (as in https://github.com/wikimedia/revscoring/blob/master/revscoring/utilities/extract.py), but I'm hitting a classic multiprocessing error: "TypeError: can't pickle _thread.RLock objects"
[20:35:04] Em...
[20:35:07] do you see any obvious mistake I may have made here?
[20:36:04] halfak, is there a code example of querying the log dump? If not, I may write one
[20:46:10] Hmm. I'll see if I can give it a run.
[20:46:24] Could be a bug. I don't think the log dump gets much attention.
[20:48:56] the code snippet is something like this: http://dpaste.com/10HH1K6
[20:50:39] Hmm. It does work in a basic way.
[20:50:43] Checking your snippet.
[20:52:10] Oh! Ha. OK, this has nothing to do with mwxml.
[20:52:28] I'm looking at your extract code.
[20:52:35] Can you paste the full error?
[20:54:14] I don't see any obvious problems.
[20:54:31] I'm guessing that you have something in your results that pickle can't handle.
[20:54:49] It's possible that there's something else in the error message that might give us a clue.
[20:54:52] codezee, ^
[20:55:34] I'll make a code example for the log dumps for xinbenlv :)
[20:58:34] xinbenlv, https://gist.github.com/halfak/fd2779c38b94d41b10cf3c94f5d42022
[20:58:38] That should help you get started.
[21:00:53] halfak: here's the error: http://dpaste.com/08XK05V
[21:01:57] you can ignore the initial warnings
[21:02:45] looking at Stack Overflow, it's clearly a pickling problem with what I'm sending to the processes, but just like your code, I'm only sending a collection of revid dicts
[21:03:16] https://stackoverflow.com/questions/44144584/typeerror-cant-pickle-thread-lock-objects
[21:03:54] Aha! It could be the logger.
[21:05:48] oh, I think it WAS the logger :) thanks
[21:08:43] https://github.com/halfak/python-para/blob/master/para/map.py#L185
[21:08:45] Check this out.
[21:08:51] It's how I got around this elsewhere.
[21:09:07] I made a special logger queue process that would collect all of the logging and output it.
[21:09:52] oh, that's so cool! looking
[21:16:41] ORES deploy was successful!
[21:25:42] And the user scripts work! Awesome.
[21:25:49] Sending the good news to the Growth team.
[21:26:13] This might just wrap up the engineering work from our side. I'd like to write a blog post about this.
[21:32:15] nice one halfak
[21:33:45] Memory usage went up. We're still pretty safe, but I don't like it. I'm guessing our shared memory isn't working as well as I would have hoped, despite the mmap.
[21:34:06] We have ~10GB of memory available on all of the nodes.
[21:34:47] Down from 20GB available.
[21:35:51] In theory, we have fewer floats in memory. 50 * 5 * 100k < 300 * 1 * 150k
[21:36:09] We should have less than half. But we have a lot of words, and those strings take up space.
[21:36:30] 500k words mapped to vectors vs. 150k words mapped to vectors.
[21:41:09] One final note: ORES finally thinks that Ann Bishop is a woman (97%) and that Alan Turing is not (2%)
[21:41:13] \o/
[21:42:04] Also, Aaron Halfaker is probably a person, but the thing we're most sure of is that he's "Internet Culture".
[21:51:59] lol
[21:55:50] That's awesome
[23:30:03] halfak, so that's all the historical logs, right?
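The pages-logging dump linked at 19:45:41 can also be streamed without mwxml. What follows is a swapped-in, stdlib-only sketch. It is not mwxml's API and not halfak's gist. The element names (logitem, contributor, username, ip, type, action, logtitle) are assumed from the MediaWiki export XML schema, and local_name, iter_log_items, iter_log_file and the SAMPLE document are hypothetical and made up for illustration.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_log_items(stream):
    """Stream <logitem> elements from an export-format XML file as plain dicts."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "logitem":
            continue
        item = {}
        for child in elem:
            name = local_name(child.tag)
            if name == "contributor":
                # Assumption: registered users carry <username>, anonymous ones <ip>.
                item["user"] = next(
                    (c.text for c in child if local_name(c.tag) in ("username", "ip")),
                    None,
                )
            else:
                item[name] = child.text
        yield item
        elem.clear()  # drop the finished subtree's contents to keep memory low


def iter_log_file(path):
    """Open a plain or .gz pages-logging dump and stream its log items."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        yield from iter_log_items(f)


# Toy input (made up), shaped like one entry of a pages-logging dump.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <logitem>
    <id>1</id>
    <timestamp>2020-02-01T00:00:00Z</timestamp>
    <contributor><username>Example</username><id>42</id></contributor>
    <type>block</type>
    <action>block</action>
    <logtitle>User:Example2</logtitle>
  </logitem>
</mediawiki>"""

items = list(iter_log_items(io.BytesIO(SAMPLE)))
print(items[0]["type"], items[0]["user"])  # block Example
```

Against the real file, the same loop would be `for item in iter_log_file("enwiki-20200201-pages-logging.xml.gz"): ...`. For anything beyond a quick look, mwxml's LogItem (linked at 19:46:21) is the maintained option.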
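The "can't pickle _thread.RLock objects" error from 20:34:54 appears when the callable or arguments handed to a process pool hold a lock. A logging.Handler owns a threading.RLock. On Python versions before 3.7, pickling a Logger could drag its handlers along, while 3.7+ pickles loggers by name. Below is a minimal sketch of one common fix. It uses a hypothetical task class that drops its logger in __getstate__ and rebuilds it by name in __setstate__. ExtractTask, LeakyTask and is_picklable are illustrative names, not revscoring's code.

```python
import logging
import pickle


class ExtractTask:
    """Hypothetical callable for pool.map(); the logger never crosses processes."""

    def __init__(self, logger_name="extract"):
        self.logger_name = logger_name
        self.logger = logging.getLogger(logger_name)

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["logger"]  # rebuilt by name on the other side
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.logger = logging.getLogger(self.logger_name)

    def __call__(self, rev_doc):
        self.logger.debug("extracting rev_id=%s", rev_doc["rev_id"])
        return {"rev_id": rev_doc["rev_id"], "n_keys": len(rev_doc)}


class LeakyTask:
    """Counter-example: holding a handler means holding a threading.RLock."""

    def __init__(self):
        self.handler = logging.Handler()


def is_picklable(obj):
    """True if obj survives pickle.dumps, which is what multiprocessing needs."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


restored = pickle.loads(pickle.dumps(ExtractTask("extract-demo")))
print(is_picklable(ExtractTask()), is_picklable(LeakyTask()))  # True False
print(restored({"rev_id": 123, "text": "..."}))  # {'rev_id': 123, 'n_keys': 2}
```

The pickle round trip is the same step multiprocessing performs on anything passed to `pool.map`, so checking `is_picklable` on the task and a sample input is a cheap way to find the offending object before starting workers.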
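halfak's workaround (21:09:07) funnels all logging through one dedicated collector. The standard library has a comparable pattern: workers log through logging.handlers.QueueHandler, and a logging.handlers.QueueListener in the parent drains the queue and writes the records. The sketch below uses that stdlib pattern, not para's code. To stay self-contained, it uses a thread-safe queue.Queue; a real multiprocess run would create a multiprocessing.Queue and pass it to the workers. ListHandler and make_worker_logger are hypothetical names.

```python
import logging
import logging.handlers
import queue


class ListHandler(logging.Handler):
    """Keeps formatted messages in memory; stands in for a file or stderr handler."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


def make_worker_logger(log_queue, name):
    """Logger for a worker: every record goes onto the queue instead of being written."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers[:] = [logging.handlers.QueueHandler(log_queue)]
    logger.propagate = False  # the listener is the only place records get written
    return logger


# Across processes this would be multiprocessing.Queue() shared with the workers.
log_queue = queue.Queue()
sink = ListHandler()
sink.setFormatter(logging.Formatter("%(name)s: %(message)s"))

listener = logging.handlers.QueueListener(log_queue, sink)
listener.start()  # drains the queue on a background thread in the parent

worker_log = make_worker_logger(log_queue, "worker-1")
worker_log.info("extracted %d revisions", 3)

listener.stop()  # processes everything queued so far, then joins the thread
print(sink.messages)  # ['worker-1: extracted 3 revisions']
```

Only LogRecords travel through the queue, and QueueHandler flattens their message and args before enqueueing, so nothing lock-bearing has to be pickled.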
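The float comparison at 21:35:51 can be checked by multiplying out the factors quoted in the log. This snippet only does that arithmetic. The log does not spell out what each factor means, so they are treated as plain multipliers. Storing floats as 4-byte float32 is an assumption. The snippet does not model the string overhead of the 500k-word vocabulary mentioned at 21:36:30.

```python
# Factors copied from the log; treated as plain multipliers.
new_floats = 50 * 5 * 100_000
old_floats = 300 * 1 * 150_000

BYTES_PER_FLOAT = 4  # assumption: float32 storage


def mib(n_floats, width=BYTES_PER_FLOAT):
    """Raw size of n_floats values in MiB, ignoring container overhead."""
    return n_floats * width / 2**20


ratio = new_floats / old_floats
print(new_floats, old_floats)  # 25000000 45000000
print(round(mib(new_floats), 1), round(mib(old_floats), 1))  # 95.4 171.7
print(round(ratio, 3))  # 0.556
```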