[01:23:58] wiki-ai/wb-vandalism#98 (user_features - 1659ce1 : halfak): The build passed. https://travis-ci.org/wiki-ai/wb-vandalism/builds/92180848
[01:37:31] * halfak extracts the user features into the wikidata model
[06:01:38] AUC = 0.8467 for Wikidata with user.age
[17:19:33] ok, I connected using chatzilla
[17:19:39] tired of Konversation....
[17:19:59] aetilley, aetilley` hey, can you check my PR?
[17:30:40] ok
[17:30:42] Amir1:
[17:30:44] ^
[17:30:52] thanks :)
[17:30:55] one sec
[17:31:26] halfak yo
[17:31:42] Hey! On my way
[18:59:47] I'm back
[18:59:52] halfak: around?
[18:59:53] o/ Amir1
[19:00:05] I'm in a workshop now, but this isn't totally new stuff to me
[19:00:09] aetilley: hey, You had a question about my PR
[19:00:09] let's work halfak
[19:00:18] You have 20% of my attention ;)
[19:00:26] :D
[19:00:47] Just tell me where you are now on the global age and I will do the rest
[19:01:12] Oh! Sure. So, there's an API endpoint for this.
[19:02:13] E.g. https://en.wikipedia.org/w/api.php?action=query&meta=globaluserinfo&guiuser=Example&guiprop=groups|merged|unattached
[19:02:43] We can have features for # of wikis, total edit counts, home wiki, earliest registration, etc :D
[19:03:22] hmm, okay
[19:03:23] Blocks on other wikis might be a really good one.
[19:04:23] You will have it very soon
[19:05:34] Cool! :)
[19:08:53] Amir1: Hi, no questions, just merged.
[19:09:23] \o/
[19:19:37] halfak: So about my cryptic comment about posting the sigclust repo....
[19:20:04] Sorry if that didn't make sense. I just wasn't sure if there would be any legal issues that I needed to consider
[19:20:13] I'm hoping there are none such.
[19:20:40] But Marron was kind enough to help, it would seem a good idea to at least let him post the repo.
[19:43:47] halfak: I got disconnected, did you get my question?
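Editor's sketch of the global-user features listed at 19:02:43 and 19:03:23 (number of wikis, total edit count, home wiki, earliest registration, blocks on other wikis). It assumes the parsed `globaluserinfo` JSON nests per-wiki accounts under `query.globaluserinfo.merged`, each with `editcount`, `registration`, and an optional `blocked` key. The function name `global_user_features` and the sample data are hypothetical; this is not revscoring or wb-vandalism code.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed ISO 8601 style timestamps


def global_user_features(response):
    """Derive simple features from a parsed globaluserinfo response (assumed shape)."""
    info = response.get("query", {}).get("globaluserinfo", {})
    merged = info.get("merged", [])
    registrations = [
        datetime.strptime(account["registration"], TIMESTAMP_FORMAT)
        for account in merged
        if account.get("registration")
    ]
    return {
        "wiki_count": len(merged),
        "total_edit_count": sum(a.get("editcount", 0) for a in merged),
        "home_wiki": info.get("home"),
        "earliest_registration": min(registrations) if registrations else None,
        "blocked_wiki_count": sum(1 for a in merged if "blocked" in a),
    }


# Hand-written illustrative response, not real API output.
sample_response = {
    "query": {
        "globaluserinfo": {
            "home": "enwiki",
            "merged": [
                {"wiki": "enwiki", "editcount": 10,
                 "registration": "2012-05-06T07:08:09Z"},
                {"wiki": "wikidatawiki", "editcount": 5,
                 "registration": "2010-01-02T03:04:05Z",
                 "blocked": {"expiry": "infinity", "reason": "example"}},
            ],
        }
    }
}

features = global_user_features(sample_response)
```

Keeping the function free of network calls means the same code can be fed a cached response or a live one fetched from the endpoint quoted at 19:02:13.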
[19:44:48] based on logs you didn't
[19:44:56] Amir1 halfak: one last thing, should we implement this in revscoring and then use it in wb-vandalism or just implement and use it in wb-vandalism
[19:44:58] Amir1 I think using it in revscoring can be beneficial to other wikis as well
[19:46:01] * halfak has no idea what legal issues aetilley is concerned about.
[19:46:28] Amir1, eventually in revscoring
[19:46:45] But if you want to experiment in wb-vandalism, it should be easy to move.
[19:46:51] IMO, going straight to revscoring makes sense.
[19:47:31] testing it in wb-vandalism is harder since we don't anything user-related in them (like datasources, etc.)
[19:47:44] *don't have
[19:49:00] Amir1, can always import from revscoring, but I agree that just building it in 'revscoring' makes more sense.
[19:50:00] halfak: Sorry, accidentally quit.
[19:51:24] No worries. Not sure what you mean re. legal issues.
[19:51:49] Did you base your algorithm on the R library?
[19:51:55] If so, how is that licensed?
[20:00:20] I based my algorithm on the paper, but I did look at the R sigclust code for help implementing the soft-thresholding method.
[20:00:51] Ok, the fact that you don't know what I'm asking confirms that I'm just being paranoid.
[20:01:34] I remember reading something in our contract about everything must be published open source but there was some footnote about Wikipedia policies and exceptions.
[20:01:45] one sec
[20:03:51] From the IEG page:
[20:03:56] "Any code or other materials produced must be published and released as free and open-source. Licensing should be compatible with current Wikimedia and MediaWiki practices."
[20:04:21] https://meta.wikimedia.org/wiki/Grants:IEG#ieg-learn
[20:04:39] Oh yes. We are required to release an open source library.
[20:04:49] Attribution and licensing might depend on context.
[20:06:55] Ok, all I have right now is:
[20:06:57] "The original version of this code sprung out of interest in clustering Wikipedia article revisions whi\
[20:07:02] le funded by and Individual Engagement Grant from the Wikimedia Foundation."
[20:07:28] Should be 'an' not 'and'...
[20:08:58] ah, that's all.
[20:25:37] That's fine to include, but totally not necessary,
[20:25:55] I'd like to just slap an MIT License on it and otherwise just describe what it is and how it works.
[20:29:44] halfak: I wanna merge https://gerrit.wikimedia.org/r/#/c/254119/ which might disrupt ores for a few seconds. that ok to do now or should I wait?
[20:30:18] YuviPanda, how will the disruption work?
[20:30:22] Reboot of the redis server?
[20:30:33] restart the redis process, yah
[20:30:37] I just did it for quarry
[20:32:26] Hmm...
[20:32:38] So, we might have requests to redis hang for a moment?
[20:32:42] they'll fail
[20:32:45] not hang
[20:32:48] Gotcha.
[20:33:23] Yeah. Let's do it.
[20:33:23] I wonder if we can use this as an opportunity to do a failover but that's definitely way more risky
[20:33:28] * halfak pulls up graphite to monitor
[20:33:30] for redis, esp.
[20:34:09] halfak: Ok. Let me know if there's any specific action you want me to take. ttyl
[20:34:25] halfak: let me know when we're good to go!
[20:34:33] will do aetilley
[20:34:37] YuviPanda, sure one min
[20:34:40] kkk
[20:36:41] * halfak sets up graph
[20:36:50] no "kkk" :P
[20:36:57] OK. We're good to go.
[20:37:28] ^ YuviPanda
[20:37:54] okkk :)
[20:39:51] it's happening
[20:40:30] done
[20:40:32] halfak: ^
[20:41:11] Looks like we're doing fine.
[20:41:18] I didn't even see any errors on graphite.
[20:41:32] Oh... Internal Server Error
[20:41:34] :(
[20:41:42] oh
[20:41:49] Now all scores are timing out
[20:41:56] Restart the workers?
[20:42:07] I see a lot of activity in redis from the workers
[20:42:17] (monitor is active)
[20:42:36] but yeah let me restart web nodes first?
[20:43:21] yes
[20:43:44] doing so
[20:43:56] this takes forever ofc :|
[20:44:15] yeah...
[20:45:57] halfak: any better now?
[20:46:00] Nope
[20:46:26] I've restarted all of the workers.
[20:46:42] Oh! We're back
[20:46:55] halfak: ok!
[20:47:08] I also restarted all the workers
[20:47:10] err
[20:47:10] And it looks like our error log in graphite is the dumb :(
[20:47:12] web nodes
[20:47:31] let me look at worker error logs now
[20:47:58] I've got to run.
[20:48:06] University --> Home.
[20:48:11] I'll be back on in ~1 hour
[20:48:22] > Nov 20 20:44:37 ores-worker-01 celery[7368]: mwapi.errors.ConnectionError: [SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ss
[20:48:24] wat
[20:48:26] ok
[20:48:32] Yeah. That's normal
[20:48:41] o/
[20:48:44] yeah ok!
[20:48:48] thanks halfak and sorry about the downtime
[22:49:28] halfak: So a score_processor is a bunch of scoring_contexts and maybe an additional cache, and a scoring_context is a bunch of scoring_models together with some extractor. But an extractor inherits from the Context class which is not a scoring_context?
[22:52:34] Also I noticed (in line 73 of scoring_context.py) that the scoring_context's extractor will have an extract method, but the Context class doesn't have an extract method, and the extractor class only has stub.
[22:53:00] (which returns an error)
[23:00:45] aetilley, that's right. I'm sorry for the naming overlap.
[23:01:39] So, an extractor (like APIExtractor) has to implement the "extract()" method.
[23:01:58] Gotcha
[23:02:30] I looked up and not down.
[23:02:33] :)
[23:19:38] * halfak appreciates how quickly aetilley turned around and explained ORES' general architecture and an annoying design decision that I made.
[23:20:00] :)
[23:23:23] lol
[23:23:54] It wasn't supposed to be critical; just wanted to make sure I understood. :)
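Editor's sketch of the layering described at 22:49:28 through 23:01:39: a score processor holds scoring contexts plus an optional cache, a scoring context holds models plus an extractor, and the base extractor only has an `extract()` stub that subclasses must implement. Every class here is a toy stand-in with hypothetical signatures, `DictExtractor` plays the role the log gives to `APIExtractor`, and raising `NotImplementedError` is an assumption about what "returns an error" means. None of this is the actual ORES or revscoring code.

```python
class Context:
    """Stand-in for the base Context class; note it has no extract()."""

    def __init__(self, name):
        self.name = name


class Extractor(Context):
    """Base extractor: extract() is only a stub."""

    def extract(self, rev_id, features):
        raise NotImplementedError("subclasses must implement extract()")


class DictExtractor(Extractor):
    """Hypothetical concrete extractor backed by a dict instead of an API."""

    def __init__(self, name, data):
        super().__init__(name)
        self.data = data

    def extract(self, rev_id, features):
        return [self.data[rev_id][feature] for feature in features]


class ScoringContext:
    """A named set of models together with one extractor."""

    def __init__(self, name, models, extractor):
        self.name = name
        self.models = models  # model name -> (feature names, scoring function)
        self.extractor = extractor

    def score(self, model_name, rev_id):
        features, scorer = self.models[model_name]
        return scorer(self.extractor.extract(rev_id, features))


class ScoreProcessor:
    """A set of scoring contexts plus an optional cache."""

    def __init__(self, contexts, cache=None):
        self.contexts = {context.name: context for context in contexts}
        self.cache = {} if cache is None else cache

    def score(self, context_name, model_name, rev_id):
        key = (context_name, model_name, rev_id)
        if key not in self.cache:
            context = self.contexts[context_name]
            self.cache[key] = context.score(model_name, rev_id)
        return self.cache[key]


# Toy usage with made-up data and a made-up model.
extractor = DictExtractor("wikidatawiki", {123: {"user.age": 50}})
models = {"reverted": (["user.age"], lambda values: values[0] < 100)}
processor = ScoreProcessor([ScoringContext("wikidatawiki", models, extractor)])
result = processor.score("wikidatawiki", "reverted", 123)

# Calling the base class stub directly fails, as noted in the log.
stub_failed = False
try:
    Extractor("base").extract(123, ["user.age"])
except NotImplementedError:
    stub_failed = True
```

The sketch also shows the naming overlap mentioned at 23:00:45: `Extractor` derives from `Context`, while `ScoringContext` is an unrelated class that merely contains an extractor.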