[13:13:57] halfak, I left some notes on the blog post
[15:10:40] o/ halfak
[15:29:23] o/
[15:29:30] Hey Amir1
[15:29:39] (thanks Helder, will take a look)
[15:30:02] okay we have a PR re. global user age :)
[15:31:33] Woot!
[15:32:49] "globalinfo" Interesting!
[15:33:41] we can have a list of "global" user groups (such as stewards, etc.)
[15:33:48] +1
[15:34:11] and then use it as a feature, like "number_global_groups"
[15:34:12] Have you run some tests with this already?
[15:34:16] yes
[15:34:25] Cool! Merging.
[15:34:41] thanks :)
[15:34:53] then we can use it in wikidata.py in wb-vandalism
[15:35:02] Hmm... why is travis failing?
[15:35:23] test_user_info_from_doc is failing
[15:36:47] https://github.com/wiki-ai/revscoring/blob/master/revscoring/extractors/tests/test_api.py#L7
[15:36:48] This
[15:36:55] That test is now failing.
[15:36:59] Amir1, ^
[15:37:19] I'll check right now
[15:37:23] I tested it
[15:40:02] halfak: Where is it failing?
[15:40:04] https://travis-ci.org/wiki-ai/revscoring/builds
[15:40:10] I can't see any builds
[15:40:28] https://travis-ci.org/wiki-ai/revscoring/builds/92474950
[15:40:33] wiki-ai/revscoring#323 (new_languages - e8e3c06 : halfak): The build was fixed. https://travis-ci.org/wiki-ai/revscoring/builds/92565043
[15:40:57] Amir1, see https://travis-ci.org/wiki-ai/revscoring/builds/92562403
[15:40:58] This one is not failing because of user.globalage
[15:41:08] thanks
[15:41:12] :)
[15:41:39] * halfak just got the et/uk pr to pass travis too
[15:41:52] Gotta go say goodbye to a house guest. Will be back in a bit.
[15:45:42] ok, I just amended that commit
[15:45:48] let's wait and see what happens
[16:21:07] halfak: It's passing now
[16:21:08] https://travis-ci.org/wiki-ai/revscoring/builds/92565786
[16:26:37] halfak: btw: I can't test with the new model on wikidata
[16:26:46] I only get really old scores
[16:27:16] I'm confused. What do you mean you get really old scores?
[16:32:38] e.g. 0.98% for water
[16:32:45] halfak: ^
[16:33:24] Why can't you test with a new model though?
[16:35:05] I meant using web
[16:35:11] ores.wmflabs.org
[16:35:21] isn't it accessible
[16:35:44] halfak: ^
[16:35:55] But... it looks like we're online to me.
[16:36:23] Looks like the proxy is slow today
[16:36:27] But we're up.
[16:37:36] let me check again
[16:40:00] Woah. Looks like we're getting overloaded a lot the past few days.
[16:40:06] I wonder if someone is running an analysis.
[16:41:49] Looks like something I want to have YuviPanda take a look at.
[16:42:02] Looks like the load started at 10:00 UTC
[16:44:24] Yeah. We're not doing great.
[16:46:19] So, I only see one task processor running on worker-01
[16:46:41] ... and a restart seems to have brought us back
[16:47:16] Yeah... only 3 active workers on worker-02
[16:47:17] WTF
[16:48:05] So, I'm doing worker restarts now and it seems to be helping substantially.
[16:48:21] The only recent change was a puppet change that YuviPanda made for redis.
[16:48:28] We used to have an issue with workers going down.
[16:48:34] It was related to redis.
[16:48:37] So I have a hypothesis.
[16:48:59] OK. We're a fast MOFO again.
[16:49:07] yikes.
[16:49:11] We need to set up paging.
[16:57:29] Amir1, I left you some notes. I think that global_user_info should be a different datasource.
[16:58:00] Not all user_info features require global user info.
[16:58:15] so it would be a shame to fetch that data unnecessarily.
[17:08:56] Amir1, the new model works way better on wikidata
[17:09:10] It's not flagging most of the edits to Water :)
[17:09:25] I changed: user.age and revert detection
[17:10:10] So... it looks like the vast majority of reverts in wikidata are self reverts or are reverted back to by someone other than the original editor!!!
[17:10:32] Our balanced training set is suddenly super unbalanced!
[17:14:48] I think we might have something weird going on with revert detection on wikidata.
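(Editor's sketch, not part of the log.) The 16:57 note about making global_user_info its own datasource comes down to lazy dependency resolution: a feature should trigger only the fetches it depends on. The code below is a minimal illustration of that idea and is not revscoring's actual API. `Datasource`, `solve`, both fetch functions, and the sample data are hypothetical names and values invented for this example.

```python
class Datasource:
    """A named value computed lazily from other datasources (hypothetical class)."""

    def __init__(self, name, process, depends_on=()):
        self.name = name
        self.process = process
        self.depends_on = tuple(depends_on)


def solve(dependent, cache):
    """Resolve `dependent`, computing each datasource at most once per cache."""
    if dependent.name not in cache:
        args = [solve(dep, cache) for dep in dependent.depends_on]
        cache[dependent.name] = dependent.process(*args)
    return cache[dependent.name]


calls = []  # records which (pretend) API fetches actually ran


def fetch_user_info():
    # Stand-in for a local-wiki user lookup; the returned data is made up.
    calls.append("user_info")
    return {"groups": ["sysop"]}


def fetch_global_user_info():
    # Stand-in for a separate global-account lookup; the returned data is made up.
    calls.append("global_user_info")
    return {"groups": ["steward"]}


user_info = Datasource("user_info", fetch_user_info)
global_user_info = Datasource("global_user_info", fetch_global_user_info)

# Features depend only on the datasource they need.
number_groups = Datasource(
    "number_groups", lambda info: len(info["groups"]), [user_info])
number_global_groups = Datasource(
    "number_global_groups", lambda info: len(info["groups"]), [global_user_info])
```

With this split, solving `number_groups` never touches `fetch_global_user_info`; the global lookup runs only when a feature such as `number_global_groups` is requested.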
[17:14:55] Maybe sha1 is not good enough.
[17:18:05] Yeah. This is really weird.
[17:18:22] Somehow, this edit was reverted: https://www.wikidata.org/w/index.php?diff=174923473
[17:20:36] This is a *reverting* edit!
[17:20:39] WTF is going on?
[17:27:12] If I just run that edit through the label-reverted utility again it returns False
[17:27:14] WTF
[17:29:22] Wait a sec. I might have been looking at the wrong data file.
[17:29:24] Checking again.
[17:30:51] OK. Things are looking better now.
[17:30:55] * halfak keeps checking
[17:32:07] OK. yeah. We're fine. I'm dumb :)
[17:35:15] :)))))
[17:36:03] halfak: okay, what can I do?
[17:36:05] http://ores.wmflabs.org/scores/wikidatawiki/reverted/264410816/
[17:36:21] (still 98%)
[17:36:56] So I need it as another feature
[17:36:59] *datasource
[17:37:04] it would be okay
[17:37:05] :)
[17:37:13] You will have them by tomorrow
[17:45:34] Amir1, I haven't released any new models yet.
[17:45:37] Still working on that.
[17:45:46] Let me try that revid in the model I just built.
[17:46:49] {"probability": {"false": 0.5593577624294263, "true": 0.44064223757057386}, "prediction": false}
[17:48:56] Lots of bots reverting bots.
[17:49:40] what happened to portuguese informal words? https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Word_lists
[17:50:32] Helder, looks like we haven't had someone go through the generated lists.
[17:50:35] That seems wrong to me too.
[17:50:49] Yeah. I see them.
[17:50:56] Looks like we accidentally lost them.
[17:50:59] I'll copy them back.
[17:51:12] yeah, I think I did that some time ago
[17:52:01] Added back: https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Word_lists/pt
[17:52:56] * halfak purges the big table.
[17:53:27] Oh god. We need to update both pages. Booo!
[17:53:33] o.O
[17:53:40] isn't it a transclusion?
[17:53:49] I guess not :\
[17:54:01] OK. Looks like we're up to date again.
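(Editor's sketch, not part of the log.) The revert-detection discussion between 17:09 and 17:32 assumes checksum-based "identity" reverts: a revision whose sha1 matches an earlier revision of the same page is taken to revert everything in between. The code below is a minimal illustration under that assumption. It is not the label-reverted utility's actual logic, and the function names, the `radius` default, and the dict layout are hypothetical.

```python
def detect_reverts(revisions, radius=15):
    """Find identity reverts in a chronological list of revisions.

    Each revision is a dict with 'id', 'user' and 'sha1' keys (assumed layout).
    Returns a list of (reverting, reverted_list, reverted_to) tuples.
    """
    results = []
    for i, rev in enumerate(revisions):
        window_start = max(0, i - radius)
        # Walk backwards to the most recent earlier revision with identical content.
        for j in range(i - 1, window_start - 1, -1):
            if revisions[j]["sha1"] == rev["sha1"]:
                reverted = revisions[j + 1:i]
                if reverted:  # adjacent identical revisions revert nothing
                    results.append((rev, reverted, revisions[j]))
                break
    return results


def is_self_revert(reverting, reverted):
    """True when every reverted revision was made by the reverting user."""
    return all(r["user"] == reverting["user"] for r in reverted)


# Made-up page history for illustration only.
history = [
    {"id": 1, "user": "Alice", "sha1": "aaa"},
    {"id": 2, "user": "Bob", "sha1": "bbb"},    # reverted by Alice in rev 3
    {"id": 3, "user": "Alice", "sha1": "aaa"},
    {"id": 4, "user": "Carol", "sha1": "ccc"},  # Carol undoes her own edit in rev 5
    {"id": 5, "user": "Carol", "sha1": "aaa"},
]
```

Running `detect_reverts(history)` reports two reverts: revision 3 reverting Bob's revision 2, and revision 5 reverting revision 4, which `is_self_revert` flags because Carol made both. Filtering out cases like the second one is what can shrink a "reverted" training class sharply.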
[17:54:35] so, are these included in the models and just missing on meta?
[17:54:44] Yes
[17:55:20] any chance the bot will remove them by accident?
[17:55:30] Looks like it was ToAruShiroiNeko this time.
[17:55:48] hah
[17:55:52] We needed to do some re-running to get rid of interwiki links
[17:55:58] And it was lost in the confusion.
[17:56:25] got it
[17:56:47] hmm?
[17:56:59] struggling with windows -_-
[17:58:05] o/ ToAruShiroiNeko
[17:58:15] I implemented language stuff for et and uk yesterday.
[17:58:23] awesome
[17:58:28] Can you instruct people *not* to give us regular expressions on the wiki pages?
[17:58:35] I need real word lists in order to write tests.
[17:58:42] sure
[17:58:45] Thanks :)
[17:59:11] It would be great if someone could take a word that we flag and add other examples of how it could be used.
[17:59:13] I'll ask them to post the regexes on the talk page :)
[17:59:28] e.g. "ass" "asshead" "assface" "asses"
[17:59:32] Yeah! That's cool too.
[17:59:57] regexes are great, but we need to test them against real words at some point and Cyrillic is like Farsi to me.
[18:00:08] I see symbols, but they don't turn into useful things in my head
[18:00:50] So it's really good if I can copy-paste something that someone else curated for the test cases and then make my mistakes in the regexes.
[18:01:02] When I have the tests, I can be much more confident.
[18:01:22] It's weird how French, Spanish and Portuguese are much easier for me to work with.
[18:56:40] Hey ToAruShiroiNeko
[18:56:50] It looks like I'll need the campaign names translated too.
[18:57:05] "Edit quality (20k random sample, 2015)"
[19:00:49] For right now, I'll just use a machine translation.
[19:58:46] ls
[19:58:50] DIR
[19:58:51] :D
[20:03:27] hey halfak
[20:03:32] I'm here only very shortly
[20:03:32] Hey YuviPanda
[20:03:40] but saw message about needing paging, agree
[20:03:49] We had an issue last night. The workers shut down
[20:03:51] can you file a bug with the different conditions we should check for?
[20:03:54] Could it be related to new redis?
[20:04:02] no, the redis itself is exactly the same
[20:04:06] with exactly the same config
[20:04:10] just different puppet setup
[20:04:26] We haven't had workers shut themselves down in a long time.
[20:05:07] I'm happy to set up SMS paging for the both of us too
[20:05:54] to step back, it could be related to the redis *restart* - the config itself hasn't changed
[20:06:05] I can take a look later
[20:06:11] were there any error messages in the logs?
[20:06:42] I just restarted as soon as I saw the workers down. :/
[20:06:59] yeah, that's fair
[20:07:16] ok I'm going to give myself 5mins to look at it right now and then GTFO :(
[20:07:18] I can look later
[20:07:34] Totally reasonable. Thank you! ;)
[20:07:36] *:)
[20:07:57] and do file the bug about monitoring!
[20:08:06] Working on it right now.
[20:09:28] I don't see anything obvious in the logs
[20:09:39] halfak: next time if it happens, can you leave one node in failed state? :D
[20:09:44] we can also increase our node count
[20:10:27] YuviPanda, sure.
[20:19:01] halfak: Imma run away now
[20:19:04] sorry!
[20:19:05] o/
[20:19:07] No worries
[20:19:12] happy sunday !
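(Editor's sketch, not part of the log.) One condition the monitoring bug could cover is the symptom seen at 16:46 and 16:47: far fewer active workers per node than expected. The code below is a minimal illustration of such a check. The function names, the expected worker count of 16, and the 50% paging threshold are placeholders invented for this example, not values from the ORES deployment, and collecting the per-node counts is left out entirely.

```python
def nodes_needing_attention(active_workers, expected_per_node):
    """Return (node, active_count) pairs for nodes below the expected count."""
    return sorted(
        (node, count)
        for node, count in active_workers.items()
        if count < expected_per_node
    )


def should_page(active_workers, expected_per_node, min_healthy_fraction=0.5):
    """Page when the cluster-wide share of live workers drops below a threshold."""
    total_expected = expected_per_node * len(active_workers)
    if total_expected == 0:
        return True  # no nodes reporting at all is itself worth a page
    total_active = sum(active_workers.values())
    return total_active / total_expected < min_healthy_fraction


# Counts as described in the log; the expected value of 16 is a placeholder.
observed = {"worker-01": 1, "worker-02": 3}
degraded = nodes_needing_attention(observed, expected_per_node=16)
page = should_page(observed, expected_per_node=16)
```

Separating "which nodes are degraded" from "is this bad enough to page" keeps a single flaky worker from waking anyone up while still catching a cluster-wide drop like the one described above.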