[00:02:56] halfak: ok I seem to have broken half of DNS resolution in labs (it is redundant so still works!) so I gotta fix that first
[00:03:08] No worries.
[00:03:12] Makes sense.
[00:16:43] Woah... looks like there's some massive hanging going on.
[00:16:46] Redis
[00:16:58] Sometimes when we try to access redis, we wait 20 seconds.
[00:17:03] O_O
[00:17:40] interesting!
[00:17:50] try doing 'MONITOR' in redis in a session to see what's causing it?
[00:17:54] (still fixing DNS)
[00:17:59] kk
[00:18:02] * halfak monitors
[00:24:34] I think we have holes in our log!
[00:24:51] There's events that I would just expect to see missing from these 20s gaps.
[00:24:57] Redis shows no slowdowns.
[00:28:25] I just checked some likely operation types that might slow down and it looks like "SETEX" and "GET" are at a constant rate
[00:35:18] halfak: hmm, do we know where the holes are?
[00:55:50] I'm making a phab task that'll describe what I learned.
[00:57:05] https://phabricator.wikimedia.org/T117825
[00:58:57] YuviPanda, ^
[00:59:11] I better head out. Thanks for your help today.
[00:59:22] o/
[16:44:00] o/ halfak
[16:46:30] Amir1,
[16:46:31] so
[16:46:47] I think that improving the featureset of Wikidata is a good task
[16:47:44] okay
[16:48:10] I think a notice in village pump would be good
[16:48:33] I ask people what edits sounds incorrectly classified to you
[16:48:36] +1
[16:48:58] We've been collecting them on the :m:R:Revision scoring as a service talk page.
[16:49:16] We should have a better strategy for collecting them, but I haven't thought much about that yet.
[16:50:55] I suppose we could set up a template and make a specific page for it.
[16:51:27] e.g. {{false positive|wiki=wikidata|revid=32784233|explanation=An edit by an anon that fixes a thing}}
[16:53:43] amazing
[16:53:49] I will do it
[16:55:02] Cool.
[16:55:11] I have an idea for using badwords in descriptions too.
[16:55:15] We support 13 languages.
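Locating the "20s gaps" in the log, so that they can be compared against the Redis MONITOR output, comes down to sorting event timestamps and flagging any interval longer than a threshold. The following is a minimal sketch that assumes the log has already been parsed into timestamps in seconds. The function name `find_gaps` and the 15-second default threshold are hypothetical and are not part of ORES.

```python
from typing import Iterable, List, Tuple


def find_gaps(timestamps: Iterable[float],
              threshold: float = 15.0) -> List[Tuple[float, float]]:
    """Return (start, end) pairs where consecutive events are more than
    `threshold` seconds apart (hypothetical helper; input need not be sorted)."""
    ordered = sorted(timestamps)
    gaps = []
    # Compare each event with the one immediately after it.
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > threshold:
            gaps.append((earlier, later))
    return gaps


# Made-up example: steady events every 2 s, then a 20 s hole.
events = [0, 2, 4, 6, 26, 28, 30]
print(find_gaps(events))
```

You can run the same function over both the application log and the MONITOR capture for the same time window. That shows whether the holes line up, or whether Redis kept serving commands (such as SETEX and GET) straight through them.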
[16:55:26] We could use that to catch someone putting nonsense in the description. :)
[16:55:43] hmm
[16:55:54] I think we would have issue of tokenizing
[16:57:58] halfak: btw Can you run the model for 100K revisions?
[16:58:04] and check AUC
[16:58:15] Oh! sure.
[16:58:26] I'll do that right away.
[16:58:45] Or maybe I won't because I can't log into ores-compute >:(
[17:08:24] :|
[17:08:31] do it when you can
[17:08:33] please
[17:08:35] thanks :)
[17:11:35] Will do.
[17:11:40] Trying to figure out what is up now.
[18:52:09] Amir1, where was the full feature set?
[18:52:20] For some reason, I only have a 20k sample around.
[18:52:39] I think it's in my public_html
[18:52:44] let me check
[18:52:50] I will let you know about it soon
[18:53:20] OK
[18:55:27] Amir1, it seems the original file was called "thanks_aaron.tsv"
[18:56:02] yeah
[18:56:10] I found the original one
[18:56:13] * halfak digs through chat logs
[18:56:55] Oh! Looks like the original only had 20k lines!
[18:58:43] Yeah
[18:58:57] There is a huge file with 1.06M lines named res.csv
[18:59:09] I took out 100K randomly
[18:59:31] but I need to change it to tsv and put rev_id at first
[18:59:45] halfak: what was the system revscoring accepts?
[19:00:06] rev_id, feature0,feature1, ...,reverted_status
[19:00:08] ?
[19:00:18] That's right
[19:00:21] With tabs.
[19:00:33] I can do the file stuff if it's any trouble
[19:00:47] It's nice that our data contains no text (except True/False)
[19:57:08] o/ Amir1, any progress on the dataset?
[19:57:37] hey, you will have them in one minute or so
[19:58:12] OK great :)
[20:02:37] strangely, it's slow
[20:08:23] 13%
[20:08:46] 18%
[20:09:56] nfs?
[20:10:04] I don't think so
[20:10:10] I'm not writing it in file yet
[20:10:13] 30%
[20:13:00] OK. I'm going to hop on my bike and head back to my home office. I'll be back online in ~45 mins.
[20:13:15] Can you drop me an email (or phab) if I'm not back in time?
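The layout agreed above is one row per revision, tab-separated, with rev_id first, then the features, then reverted_status. The sketch below samples rows and rewrites them in that layout. The column positions of rev_id and the label in res.csv are not given in the log, so they are parameters here. The helper name `to_revscoring_tsv` and the example rows are hypothetical. The sketch does not call any revscoring API.

```python
import csv
import io
import random
from typing import Sequence


def to_revscoring_tsv(rows: Sequence[Sequence[str]], rev_id_col: int,
                      label_col: int, sample_size: int, seed: int = 0) -> str:
    """Randomly sample rows and write them as tab-separated lines ordered
    rev_id, feature0, feature1, ..., label (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    # Sample without replacement; keep every row if there are fewer than sample_size.
    chosen = rng.sample(list(rows), min(sample_size, len(rows)))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in chosen:
        # Every column except rev_id and the label is treated as a feature,
        # keeping its original relative order.
        features = [v for i, v in enumerate(row) if i not in (rev_id_col, label_col)]
        writer.writerow([row[rev_id_col], *features, row[label_col]])
    return out.getvalue()


# Hypothetical layout: two features, then rev_id, then the reverted label.
rows = [["0.5", "3", "12345", "True"], ["1.0", "7", "67890", "False"]]
print(to_revscoring_tsv(rows, rev_id_col=2, label_col=3, sample_size=100_000), end="")

# For the real file (header handling depends on res.csv, which is not shown):
# with open("res.csv", newline="") as f:
#     rows = list(csv.reader(f))
```

The data contains only numbers and True/False, so the output needs no quoting. If free text ever showed up, `csv.writer` would quote any field that contains a tab rather than silently shift the columns.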
[20:13:29] https://phabricator.wikimedia.org/T117258 <-- My card for this
[20:13:58] Nevermind. I forgot to eat my lunch. I'm here for a bit.
[20:14:52] :)
[20:14:58] 47%
[20:21:15] 60%
[20:21:52] halfak: my bot in fa.wp doesn't revert anything that sounds to be goodfaith (i.e. mode['goodfaith']['true'] > 50)
[20:22:07] I want to list them somewhere and check them by hand
[20:22:12] +1
[20:22:20] Would be good to know if that is working as intended.
[20:22:21] I think it can help us
[20:22:24] Did you see Helder's analysis?
[20:22:29] no
[20:22:53] https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:Projetos/AntiVandalismo/Edi%C3%A7%C3%B5es_possivelmente_prejudiciais
[20:25:40] hmm
[20:25:47] pt.wikipedia.org is blocked in my ISP
[20:25:50] like he.wp
[20:26:07] WHAT
[20:26:08] Boo
[20:26:29] fortunately my proxy works
[20:27:40] it's sooo long
[20:27:45] 71%
[20:27:52] Bagh. I just downloaded the page and sent you a tar.gz
[20:27:55] Oh well :)
[20:28:01] I'm a super-slow proxy ;)
[20:28:41] OK getting on bike now
[20:28:41] :))))
[20:28:42] o/
[20:28:55] I will send an email to you
[20:31:24] o/
[21:19:16] o/ Amir1. Just got back to the computer.
[21:21:51] * halfak hopes Amir1 is sleeping and ignoring halfak
[21:34:19] halfak: just sent the eamil
[21:34:24] *email
[21:34:34] I was afk for dinner
[21:34:40] o/
[21:35:11] Great. I'll have results in your AM.
[21:35:18] :)
[21:37:10] awesome
[21:43:30] halfak: What do you think if we have reporting system inside each wiki instead of meta?
[22:12:08] Amir1|afk, right not, I think that we ought to have all of the reports go to one place.
[22:12:19] If we could automate getting the reports from each wiki, that would be great.
[22:12:41] *right now
[22:13:04] In the super-short term, putting the reports where-ever makes the most immediate sense is fine so long as we can aggregate them later.
[23:37:13] halfak: I don't think getting automated reports would be hard
[23:37:26] I can do it very soon
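If reports live on each wiki but must be aggregated later, one option is to reuse the `{{false positive|wiki=...|revid=...|explanation=...}}` template proposed earlier in the log. An aggregator can then pull those calls out of each page's wikitext and merge them, de-duplicated by (wiki, revid). The following is a minimal sketch under several assumptions. The template is only a proposal here. Fetching the wikitext is out of scope (for example, it could be done separately through the MediaWiki API). The names `parse_reports` and `aggregate` are hypothetical. The regex ignores nested templates and `|` inside values, so a real wikitext parser such as mwparserfromhell would be more robust.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Matches {{false positive|...}}; deliberately simple, no nested templates.
TEMPLATE_RE = re.compile(r"\{\{\s*false positive\s*\|(.*?)\}\}",
                         re.IGNORECASE | re.DOTALL)


def parse_reports(wikitext: str) -> List[Dict[str, str]]:
    """Extract the named parameters of every {{false positive}} call."""
    reports = []
    for match in TEMPLATE_RE.finditer(wikitext):
        params = {}
        for part in match.group(1).split("|"):
            key, sep, value = part.partition("=")
            if sep:  # skip positional (unnamed) parameters
                params[key.strip()] = value.strip()
        reports.append(params)
    return reports


def aggregate(pages: Iterable[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Merge reports from (source_page, wikitext) pairs, keeping the first
    report seen for each (wiki, revid) and recording where it came from."""
    seen = set()
    merged = []
    for source, text in pages:
        for report in parse_reports(text):
            key = (report.get("wiki"), report.get("revid"))
            if key in seen:
                continue
            seen.add(key)
            merged.append({**report, "source": source})
    return merged


# Hypothetical pages; the second repeats one report and adds a made-up one.
pages = [
    ("meta", "{{false positive|wiki=wikidata|revid=32784233"
             "|explanation=An edit by an anon that fixes a thing}}"),
    ("wikidata", "{{false positive|wiki=wikidata|revid=32784233|explanation=dup}}\n"
                 "{{false positive|wiki=fawiki|revid=1|explanation=test}}"),
]
for report in aggregate(pages):
    print(report["wiki"], report["revid"], report["source"])
```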