[00:39:57] 90%! [00:40:03] And I found some bugs :) [01:18:18] Fun story. Just ran into a case where my local enchant dict differed from the one for travis. [01:18:36] We'll need to be careful to make sure that we generate features on a similar install as ORES. [01:19:51] wiki-ai/revscoring#455 (features_commons - 3aa04cf : halfak): The build was fixed. https://travis-ci.org/wiki-ai/revscoring/builds/99363928 [01:19:55] \o/ [01:20:01] Take that travis [01:20:23] Hmm... Travis only reports 88% coverage. [11:32:23] halfak: fyi, wikilabels was down apparently, uwsgi wasn't responding. someone reported on -labs, I restarted (couldn't investigate, it's 3:30 AM) [14:27:27] Thanks YuviPanda! [14:31:49] YuviPanda: hey, can you delete the ores service group in tools? [14:32:32] Hey Amir1! [14:32:42] Working on the blog while I sip my coffee [14:32:46] hey halfak :) [14:33:01] I'm working on it [14:33:16] Adding things etc. [14:33:30] I'll try to get something together re. limitations. [14:34:19] awesome [14:35:27] I hope we finish this soon [14:35:57] +1 We want to get it out there. [14:36:45] Oh! Also, I want to re-think the balanced extractor. [14:36:58] Right now, it is primarily IO blocked on looking up information about users. [14:37:16] I think processing the XML dump is the wrong strategy. [14:37:27] We cache user data [14:37:38] Yeah. Even so. [14:37:39] so for every user we can have them [14:37:44] but [14:37:46] I agree [14:37:59] I think that what we want to do is run label_reverted on an extra large sample of non-bot edits and then subsample until balanced. [14:38:17] I think we should only use xml dumps for non-user related stuff [14:38:26] +1 [14:38:31] like --only-reverted [14:40:16] We'll get a big boost by just ignoring bot edits, and bot user_ids is a short list that we can put into a set() to get constant lookup speed. [14:41:01] I think that, in the future, we'll want to exclude bot edits from our training entirely.
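The set()-based bot filter halfak describes can be sketched in a few lines. The user_ids below are made-up placeholders, not real bot accounts:

```python
# Sketch of the constant-time bot filter discussed above.
# BOT_USER_IDS values are hypothetical, not real Wikidata bot ids.
BOT_USER_IDS = {4115189, 104981, 232123}  # short list -> O(1) membership test

def non_bot_edits(edits):
    """Yield only edits made by non-bot users.

    `edits` is an iterable of (rev_id, user_id) pairs.
    """
    for rev_id, user_id in edits:
        if user_id not in BOT_USER_IDS:
            yield rev_id, user_id

edits = [(1, 4115189), (2, 999), (3, 232123), (4, 1000)]
assert list(non_bot_edits(edits)) == [(2, 999), (4, 1000)]
```

Because `set` membership is a hash lookup, the cost doesn't grow with the number of bots, which is the "big boost" over repeatedly querying user info.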
[14:41:12] We might want to have different classifiers specifically for bot edits. [14:43:27] I <3 this page: https://www.wikidata.org/wiki/Wikidata:ORES/Report_mistakes [14:43:42] I want to do one for the anon false-positives across the set of wikis. [15:11:00] halfak: I did that in my old script [15:11:09] we can do it in this code too [15:11:37] do you want too? [15:11:40] +1 [15:11:58] I might need to figure out how to get old models up and running so that we can show our progress for models we replaced 6 months ago. [15:12:07] That should be pretty easy actually. [15:12:15] We have a history of everything! [15:19:25] halfak: re. the bug in the qualifier test, can you tell me from which revision you've got the old and new Alan Turing items? [15:20:10] Good Q. Let me try to find out. [15:21:22] halfak: btw. It's Alan Turing [15:21:36] Can you change file names? [15:21:43] Did I typo? [15:21:48] yup [15:21:51] lol [15:21:52] alan touring [15:22:02] derp [15:22:08] :D [15:38:20] OK. Can't figure out what revisions they are :( [15:38:32] I have an idea though. [15:38:39] Can you edit the JSON directly in wikidata? [15:39:11] We could save the two JSON blobs as revisions on a Sandbox and then use the visual diff to inspect them. [15:39:14] Amir1, ^ [15:39:39] no, it's not possible unfortunately [15:39:47] we can send it through API though [15:39:57] but I'm positive we will get an error [15:40:14] since there are some tools that prevent creating duplicate items [15:40:24] Arg! [15:40:39] I'm trying to figure out what's wrong [15:40:50] Can you make a Sandbox that will expect the right content format? [15:41:10] I'm writing some code that inspects values [15:41:16] I don't think so [15:41:33] don't worry about it [15:41:48] I'm working on it [15:42:09] OK. Will leave it to you. [15:43:52] halfak: it seems the number of qualifiers hasn't changed [15:44:22] OK. So I guess the code is working right, but the test case is insufficient.
[15:44:39] or there is a problem upstream [15:44:44] which I highly doubt [15:44:57] (by upstream I mean pywikibase) [15:44:57] What are qualifiers usually used for? [15:45:24] The year a population was estimated? [15:45:31] that's one of them [15:45:39] e.g. for Chelsea Manning sex. We say it was male once [15:45:43] *she [15:45:51] I wonder if we could just put together a custom test. [15:46:05] How would you write a simple test to detect a qualifier change? [15:46:07] we can just remove one of the qualifiers [15:46:12] It won't do any harm [15:46:24] let me paste it for you [15:46:37] great [15:48:33] halfak: https://gist.github.com/Ladsgroup/b0738ec40188cfc10241 [15:48:41] It's an old version of pywikibase though [15:48:48] but AFAIK it's no different [15:49:11] but you can see the algorithm [15:49:47] So, but in this case, the qualifiers didn't change. I want to fabricate some data that *does* have a qualifier changed. [15:50:17] E.g. we could put the revision.item_doc and revision.parent.item_doc in the cache that have nothing but a difference in qualifiers. [15:50:47] we can change it two ways [15:50:48] Would you write a custom JSON doc or just build up a pywikibase.ItemPage? [15:51:03] 1- Remove a qualifier in the json file [15:51:20] 2- make the item by pywikibase.Item() [15:51:33] and then find a claim that does have a qualifier [15:52:00] and claim.removeQualifier(claim.qualifiers[0]) [15:52:10] How would you construct a pywikibase.Item() for testing? [15:52:12] I can find the exact names of these methods [15:52:39] new version: pywikibase.Item(json_content) [15:52:58] old version: item = pywikibase.Item [15:53:04] item.get(json_content) [15:53:28] OK. what does json_content look like though? [15:53:43] I'm not familiar enough with the format to make a fake-document with the right properties. [15:54:13] Could we do pywikibase.Item().addClaim(pywikibase.Claim(...))? [15:54:24] That would make an item with just one claim and nothing else?
[15:54:29] yes we can do that too [15:54:38] I think that'll be good for unit testing. [15:54:43] but that's the wrong syntax [15:54:44] we should do [15:54:45] I'm cheating by loading in a whole document. [15:54:55] item = pywikibase.Item() [15:55:03] item.addClaim(claim) [15:55:16] (like list.sort()) [15:55:38] Gotcha. Makes sense. [15:55:50] halfak: also you can see what the json content looks like by checking out the json files [15:55:53] it's messy I know [15:56:00] but use pprint [15:56:04] or json.dumps [15:56:19] yeah. I gotcha. I was trying to avoid reverse engineering that :) [15:56:34] +1 [16:00:03] halfak: oh also we are in the blog again [16:00:05] http://blog.wikimedia.org/2015/12/29/blog-popular-posts/ [16:01:18] Ha! Cool. I wonder if we'll get a pageviews bump again on [[:m:ORES]] [16:01:19] [1] https://meta.wikimedia.org/wiki/:m:ORES [16:01:24] Thanks AsimovBot [16:02:32] Oooh! We can see when major news stories came out in the page view data. [16:02:40] 12/19 and 12/23 [16:03:47] * halfak loads up a new graph for ORES [16:03:57] o/ ellery [16:06:53] o/ aetilley [16:07:01] Amir1, https://upload.wikimedia.org/wikipedia/commons/b/b8/Pageviews_ORES_and_Revscoring.svg [16:07:28] We were getting 1000 views/day at one point! [16:07:42] OMG [16:10:29] win [16:11:37] halfak: I just made some edits in the blog post draft in the getting involved part [16:12:00] Nice. Saw the edit to the project page too. [16:12:54] oh yeah [16:13:07] halfak: how do you generate those graphs? [16:13:30] I have a hacked version of the webapp that they released with the pageviews API. [16:13:48] The javascript only understood wikipedias, so I hacked it to work for metawiki too. [16:14:11] It's all wrapped up in one JS file. Do you want me to send? [16:14:18] *HTML file [16:14:21] The JS is in the HTML [16:15:09] We need something like that for ORES. [16:15:37] A simple single-page script that provides a simple UI to just score edits.
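Picking the qualifier-test thread back up: one way to fabricate the `revision.item_doc` / `revision.parent.item_doc` pair without pywikibase at all is to diff qualifier counts on Wikidata-style JSON directly. The documents below are minimal fabrications that only approximate the real item format (`P1082`/`P585` are just illustrative property ids):

```python
def count_qualifiers(item_doc):
    """Count qualifier snaks across all claims of a Wikidata-style item doc."""
    total = 0
    for statements in item_doc.get("claims", {}).values():
        for statement in statements:
            for snaks in statement.get("qualifiers", {}).values():
                total += len(snaks)
    return total

# Two fabricated revisions of an item: the newer one drops a qualifier.
parent_doc = {"claims": {"P1082": [
    {"mainsnak": {"property": "P1082"},
     "qualifiers": {"P585": [{"property": "P585"}]}}]}}
current_doc = {"claims": {"P1082": [
    {"mainsnak": {"property": "P1082"}}]}}

assert count_qualifiers(parent_doc) - count_qualifiers(current_doc) == 1
```

A test case built from fixtures like these would exercise the qualifier-change feature even though the real Alan Turing revisions couldn't be identified.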
[16:15:40] I want to see the original one [16:15:55] because I'm interested in getting these graphs for fa.wp [16:16:11] Here's the code: https://gist.github.com/marcelrf/49738d14116fd547fe6d#file-article-comparison-html [16:16:21] "A simple single-page script that provides a simple UI to just score edits. " I can do something for this [16:16:31] let me find some time [16:16:46] Is it okay for you? [16:16:49] Sure! [16:16:54] We've got a lot to do first though [16:17:14] Oh yeah! Was meaning to ask you to look into the filter-rate of the Wikidata model. [16:17:39] I want to work out how many edits one needs to review in order to catch 95% of edits that will need to be reverted. [16:17:50] exactly, I'll probably do it from Jan 15 [16:18:29] halfak: I don't know how to run things on the test set [16:18:33] at all [16:18:52] you can use the `revscoring score` script. [16:18:57] Or write your own script. [16:19:12] I think that what you'd need to do is 1, train a new model with a test set withheld. Find out the maximum threshold that you can set that catches 95% of reverted edits. [16:19:51] Essentially, all you need is \t pairs. [16:20:19] Once you find the threshold, then you'll want to find out what % of scores cross that threshold. [16:20:24] That's our filter rate. [16:20:51] hmm [16:21:04] I can run some of my scripts to get f1-score too [16:21:16] f1-score depends on where we set the threshold. [16:22:07] We need this for the blog post, I think. [16:22:30] Right now, we're just guess-estimating our filter rate. [16:22:57] You know what. Don't worry about this today. I'll take a first pass to see what I can learn. [16:23:08] I might just use our current model and label some new edits. [16:23:14] Since data is free :) [16:23:17] f1 score will determine the threshold [16:23:45] we can get optimum ones [16:23:47] Amir1, f1-score will allow us to optimize for balance, but I think we want to optimize at a fixed recall.
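The fixed-recall procedure halfak outlines (find the highest threshold that still catches 95% of reverted edits, then measure what fraction of all edits cross it) can be sketched as follows; `scored` stands in for the (score, was_reverted) pairs from the withheld test set:

```python
import math

def threshold_for_recall(scored, target_recall=0.95):
    """Given (score, was_reverted) pairs, return (threshold, filter_rate).

    `threshold` is the highest score cutoff that still catches
    `target_recall` of the reverted edits; `filter_rate` is the fraction
    of all edits at or above it (i.e. the share needing human review).
    """
    reverted_scores = sorted((s for s, r in scored if r), reverse=True)
    if not reverted_scores:
        return None, 0.0
    needed = math.ceil(target_recall * len(reverted_scores))
    # Threshold = score of the `needed`-th highest-scoring reverted edit.
    threshold = reverted_scores[needed - 1]
    flagged = sum(1 for s, _ in scored if s >= threshold)
    return threshold, flagged / len(scored)

pairs = [(0.9, True), (0.8, False), (0.6, True), (0.3, False), (0.2, False)]
assert threshold_for_recall(pairs) == (0.6, 0.6)
```

On the toy data, catching both reverted edits forces the threshold down to 0.6, which flags 3 of 5 edits for review — the same trade-off the real analysis later in the log measures at scale.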
[16:23:55] yeah [16:24:13] I was talking about another thing, just an experiment [16:24:35] We can use the f1 score in our hyperparameter optimization. [16:25:08] +1 [16:25:14] I use it in Kian [16:25:27] I use f-0.25 to add [16:25:44] and anything between f-1 and f-0.25 for human review [16:26:13] OK. I'll work on this in the next 4 hours and let you know how far I get. [16:26:25] I've got a meeting in 30 minutes, but then my day is wide-open. [16:26:32] okay [16:26:41] I need to study for a while [16:26:54] +1. [16:27:03] but my only thing right now is getting data [16:27:13] to get that number for the blog post [16:28:18] last thing halfak, How can I get numbers of the test set? I don't want to randomly sample [16:28:21] it's okay to sample [16:28:22] That and code review for the massive PR [16:28:44] oh about code review, I'm okay with merging it now [16:28:54] but let's run some tests for performance [16:29:03] Amir1, my plan is to sample edits from outside of our train/test set, label them using the same labeling script as reverted or not and then apply our live model (via ORES) to the new edits. [16:29:04] and ask Helder and YuviPanda to take a look [16:29:08] since it's big [16:29:19] Amir1, let's not merge yet then. [16:29:26] I'll ask Helder to take a pass over it. [16:29:43] And I'll make sure that the docs are ready to go. [16:29:58] okay [16:30:10] I'll do performance tests by rebuilding all of the edit quality models and then we should merge. [16:30:11] about the test, okay [16:30:32] what do you think of sample size? [16:30:48] I think we'll be able to do OK with 1-2k edits, but I'll do 10k to be safe. [16:30:53] I'll only sample non-bot edits. [16:31:09] okay [16:31:21] I'll sample both and see what happens [16:40:17] halfak: I'm sampling using this: sql wikidatawiki "select rc_cur_id from recentchanges where rc_bot = 0 and rc_type in (0, 1) order by rand() limit 10000;" [16:40:31] Looks good to me. [16:40:34] it's working now [16:40:49] Cool.
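The f-1 and f-0.25 measures Amir mentions are instances of the general F-beta score, where beta < 1 weights precision more heavily than recall. A minimal sketch:

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: (1 + b^2) * P * R / (b^2 * P + R).

    beta < 1 favors precision (e.g. Kian's f-0.25 for auto-adding),
    beta = 1 is the familiar f1 balance of precision and recall.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

assert f_beta(1.0, 1.0) == 1.0
assert f_beta(0.5, 0.5, beta=0.25) == 0.5
# At beta=0.25, a precision-heavy classifier beats a recall-heavy one:
assert f_beta(0.9, 0.3, beta=0.25) > f_beta(0.3, 0.9, beta=0.25)
```

This matches the workflow described above: a strict precision-weighted score for automatic additions, and the band between f-0.25 and f-1 routed to human review.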
BTW, if you want to drop this and get back to your studies, that's OK with me. [16:41:13] I'm about to finish this [16:41:17] But if you want to keep going, I want (rev_id, score) pairs. [16:41:35] I want to leave it working (extracting scores and reverted status) [16:41:45] kk [16:41:46] so I don't waste any time [16:41:49] :) [16:49:32] halfak: Is the new version deployed to ores? [16:49:50] the one I was working on that improved AUC by 1% [16:51:39] Amir1, good Q. You should be able to check the version [16:52:39] halfak: it's not deployed yet [16:53:16] can you deploy it? at least to ores-staging [16:53:48] I think I made the PR [17:03:52] I've got to go [17:03:54] be back soon [17:04:40] Will deploy before generating these scores. [17:11:08] huh... wikidatawiki model isn't working right. [17:11:14] It's predicting false for everything. [17:11:17] 100% [17:15:06] Hmm.... I am using a slightly more recent version of wb-vandalism. Let me revert to the version that the model was generated against. [17:15:50] Yup. Still predicting false for everything. [17:15:51] Weird. [17:16:05] * halfak prepares to re-train [17:21:46] Hmm... no change to requirements. [17:21:53] I'm re-extracting features now. [17:22:00] I should have a new model in a couple hours. [17:23:38] o/ Amir1 [17:23:56] For some reason, the model that is in wb-vandalism always predicts False @ 100% now. [17:23:57] o/ halfak [17:23:58] I wrote the code to get scores [17:24:06] I did some troubleshooting, but I couldn't figure out what was up. [17:24:20] So I am re-extracting features and training now. [17:24:38] oh boy [17:24:46] yeah. Glad I checked :) [17:25:09] features looked okay as far as I remember [17:25:37] Yeah. I don't see an obvious problem.
[17:26:08] in ores-compute [17:26:23] I got reasonable scores for all of the revisions [17:26:33] that's how I updated the table [17:26:43] that's strange [17:26:50] maybe I missed something in my PR [17:27:14] OK if I make your homedir on ores-compute readable by anyone? [17:27:46] I can make it [17:27:51] I don't know if it is [17:28:09] Woops. Accidentally hit enter. So {{done}} [17:28:09] How efficient, halfak! [17:28:14] lol [17:28:29] :)))) [17:28:59] it's okay [17:29:54] Amir1, so, when I run the code from your directory on ores-compute, I get the same issue. [17:29:57] WEIRD! [17:30:16] wut [17:30:28] I even used your virtualenv! [17:30:35] I just got it from there [17:30:48] I don't know [17:30:55] that's super strange [17:30:55] Strange stuff. [17:31:28] We're almost ready to train a new model. :) [17:31:44] 20/24k features extracted. [17:32:19] okay [17:32:23] I go to study [17:33:13] Godspeed! [17:33:23] Oh hey! Scoring script. Is it running? [17:33:28] Amir1, ^ [17:33:44] I'll pick that up again as soon as I can get this deployed. [17:39:20] Tests show 0.977 AUC and 0.922 accuracy. [17:46:40] halfak: not yet [17:46:55] that's what I got before [17:47:16] Is it deployed? [17:47:26] Not yet [17:48:12] what should we do? [17:49:15] If you pass me the script, I'll run it and you can study/sleep. [17:49:35] \o/ works [17:51:16] Going to staging now [17:52:36] halfak: [17:52:43] https://gist.github.com/Ladsgroup/d5e843d063597e82f70a [17:52:44] http://tools.wmflabs.org/dexbot/res_aaron.txt [17:52:58] I should put some sleep time between requests [17:53:11] Na. It's good. [17:53:25] Ooh. I'm gonna islice this. [17:53:25] :) [17:53:35] <3 itertools [17:54:26] hmm [17:54:34] I'll use itertools next time [17:54:46] I didn't know something like this existed [17:54:55] No worries. I'll give you a nice example :D [17:55:09] awesome [17:57:07] Back to study [17:57:15] o/ [17:58:05] * halfak hammers staging with precached [17:58:48] Aaaaand it all looks good. [18:00:27] here we go!
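Re. the promised itertools example: `islice` makes it easy to pull fixed-size batches of rev_ids off any iterable, which is the usual shape for a scoring script that sleeps between request batches. A sketch (the `score()` call in the comment is hypothetical):

```python
import itertools

def batched(iterable, size):
    """Yield lists of up to `size` items, using itertools.islice."""
    iterator = iter(iterable)
    while True:
        batch = list(itertools.islice(iterator, size))
        if not batch:
            return
        yield batch

assert list(batched(range(10), 4)) == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# In a scoring script, one might rate-limit between batches:
# for batch in batched(rev_ids, 50):
#     results.extend(score(batch))  # hypothetical ORES request
#     time.sleep(1)
```

Because `islice` consumes the shared iterator, each call picks up exactly where the previous batch stopped, so nothing is skipped or re-fetched.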
[18:02:19] "MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error." [18:02:37] So... that's new [18:03:22] Arg! [18:03:25] YuviPanda, around? [18:14:47] Restarting cache redis [18:16:45] OK. That seems to have solved the problem temporarily. [18:20:16] Well that was fun. [18:20:24] Best I can tell, we're on a time limit. [18:20:44] For some reason we're not evicting keys correctly in redis. [18:24:28] And we're having the issue again. [18:24:43] oh boy [18:24:56] is it a staging issue or the main server? [18:26:52] main server [18:27:01] OK. So I have a temp fix that I think will work. [18:27:05] "config set stop-writes-on-bgsave-error no" [18:27:18] This tells redis to not fail on writes when the rdb file can't be changed. [18:27:31] So that means we're not persisting our cache to disk anymore. [18:27:49] If the machine restarts, we'll lose all of our precached. [18:33:07] Everyone on the internet just turns off persistence when they run into this problem! ARG! [18:41:13] So, I'm starting to think this is a memory issue. [18:49:50] https://gerrit.wikimedia.org/r/261642 [18:50:00] https://phabricator.wikimedia.org/T122666 [19:02:52] o/ awight [19:05:24] Well... that issue seems to be in a stable state. I'm going to get back to working on wikidata scores. [19:53:33] halfak: You around? [19:53:39] yup [19:54:28] So I think I might need to talk to someone who is familiar with MW parsing. [19:54:38] Here's my current situation: [19:55:13] I need to extract a bunch of sentences from revisions. [19:55:25] Not markup, just sentences. [19:55:31] The tool you suggested [19:55:42] (context_tokens) [19:56:07] works to a degree, but actually seems to let a lot of markup through.
[19:56:24] I'm wondering if there's anyone at the foundation who might [19:56:52] 1) Know where I can get a bunch of text of WM pages, but just the main content or [19:57:17] 2) Have suggestions on how I can use the parser-from-hell to get this on my own. [19:58:41] *sorry, that tool was [19:58:48] revision.content_tokens [19:59:26] aetilley, so, yeah, it looks like mwparserfromhell doesn't necessarily remove all markup [19:59:56] But you could do OK, I think, by supplementing it. [20:00:03] Right. Now if there's a pattern to which markup gets through, I could work around it. [20:00:05] (regretfully, I don't know of a better source of data) [20:00:18] aetilley, can you give me some examples of where markup does get through? [20:00:26] yeah, hold on [20:01:52] so for instance [20:02:00] extractor.extract(686575075, revision.content_tokens)[:10] [20:02:13] sorry, [:40] not [:10] [20:02:29] where extractor is some APIExtractor on an english wp session. [20:02:47] Notice the "top" and "bottom" [20:02:54] What do those first 10 tokens look like? [20:03:07] (oh and also look at the first three tokens) [20:03:17] * halfak is in the middle of a major revscoring refactor and doesn't want to have to write a script to get that [20:03:18] Want me to send you a paste? [20:03:21] Yes please [20:03:31] k, one sec [20:08:15] Sent. But take your time in getting back to me. I understand you're hella busy. [20:09:19] Aha! I see. It looks like likes to file namespace are treated strangely. [20:09:23] *links [20:13:13] Sending you another one from the end of the article [20:15:25] aetilley, https://github.com/earwig/mwparserfromhell/issues/136 [20:15:36] * aetilley looks [20:15:43] I just filed a bug [20:17:28] Ok, now a noob question. If I'm looking at an article, what is the quickest way to get it's revision number? [20:17:32] its [20:17:49] Go to the history page and click on the date of the last version.
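One way to "supplement" mwparserfromhell's output, as suggested above, is a regex post-filter over the residue. The patterns here are illustrative guesses at the kinds of markup that slip through (file links, simple templates, tables), not an exhaustive rule set:

```python
import re

# Rough post-filter for markup that can survive strip_code().
# The patterns are assumptions for illustration, not a complete list.
LEFTOVER_RE = re.compile(
    r"\[\[(?:File|Image):[^\]]*\]\]"   # file/image links
    r"|\{\{[^{}]*\}\}"                 # simple (non-nested) templates
    r"|\{\|.*?\|\}",                   # tables
    re.DOTALL)

def supplement_strip(text):
    """Remove some common markup residue from already-stripped wikitext."""
    return LEFTOVER_RE.sub("", text)

cleaned = supplement_strip(
    "Turing [[File:Alan Turing.jpg|thumb]] was {{cn}} a pioneer.")
assert cleaned == "Turing  was  a pioneer."
```

This won't handle nested templates or every namespace alias, but as a second pass after the parser it can knock out the predictable leftovers before sentence extraction.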
[20:18:59] omg halfak [20:19:01] so sorry [20:19:10] I didn't respond to the page [20:19:38] It's OK. Can't always expect you to be available. :) [20:19:51] We're online now, so no harm done. [20:20:02] Did you already read my notes and patchset? [20:20:22] I was unfortunately up till 5AM (we had 3 vaguely related outages yesterday! woo!) and then forgot to turn my phone from 'alarms only' to 'priority + alarms' [20:21:31] halfak: so let's just switch the cache to a different machine [20:21:35] and make it big [20:22:26] halfak: (And then look at the end of the URL?) [20:22:42] Hmm... That doesn't seem like a good solution. Shouldn't we make it so that we don't run into issues no matter what the cache size is? [20:23:08] halfak: yes, so we'll adjust maxmem with the different machine as well [20:23:19] halfak: this was my mistake in forgetting to adjust it when we switched from AOF to RDB [20:23:28] let me find that link... [20:23:52] oooh and vm_overcommit is missing [20:25:11] Yeah. Was looking at that too. Goddamn internet said "just turn off RDB" [20:25:16] halfak: I think the problem was just vm_overcommit [20:25:31] it was in our old redis puppet code and missing in the new redis puppet code we switched to [20:25:33] Would we get "cannot allocate" errors with that? [20:25:52] Also, just to make sure, we're still evicting keys LRU style, right? [20:26:02] It wasn't clear to me from the config. [20:27:23] halfak: yup [20:27:37] halfak: let me make sure (re: config) [20:28:16] halfak: specifically, it'll cause it to try to fork and have that fork fail [20:29:56] What exactly is forking here? [20:32:00] halfak: redis [20:32:10] halfak: so it forks, and then just reads from the data [20:32:18] and writes it as an rdb snapshot [20:32:30] halfak: this doesn't actually produce two copies of the data [20:32:40] but prevents concurrency issues when snapshotting [20:32:43] this is 'bgsave' [20:33:25] OK. So the fork is just the async job for writing the RDB?
[20:33:30] yeah [20:37:58] * aetilley starts making a list [20:38:08] Also the parser ignores {math: ...} entirely [20:38:26] ^ file a bug! [20:38:37] Well it's not exactly a bug. [20:39:45] aetilley, the last email you sent looks exactly right [20:39:47] There's probably no sensible way for the parser to deal with math because a mathematical expression may be one of many syntactic categories. [20:40:05] aetilley, well, it could have a node for match [20:40:08] *math [20:41:22] hm [20:43:15] halfak: ok, so I merged a patch fixing the overcommit thing (for all redises) [20:43:34] halfak: I'm going to restart cache redis and see what happens. that ok with you? [20:43:39] They'll take down the web nodes is we restart redis. [20:43:51] They don't auto-reconnect fast enough. [20:43:59] *if [20:44:03] I see [20:44:14] we need to fix that, redis has become a major SPOF [20:44:18] But if we're ready, we can do the restart at the same time. [20:45:02] halfak: ok! [20:46:38] halfak: should we do it now? [20:47:37] Yes. [20:47:44] Oh wait. One more thing. [20:48:03] I ran that command on the cache redis to turn off write failures when persistence fails. [20:48:09] Anything you want to do about that? [20:48:13] Turn it back on? [20:48:20] halfak: restarting clears it [20:48:26] Gotcha. [20:48:29] since it was just a config command and not on the config file [20:48:40] halfak: ok, say when you're ready :) [20:49:02] Ready [20:49:19] Tell me when your restart finishes. [20:50:15] ORES starting to timeout [20:50:16] 500 [20:50:37] halfak: done\ [20:50:39] *done [20:50:50] still 500ing [20:51:05] waiting on uwsgi restarts [20:51:12] And we're back [20:51:17] ok! [20:52:07] YuviPanda, did you do the queue redis too? [20:52:12] so two things: cache failures should be soft - if we can't connect to cache redis we should just treat it like a cache miss. 
2, we need a better solution for failovering our cache redis, so I'm going to look at https://github.com/twitter/twemproxy/blob/master/notes/recommendation.md [20:52:49] halfak: no, since that didn't hit limits before the vm_overcommit took effect [20:53:04] OK. [20:53:21] * halfak files bug for ORES redis cache to throw warnings instead of failing. [20:53:32] *log warnings [20:53:35] +1 [20:56:24] * aetilley just realized that {{math|...}} is what parserfromhell calls a template. [20:56:25] [2] https://meta.wikimedia.org/wiki/Template:math [20:57:07] YuviPanda, BTW, just finished a major refactor of revscoring. test coverage is up and all that. Would you be willing to do a quick review? [20:57:20] https://github.com/wiki-ai/revscoring/pull/233 [20:57:33] Regretfully, this ended up being massive. [20:57:39] RIP your computer when you click on files. [20:57:56] unfortunately probably not :( [20:58:02] still need to clean up from yesterday's tools outage [20:58:04] OK. [20:58:10] it's like everything's on fire [20:58:13] some of the time [20:58:43] In that case, anything I can do to help you? [20:59:14] nah, everything went to plan except me not waking up on the page. thanks a lot for disabling bgsave :D [21:00:08] * halfak flexes his log reading and googling muscles. [21:00:23] The trick to solving most server problems on linux. [21:00:26] halfak: I'm thinking we'll use https://github.com/twitter/twemproxy to make redis better in this regard. [21:00:38] spread the load over a couple of servers with consistent hashing [21:00:50] and auto_eject, so if one goes down we only lose that part of the cache [21:00:57] I thought that redis would do this out of the box. [21:01:33] ? [21:01:42] so there's 'redis cluster' that does that [21:01:47] Yeah. That [21:01:54] except it's still a bit experimental + we don't have an easy way to set that up yet [21:02:15] while twemproxy is client side consistent hashing and already used in prod [21:03:41] client-side... hmmm.
How will that work with the python client? [21:05:51] halfak: the clients themselves don't know [21:06:07] halfak: it's a proxy, basically. you set it up on localhost, your python client just talks to localhost and it does the thing [21:06:45] Gotcha. So it pretends to be a redis-server [21:07:02] I guess that makes sense since it is called a "proxy" [21:07:08] yeah [21:08:32] * halfak curses at our stupid pagecounts files [21:08:42] Seriously -- some of them have corrupted gzip. [21:08:43] WTF [21:11:39] :D [21:11:41] fun [21:11:44] halfak: can you file a bug for twemproxy too? [21:12:09] This one's going on phab [21:12:27] cool [21:14:05] When the fires are out, I could use an update of what was done here too: https://phabricator.wikimedia.org/T122666 [21:14:26] Here's the twemproxy task: https://phabricator.wikimedia.org/T122676 [21:14:44] awesome [21:16:04] Man. Downtime really cuts into other work. [21:16:14] * halfak tries to remember where he was. [21:20:24] Oh! aetilley, how's removing markup working for you? [21:23:16] halfak: hey again. Can you make a project for the ores extension at phab? [21:23:39] https://phabricator.wikimedia.org/tag/mediawiki-extensions-ores/ [21:23:43] Already exists [21:24:09] awesome [21:24:11] thanks [21:25:01] and were you able to get scores and the number for precision? [21:25:19] Amir1, still detecting reverts. [21:25:29] ok [21:25:29] Even with human edits, the rate of reverted edits is pretty low [21:25:31] thanks [21:25:43] yeah, I know [21:25:56] Amir1, BTW, did you limit these edits to the main namespace? [21:26:08] I don't think so [21:26:11] I forgot [21:26:13] shit [21:26:19] Should we do it again? [21:26:34] but it only returns errors [21:26:35] Na. [21:26:40] Let's leave it be. [21:26:47] These are 99% main namespace. [21:27:01] We won't be able to generate a score for the non-main namespace, so those will have nulls. [21:27:04] We can filter them out later. [21:27:13] So, we'll have slightly less than 10k edits.
[21:27:14] the list I gave you is 100% main namespace [21:27:22] Oh! Good! [21:27:40] Wait. Then where did you forget? [21:28:17] the res_aaron.txt is mixed [21:28:18] but [21:28:41] the scorer.py I showed to you filters out all of the non-main-ns edits [21:29:00] Oh! Well... I didn't use that :P [21:29:05] because when getting scores returns an error, the code skips the edit [21:29:10] * halfak cleans up his code for checking in [21:31:51] https://github.com/halfak/recall-studies [21:32:12] Amir1, see https://github.com/halfak/recall-studies/blob/master/rs/score_revisions.py [21:32:24] oh thanks [21:32:38] Looks like we're a little more than half-way done with revert detection [21:33:30] we can simply discard errors :) [21:35:36] halfak: https://phabricator.wikimedia.org/project/board/1662/ [21:35:40] What do you think? [21:39:27] halfak: Reading the parserfromhell docs. [21:39:57] Great. [21:48:01] Amir1, I'm finding bugs while I work! https://phabricator.wikimedia.org/T122679 [21:50:53] Got another one: https://phabricator.wikimedia.org/T122680 [22:00:33] I think I need to just score a shit ton of edits and look for errors more often! [22:17:20] awesome halfak [22:17:21] :) [22:43:38] Amir1, just finished labeling reverted edits. [22:43:48] out of 10k human edits, we have 18 reverted edits [22:45:47] wut [22:47:44] * YuviPanda is spending the day reading the berkeleydb manual [22:48:08] Amir1, so, we can run with this, but the stats are going to be a bit noisy. [22:48:43] what should we do :( [22:48:56] we can sample from our dump parser [22:49:50] Na. I think this is OK. [22:49:58] Let me try generating the stats. [22:50:23] Given 10,000 edits coming in, we want to estimate how many edits will need to be reviewed and this should be a good way to do that. [23:11:06] Halfway done getting scores! [23:16:01] \o/ [23:17:09] I just made this halfak. do you think it should be essential for the MVP or later? https://phabricator.wikimedia.org/T122684 [23:17:31] Amir1, good Q.
Might need to be wiki-specific at least. [23:17:36] If not user-specific. [23:18:01] Let's keep it in the list and make decisions on what gets in the MVP release when we do estimates. [23:18:35] Is this damaging? https://www.wikidata.org/wiki/?diff=284477427 [23:18:39] Amir1, ^ [23:19:50] it's hard to tell [23:20:01] the user is trusted [23:20:06] These are examples of low-score edits that are reverted. [23:21:06] he did the wrong thing [23:21:16] he or she [23:21:29] the edit is bad, the user is trusted [23:21:37] Gotcha. Here's another that looks OK: https://www.wikidata.org/w/index.php?title=Q20015860&diff=286211724 [23:21:39] I don't know what to say [23:22:23] it's a client move [23:22:33] we do have a feature to catch those [23:22:40] but data is corrupted [23:22:47] Yeah. That's probably not a real "reverted edit". [23:23:12] Here's another: https://www.wikidata.org/w/index.php?diff=285223237 [23:23:51] completely okay on wikidata's side [23:23:58] client side maybe not [23:24:12] (and none of our concerns) [23:24:40] yeah. Looks like we might want to *not* use reverted edits to judge the fitness of this model. [23:26:57] we do have a huge task flow, the vandalism rate is low but still big at this scale [23:27:16] the problem is we can't get that without getting 99% of crap [23:29:06] what do you suggest [23:29:08] ? [23:29:26] do you have a number? [23:35:38] halfak: ^ [23:40:42] Amir1, sorry. Was wrapped up in reading wikimedia-l [23:41:01] it's okay [23:41:03] :) [23:41:07] Doing a poor-man's analysis on the command-line [23:41:39] So, we have 18 reverted edits out of 9869 that we could score. [23:42:18] if we set the threshold at 0.8, we get 11/18 edits [23:42:37] if we set the threshold at 0.7, we get 14/18 edits [23:43:44] We need to bring the threshold down to 0.5 to get >= 95% of the reverted edits. [23:44:17] okay [23:44:31] but based on our analysis of the 18 reverted edits [23:44:41] Yeah. Which could be sketchy.
[23:45:09] But if we set the threshold at 0.5, that means we only need to review 2500/10000 edits [23:45:37] Which still brings the workload down to 25% [23:45:50] Not bad, but I bet we could do better with more data. [23:46:11] Also, I'd like to manually review these reverted edits for "vandalism" and try again. [23:46:46] halfak: did you determine a radius and windows for reverted labeling? [23:47:08] Yup [23:47:12] Same as we use for training. [23:47:19] ok [23:47:28] But of course, our alg will learn more about commonly reverted edits than unusual cases. [23:47:46] commonly reverted tends to be vandalism and unusual cases may be page moves. [23:48:38] https://etherpad.wikimedia.org/p/revscoring_wikidata_reverted_set