[00:05:07] hi
[01:17:02] Amir1, need the new feature list pushed to the repo
[08:04:56] halfak: great. I will do it asap
[12:51:23] halfak: hey, https://github.com/Ladsgroup/wb-vandalism/blob/master/wb_vandalism/features_list/wikidata.py
[12:51:32] Hey Amir1
[12:51:34] saw that
[12:51:40] great
[12:51:56] Is the model being trained?
[12:52:33] ha. Still iterating on the input.
[12:53:09] Also, working on a ping for you in phab because qgil wants us to discuss hack sessions before Nov 6th
[12:53:39] halfak: I found a wonderful alternative to standing up a wikibase for media metadata.
[12:53:52] Oh?
[12:54:02] Inference via Commons categories on Wikidata.
[12:54:46] Inference sounds complicated.
[12:54:56] Unless it's one-off.
[12:55:20] Well, an image is in the category "white dogs". The category is associated with the items for "white" and "dog", so images in that category are white and dogs by association.
[12:56:30] It will also help when we transition to proper image tagging on Commons, since half the work will already be done. It's really doable because Commons is absurdly specific with their categories.
[12:57:17] Ladsgroup/wb-vandalism#23 (master - e630809 : amir): The build was broken. https://travis-ci.org/Ladsgroup/wb-vandalism/builds/87878513
[12:57:22] Amir1, it's working!
[12:57:33] * halfak waits patiently
[12:57:39] Awesome
[12:58:40] :)
[13:07:16] Damn. Looks like one of the features is not pickleable. But it looks like we're getting ~80 AUC, which is probably acceptable.
[13:07:21] Amir1, ^
[13:07:46] do you know which feature?
[13:07:54] We'll need to see what the scores look like in comparison to the edits that get flagged, but I think we're close to good.
[13:08:05] Good enough to deploy anyway.
[13:08:13] wb_vandalism.features.diff.
[13:08:16] There's a lambda in there
[13:08:23] \o/ \o/ \o/
[13:08:29] lambdas can't get pickled.
[13:08:33] I can take a pass on this.
[13:08:36] I've been meaning to.
[13:08:58] Okay
[13:09:06] I will use a function instead of a lambda
[13:09:12] halfak: is that okay?
[13:09:20] I have a couple of hours I could spend on this right now if you don't mind looking at the PR.
[13:09:37] Will you be around in a couple of hours?
[13:09:56] yeah
[13:10:05] I will be here until we deploy
[13:10:18] Cool.
[13:10:26] I can't see any PRs in wb-vandalism
[13:10:31] do you mean in ORES?
[13:10:51] Sorry, I meant the PR I'll be putting together right now.
[13:11:03] I've been just self-merging recently :S
[13:11:07] :)))
[13:11:19] Okay
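For reference, the limitation being discussed is standard Python behavior: pickle stores a function by its module and name, so a named module-level function round-trips while an anonymous lambda raises an error. A minimal sketch (plain Python, not the actual wb_vandalism feature code):

    import pickle

    # A lambda has no importable name, so pickle refuses it.
    double = lambda x: x * 2
    try:
        pickle.dumps(double)
    except (pickle.PicklingError, AttributeError) as error:
        print("lambda failed to pickle:", error)

    # A module-level function pickles fine: pickle records its module
    # and name, then re-imports it on load.
    def double_fn(x):
        return x * 2

    restored = pickle.loads(pickle.dumps(double_fn))
    assert restored(21) == 42

This is why swapping the lambda for a named function, as Amir1 proposes below, is the standard fix.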
[13:23:29] Amir1, I'm not seeing any tests for the features in wb_vandalism. Am I missing something or is that a todo?
[13:25:44] Also comments like "AF/8". What does that mean?
[13:26:00] we have tests
[13:26:06] (code coverage 95%)
[13:26:13] Hmm
[13:26:16] they are in the tests folder
[13:26:17] * halfak looks harder
[13:26:23] in the root
[13:26:33] Oh!
[13:26:37] Not using nosetests
[13:26:45] AF is an abbreviation of AbuseFilter
[13:26:45] ewww.. unittest
[13:26:49] Gotcha!
[13:27:18] when you write tests using unittest you can still run them with nosetests
[13:27:22] Would you mind if I switched from unittest to something like nosetests? It's totally OK if you'd prefer to stick with unittest.
[13:27:29] travis does exactly this
[13:27:40] Yeah. I just think the way you do it with unittest is gross, and it makes it harder to test subsets of the code.
[13:27:46] BUT I'll side with your judgement
[13:28:22] Can I see some examples?
[13:29:38] https://github.com/wiki-ai/revscoring/blob/master/revscoring/features/tests/test_diff.py
[13:29:41] Amir1, ^
[13:30:16] So, if I go into the revscoring/features/ directory and run "nosetests", it will only run the tests that are in revscoring/features/tests/
[13:30:26] This is nice in revscoring since we have some tests that take a long time to run.
[13:30:42] E.g. the model building tests and anything that requires NLTK.
[13:31:07] It also places the tests close to the tested code.
[13:34:34] I'm reading this to understand how we can write tests with nose
[13:34:53] revscoring and ores use nose.
[13:35:21] (actually all of the libraries I've built since 2010 use nose)
[13:35:39] Since travis and other CI tools run nosetests, it is not hard to move tests to other folders
[13:35:47] Indeed :)
[13:35:47] I can test and let you know soon
[13:36:18] but rewriting the whole test suite to use nose
[13:36:31] that is inefficient for me
[13:36:36] I need to learn nose first
[13:37:03] once I've learned it, I'll start rewriting tests for practice
[13:39:51] Amir1, I wouldn't mind taking a first pass. I'm extending the tests right now to catch pickling issues.
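A sketch of the nose-style layout halfak describes — plain test functions in a tests/ package that sits next to the code, discovered by running nosetests from the package directory — including the pickle round-trip check he mentions adding. The path and feature names here are illustrative, not the actual wb_vandalism test code:

    # wb_vandalism/features/tests/test_diff.py  (illustrative path)
    import pickle

    from .. import diff  # the module under test

    def test_features_are_pickleable():
        # A feature built from a lambda blows up here; one built from a
        # module-level function round-trips cleanly.
        for feature in [diff.number_changed_claims]:  # illustrative subset
            pickle.loads(pickle.dumps(feature))

Running nosetests from inside wb_vandalism/features/ would then execute only that directory's tests — the subset-testing behavior being discussed above.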
[13:44:30] Amir1, can you help me find a Wikidata revision that adds a language?
[13:45:01] I knew one
[13:45:05] let me search the history
[13:45:16] * halfak boosts test coverage slightly
[13:48:04] And the pickle error is fixed :)
[13:49:07] halfak: https://www.wikidata.org/w/index.php?title=Q2380760&diff=264056515&oldid=197154871
[13:49:11] I just made one
[13:49:14] Woot!
[13:49:21] Best test cases lead to contributions
[13:49:55] it works!
[13:50:55] Amir1, can you make me a collaborator for https://github.com/Ladsgroup/wb-vandalism?
[13:51:45] halfak: you are already there
[13:52:21] Oh! My bad.
[13:52:24] PR coming soon.
[13:54:02] :)
[14:01:15] * halfak tests the tests
[14:05:10] Amir1, https://github.com/Ladsgroup/wb-vandalism/pull/1
[14:07:36] halfak: merged
[14:07:45] OMG you rewrote all the tests
[14:07:50] :D
[14:07:50] OMG
[14:07:55] * halfak types fast
[14:08:18] Also my editor is pretty powerful with regexes ;)
[14:11:11] woot merge
[14:11:45] Now to build a Makefile and encode my commandline params for building the model.
[14:20:40] Amir1, BTW, I think that we ought to use a similar strategy to build a balanced sample for idwiki.
[14:20:45] idwiki seems to be a botpedia
[14:24:59] * halfak builds the models. :D
[15:38:38] Ladsgroup/wb-vandalism#24 (pickleable - ed1149a : halfak): The build has errored. https://travis-ci.org/Ladsgroup/wb-vandalism/builds/87895474
[15:38:38] Woops.
[15:38:38] Errors
[15:38:39] Looks like we got up to 82 AUC. That's as good as our old enwiki model that Helder built ScoredRevisions around.
[15:38:39] So I think that's *probably* good enough.
[15:38:40] LOL, github doesn't believe that my TSV is a tab separated file because it can't find any tabs. IT'S ONLY ONE COLUMN!!!!
[15:38:40] Arg!
[15:38:42] :)
[15:38:42] halfak: technically it's also a CSV then
[15:38:42] It is both and also neither
[15:38:42] harej, so just pick one and format it! lol
[15:38:42] * halfak tries to figure out where to file a bug.
[15:38:43] You could just arbitrarily end each line with a tab?
[15:38:43] Not sure. But that's at least an anti-pattern.
[15:38:43] Should be fixed at the source.
[15:38:50] * halfak sees the merges
[15:38:50] :)
[15:38:50] Woot! Amir1, we need to either incorporate the new wikibase features into revscoring or make wb-vandalism a dependency of ores-wikimedia-config.
[15:38:50] I like the latter best.
[15:38:50] So, that means we need to get a nice, clean version up on pypi.
[15:38:50] Do you want to do that now?
[15:38:50] let's have it in ores-wikimedia-config while it's in development
[15:38:50] once it's finished
[15:38:50] I suggest incrementing the version to 0.1.0 since I changed the module structure for "feature_lists" a bit.
[15:38:50] sure :)
[15:38:51] let me release the new version
[15:38:51] Then we get to troll YuviPanda by telling him that we have a new dependency
[15:38:51] Or... I suppose we could have another submodule.
[15:38:51] That might be better.
[15:38:51] :D
[15:38:51] the latter sounds fine to me
[15:38:51] no
[15:38:51] a dependency is better
[15:38:51] OK. Cool with me.
[15:38:51] since a submodule would get the in-development release
[15:38:51] (we can have a branch dedicated to releases too)
[15:38:51] but that would be tricky
[15:38:51] Ladsgroup/wb-vandalism#27 (models_models - fdd3a5e : halfak): The build failed. https://travis-ci.org/Ladsgroup/wb-vandalism/builds/87900778
[15:38:53] Okay, it's released
[15:38:53] releasing on pypi
[15:38:54] https://pypi.python.org/pypi/wb_vandalism
[15:38:54] Ladsgroup/wb-vandalism#28 (models_models - 4b6b90b : halfak): The build has errored. https://travis-ci.org/Ladsgroup/wb-vandalism/builds/87900821
[15:38:55] BRB, gotta shower and get ready for work stuff. I'll be back on in about an hour, working to get the wikidata model onto staging.
[15:38:55] Amir1, ^
[15:38:55] o/
[15:38:55] thanks halfak :)
[15:38:55] It's great
[15:43:06] Ladsgroup/wb-vandalism#30 (halfak-patch-1 - 7da78f8 : Aaron Halfaker): The build failed. https://travis-ci.org/Ladsgroup/wb-vandalism/builds/87901637
[16:15:43] halfak: remember, new dependencies gotta go through security review
[16:15:58] and I really hope this doesn't touch pwb
[16:16:04] Nope.
[16:16:06] lol
[16:16:23] I've got your back there. I insisted. Amir1 came through and split all the things we needed from pwb.
[16:16:49] YuviPanda, also, I don't think this needs to go to prod right away.
[16:17:04] In the long run, I think we'll have an ores-wikimedia-config and an ores-wmflabs-config.
[16:17:28] The wmflabs one will pull in a lot of experimental stuff
[16:17:53] * halfak doesn't want to give up rapid experimentation
[16:17:57] Science and all that
[16:18:43] Amir1, can you merge this and send a new version to pypi? https://github.com/Ladsgroup/wb-vandalism/pull/5
[16:18:57] Regretfully, the install via pip doesn't work. See the referenced bug.
[16:19:23] YuviPanda, what do you think about the submodule vs. installation dependency?
[16:19:53] Eventually, I want to manage the models independently via wiki-ai/editquality, wiki-ai/wikiclass and wiki-ai/wb-vandalism
[16:20:20] Right now, I'm copy-pasting models.
[16:23:22] * halfak watches Amir1 merging
[16:23:26] halfak: done
[16:23:28] :)
[16:23:33] new version in pip too
[16:23:33] Thanks
[16:23:56] \o/
[16:25:25] Bah! Another problem.
[16:25:37] find_packages() doesn't work when there's a missing __init__.py
[16:26:34] https://github.com/Ladsgroup/wb-vandalism/pull/7
[16:26:46] Woops. Gotta increment the version
[16:26:47] One sec.
[16:27:15] OK, pushed.
[16:27:29] Amir1, ^
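The find_packages() surprise is a setuptools rule: only directories containing an __init__.py count as packages, so a subpackage missing that file is silently left out of the distribution and imports break after pip install. A hedged sketch of the relevant setup.py shape (the metadata is illustrative, not the repo's actual file):

    # setup.py
    from setuptools import setup, find_packages

    # find_packages() walks the source tree, but it only returns
    # directories that contain an __init__.py. A subpackage missing
    # that file is silently excluded from the built distribution.
    setup(
        name="wb_vandalism",   # illustrative metadata
        version="0.1.1",
        packages=find_packages(),
    )

    # A quick pre-release sanity check:
    #   python -c "from setuptools import find_packages; print(find_packages())"
    # and confirm every expected subpackage shows up in the list.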
[16:28:42] halfak: depends on how tightly tied it is
[16:28:56] halfak: if it's super tightly tied, submodule; else, library
[16:29:04] done
[16:29:10] uploading to pypi
[16:29:41] YuviPanda, the repos I am talking about (editquality, wikiclass, wb-vandalism) will include (1) the models and (2) utilities for building and using the models.
[16:30:01] e.g. http://pythonhosted.org/wikiclass/
[16:30:05] They are full packages themselves.
[16:30:15] Complete with command-line utilities and stuff.
[16:30:39] Further, they define "features" that will need to be importable in order to unpickle the model files that they build.
[16:31:43] These packages use revscoring as a framework to build scoring utilities.
[16:36:00] * halfak needs to build a diagram
[16:37:04] halfak: right, but I think a more defining line might be
[16:37:25] halfak: 'if something in ORES breaks during a deploy, how probable is it that the fix is in this library?'
[16:37:36] YuviPanda, quite probable.
[16:37:40] for revscoring that's fairly high, and we've been bitten by it before, hence submodule
[16:37:46] while it isn't that high for, say, sklearn
[16:37:50] +1
[16:37:51] That's right
[16:38:01] halfak: yeah, so if it's 'quite probable' then submodule
[16:38:17] Gotcha. I'll try to get that worked out.
[16:38:21] I hope that makes sense as a guideline and is a fairly clear line in the sand
[16:38:32] I think it is clear. Thank you :)
[16:38:44] yw
[16:39:32] Amir1, can we move the wb-vandalism repo to wiki-ai?
[16:39:43] of course :)
[16:39:44] If you'd like it to stay in Ladsgroup, that's cool too
[16:39:56] I just want to not have to update the submodule location later :D
[16:40:18] Cool! You can make the move in the settings for the repo
[16:40:32] I need to set up gerrit mirrors at some point
[16:40:39] I already moved BWDS to wiki-ai
[16:41:01] halfak: ok, csteipp says it's mostly done and should be back today or tomorrow
[16:43:09] halfak: https://github.com/wiki-ai/wb-vandalism
[16:45:43] YuviPanda, great! Thanks :)
[16:45:57] * halfak copies new github link
[16:46:00] Thanks Amir1
[16:48:42] yw
[16:48:46] Thank you :)
[16:49:18] Amir1, we've been using DBNames to represent wikis.
[16:49:27] But wikidata's dbname is 'wikidatawiki'
[16:49:36] Shall we make an exception and just call it 'wikidata'?
[16:49:48] Same thing with 'commonswiki'
[16:49:55] and 'mediawikiwiki'
[16:50:15] mediawikiwiki is just priceless
[16:50:21] lol
[16:50:33] I don't think we need an exception
[16:50:40] You want to keep 'wikidatawiki'?
[16:50:46] it's unnecessary
[16:50:48] yes
[16:50:52] OK.
[16:50:55] :)
[16:51:27] This is good anyway.
[16:51:34] Gadgets will be able to use the dbname var.
[16:51:44] Maybe we can include some aliases too.
[16:51:59] Feature requests for ORES
[16:52:02] maybe
[16:52:22] :D
[16:54:38] The model thinks that this looks like vandalism: https://www.wikidata.org/wiki/?diff=10000
[16:54:43] Not that confident.
[16:54:45] 70%
[16:55:01] Let me try to get one with a higher percentage.
[16:56:06] What about this one? https://www.wikidata.org/wiki/?diff=220328471
[16:56:51] Uh oh. I get an error for that one: "TypeError: Failed to process : 'set' object is not subscriptable"
[16:56:57] Let's figure that out! :)
[16:57:12] * halfak loves the collection of utilities for revscoring.
[16:57:20] Makes it easy to replicate this on the commandline
[17:02:02] Amir1, looks like "claims_differ" can return None when 'added()' is called
[17:02:05] Here: https://github.com/wiki-ai/wb-vandalism/blob/master/wb_vandalism/datasources/diff.py#L86
[17:03:19] It shouldn't
[17:03:30] Yeah... Trying to figure it out myself.
[17:03:40] I think I fixed this in my repo
[17:03:46] let me try to find it
[17:04:50] yeah
[17:04:55] in my repo I fixed it
[17:05:01] it should be added_claims += current_item.claims[p_number]
[17:05:29] I will fix it and push it ASAP
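The bug class here is assignment versus accumulation. Below is a hypothetical reconstruction of the datasource logic — the variable names come from the one-line fix Amir1 quotes above, but this is a sketch, not the actual wb_vandalism/datasources/diff.py code:

    def find_added_claims(past_item, current_item):
        # Collect claims that exist in the current item but not the parent.
        added_claims = []
        for p_number in current_item.claims:
            if p_number not in past_item.claims:
                # Buggy shape: added_claims = current_item.claims[p_number]
                # overwrites on every iteration and can leave the result
                # in a bad state. Fixed shape accumulates instead:
                added_claims += current_item.claims[p_number]
        return added_claims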
[17:05:41] halfak: should I release a new version?
[17:05:57] Add this revid to the test case: 220298625
[17:06:01] Make sure the test passes first
[17:06:03] And then yeah.
[17:06:15] See anywhere else that we might be making this mistake?
[17:06:22] exactly
[17:06:32] Also, maybe we need to extract features again?
[17:06:39] How did this feature extraction work in the first place?
[17:06:46] I apparently have features for this revision.
[17:07:23] Oh wait. No, I don't.
[17:07:43] Woops. Wrong rev_id
[17:07:52] This one: 220328471
[17:09:00] Heh. You've got a print() in the feature_list too.
[17:09:07] line 76
[17:09:09] :P
[17:12:11] Here are the feature values I get for that revision: https://gist.github.com/halfak/c0f66298dca427bd5c40
[17:12:21] "number_added_claims" == 1
[17:13:21] Ladsgroup/wb-vandalism#36 (manifest - 95ea07a : halfak): The build failed. https://travis-ci.org/Ladsgroup/wb-vandalism/builds/87926297
[17:15:51] >>> list(extractor.extract(220328471, [diff.number_changed_claims]))
[17:15:53] [0]
[17:15:56] it's okay now
[17:15:59] I'm pushing
[17:16:15] I fixed the features_list
[17:18:22] halfak: RTRC got ORES support today, apparently
[17:18:25] pushed
[17:18:26] halfak: I've invited Timo here
[17:18:43] halfak: uploading to pypi
[17:19:11] What's the difference between 'Revscoring' and 'Ores' on Phabricator?
[17:19:25] hi Krinkle
[17:19:38] Hi Krinkle! ORES is one of our systems
[17:19:39] I think ORES is the web API and revscoring is the underlying library
[17:19:42] So it's a tag.
[17:19:53] Revscoring is the project name (and also one of the systems)
[17:20:05] In contrast, Wiki labels is another system we maintain.
[17:20:15] I believe we have a wikilabels tag.
[17:20:32] So what's the tree relation between ores, wiki labels and revscoring (for Phabricator purposes)?
[17:20:48] siblings, or two children of revscoring?
[17:22:32] I've added the above to the descriptions of those three projects.
[17:22:36] Two children of revscoring
[17:22:39] Thanks
[17:23:31] So in which one do I request a task to work on adding support for this stack to another wiki?
[17:23:34] revscoring?
[17:23:53] revscoring is great. We'll do the labeling. :)
[17:26:20] Amir1|afk, it works!
[17:26:29] * halfak continues testing other revisions
[17:26:34] Hm.. no phab wikibugs here?
[17:27:10] phab wikibugs?
[17:27:57] New errors with the wikidata features for this revid: 220328478
[17:28:22] halfak: I mean, to have wikibugs (as in #wikimedia-dev and other channels) report relevant task activity here.
[17:28:30] halfak: it's a bot that announces changes to phab tickets
[17:28:48] Oh yeah. We already have too much spam in this channel
[17:28:56] Any time I work on revscoring or ores it's awful.
[17:29:02] I kind of want to turn that off, honestly.
[17:29:13] Maybe we can just quiet it down.
[17:29:14] it's called /ignore :P
[17:29:26] But why should everyone need to ignore it?
[17:29:37] wikimedia-dev is a wasteland from all of the bothavior.
[17:29:49] I kinda agree with halfak
[17:29:54] It's discussed every other year. For a lot of folks it helps increase visibility, collaboration and productivity
[17:30:12] yeah, it's a fairly deep split, I guess.
[17:30:15] Maybe I just need to dig into the config.
[17:30:25] But for others, they may want to ignore it. It depends on your workflow.
[17:30:43] I wouldn't mind a message for every merge, bug report, new phab task, resolved phab task.
[17:31:15] But a message for "halfak committed something to a random branch" is too much
[17:33:41] * halfak runs off for a quick lunch and then I'll be back to hacking on wb-vandalism to get these feature issues cleaned up.
[17:35:48] halfak: I filed https://phabricator.wikimedia.org/T116939?workflow=create
[17:56:14] wiki-ai/wb-vandalism#39 (manifest - 6aa25e6 : halfak): The build failed. https://travis-ci.org/wiki-ai/wb-vandalism/builds/87928883
[18:00:06] halfak: hey, I'm back. Is everything ready?
[18:06:44] wiki-ai/wb-vandalism#41 (manifest - e8df4a6 : halfak): The build failed. https://travis-ci.org/wiki-ai/wb-vandalism/builds/87929184
[18:13:45] Amir1, still working on the last bug.
[18:13:50] Check the issues for wb-vandalism
[18:14:06] I was mostly out to lunch, so I haven't looked at it much yet.
[18:15:43] It looks like the error happens when there is no parent revision (e.g. page creation)
[18:15:58] Amir1, what did you pass as parent_revision.text when processing the XML dumps?
[18:17:22] halfak: haha, great response on the bug :D
[18:18:16] lol
[18:18:25] I'm going to get a bunch of "Hi" back now
[18:18:40] halfak: yeah, already got one :P
[18:28:25] halfak: text in the parent revision
[18:28:43] For page creation
[18:28:53] When there's no parent
[18:29:41] I'm working on a PR that sets the past_item to None when there is no past item.
[18:29:49] Then I'm handling the None case in the datasources
[18:30:08] I defined a default
[18:30:14] What was the default?
[18:30:18] let me show it to you
[18:30:29] I tried to pass an empty dict to item.get() as content and that failed
[18:30:38] But it works if I pass {"hi": 1}
[18:30:40] ha
[18:31:08] Maybe if you pass nothing?
[18:31:09] default_item = {"type":"item","labels":{},"descriptions":[],"aliases":[],"claims":[],"sitelinks":{}}
[18:31:15] Ahh.
[18:31:21] json.dumps(default_item)
[18:32:02] No badges?
[18:32:07] Is that a list by default?
[18:35:25] badges are part of sitelinks
[18:35:43] each member of sitelinks is a dictionary
[18:36:01] Gotcha.
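For the no-parent case (page creation), the approach under discussion is to substitute an empty item skeleton so the diff datasources always have two items to compare. A sketch built around the exact default Amir1 pastes above; the wrapper function is hypothetical:

    import json

    # Skeleton of an empty Wikidata item, used in place of the parent
    # revision's content when the edit created the page.
    default_item = {
        "type": "item",
        "labels": {},
        "descriptions": [],
        "aliases": [],
        "claims": [],
        "sitelinks": {},
    }

    def parent_item_json(parent_revision_text):
        # No parent revision (page creation): fall back to the empty item.
        if parent_revision_text is None:
            return json.dumps(default_item)
        return parent_revision_text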
https://travis-ci.org/wiki-ai/wb-vandalism/builds/87957297 [19:00:16] yeah [19:01:11] Let's not wait :D [19:01:37] I've got that other PR that is intended to fix travis and I'm waiting for it to even make an attempt! [19:01:54] c'mon travis, ya lazy bum! [19:02:25] I suppose this is prime development time while the both coasts of the US and Western Europe are awake and working. [19:06:58] OK. This is looking pretty good. [19:07:11] I'm going to try to send it to staging. [19:08:21] Amir1, did you push a new version to pypi yet? If not, could you? [19:08:56] * halfak is having a lot of fun getting WikiData vandalism detection online :) [19:09:58] halfak: sure [19:11:10] I had a presentation a month ago about Wikipedia in Tehran SFD. I talked about you. This is video of that presentation. In the mean time try to find that part :D [19:11:27] * halfak waits for link [19:11:30] halfak: https://www.youtube.com/watch?v=3BNNu_hh74k [19:12:21] Audio is weird. I can't understand what you are saying ;) [19:13:39] Cool that I can mostly make it out by your slides :) [19:14:27] :D [19:18:00] Amir1, I like your presentation style. Few lists, lots of props (slides are photos/screenshots/examples) and (what sounds like) a conversational style. [19:18:58] thanks :) [19:19:27] I particiated at a lot of good presentations and tried to learn specially in Wikimania [19:19:33] *participated [19:19:34] Maybe we should have you give the next revscoring presentation at :) [19:19:59] :D [19:20:40] This is super high quality video/audio too. You think we could get this kind of thing at Wikimania next year? [19:20:43] heh [19:21:09] Tehran SFD was held in best university in Iran, Sharif [19:21:32] in the main auditorium [19:21:57] It was a big event, 400 people attended [19:22:08] wiki-ai/wb-vandalism#47 (parent_revision - b3f1b01 : halfak): The build failed. https://travis-ci.org/wiki-ai/wb-vandalism/builds/87957842 [19:22:16] shuddup travis [19:23:12] * halfak waits for 0.1.3 version of wb_vandalism [19:23:13] https://pypi.python.org/pypi/wb_vandalism/0.1.2 [19:23:15] Amir1, ^ [19:23:23] Oh wait! [19:23:24] It's there [19:23:33] * halfak curses at google [19:23:42] 0.1.4 [19:24:00] you should get 0.1.4 [19:24:36] I see it now [19:25:55] wiki-ai/wb-vandalism#50 (travis_faster - 9614b73 : halfak): The build has errored. https://travis-ci.org/wiki-ai/wb-vandalism/builds/87959349 [19:27:37] * Amir1 wants to crash travis-ci with hammer [19:27:57] Nice to have a ci service. We just have to do testing with it on the weekends or something. [19:29:09] Amir1, when we get this deployed, we're going to have to back fill a ton of phab cards :S [19:29:23] just kidding [19:30:17] I'm dying I need to take a sick day :D [19:30:27] :( [19:30:56] Actually burning out? If so, take a nap. I've got this for the next hour or so. [19:31:11] I can ping you when we've got it deployed. [19:31:24] Just kidding. Trying to avoid logging [19:31:36] but It's inevitable [19:32:18] no, I stay awake until it's there :) [19:32:21] OK. [19:32:32] Looks like we forgot to list pywikibase as a dependency ;) [19:33:04] Wait... that's not right. [19:33:06] It's right there [19:33:17] * halfak checks the fab log [19:35:06] I go grab some coffee, be back soon [19:35:19] OK [19:37:34] NO! I know what this is. Goddamn generators vs. lists for requirements! [19:40:46] halfak: merged [19:40:56] A new release on pipy? [19:41:00] pypi [19:41:08] Yes. Regretfully. [19:41:23] This issues has bit me in the past before or I would have been searching for hours. 
[19:41:27] (just a fyi, dealing with more NFS things, I won't be around for a bit)
[19:41:50] YuviPanda, thanks for letting us know. Rock on. If this gets up on staging today, we'll be in good shape.
[19:42:09] We've technically "deployed it" --- to staging :D
[19:42:16] Also, people can experiment
[19:47:12] okay, check this out in the meantime
[19:47:13] https://t.co/WrOF9k325G
[19:51:16] * halfak tries staging again
[19:51:50] 0.1.5 is there
[19:55:18] halfak: ^
[19:55:19] :)
[19:56:13] Indeed. I've got it.
[19:56:23] Saw it come through and started the staging process right away :)
[19:56:41] \o/
[19:58:43] wiki-ai/wb-vandalism#52 (parent_revision - d364485 : halfak): The build failed. https://travis-ci.org/wiki-ai/wb-vandalism/builds/87960439
[20:02:09] Aha. I think I broke the staging process last time. Fab is sometimes weird.
[20:03:51] Yar!
[20:03:55] that got it.
[20:04:16] * halfak watches the levenshtein build
[20:07:03] Amir: http://ores-staging.wmflabs.org/scores/wikidatawiki/reverted/220328478/
[20:07:15] see also http://ores-staging.wmflabs.org/scores/wikidatawiki/reverted/
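Those staging endpoints return plain JSON, so the scores are scriptable with any HTTP client. A minimal sketch using the requests library (the URL shape is taken from the links above; the library choice is an assumption):

    import requests

    # Score one Wikidata revision with the 'reverted' model on staging.
    url = ("http://ores-staging.wmflabs.org"
           "/scores/wikidatawiki/reverted/220328478/")
    response = requests.get(url)
    response.raise_for_status()
    print(response.json())  # probability that the edit will be reverted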
[20:07:41] I'm considering trying a deploy without YuviPanda
[20:08:14] wiki-ai/wb-vandalism#57 (requirements_list - 2d21b77 : halfak): The build has errored. https://travis-ci.org/wiki-ai/wb-vandalism/builds/87969216
[20:08:32] Thank you
[20:08:34] :)
[20:08:36] Thank you
[20:08:38] thank you
[20:08:42] \o/
[20:11:51] Holy moly, wikidata is active!!!
[20:12:05] Our precacher is going twice as fast
[20:12:12] Maybe more
[20:12:18] It's like having two additional enwikis
[20:12:40] It doesn't look like staging can keep up.
[20:12:56] Well... it's catching up again.
[20:13:07] Man. We're 2-5 seconds behind live
[20:13:24] The main cluster shouldn't have a problem.
[20:13:30] :)
[20:13:36] Does this signal the end of precache testing on staging?
[20:13:53] Either way, we can now use the precacher as a stress tester :D
[20:14:00] :))))
[20:16:31] http://ores-staging.wmflabs.org/scores/wikidatawiki/reverted/247205553/
[20:17:02] \o/
[20:17:09] TAKE THAT VANDALISM
[20:17:15] WE CAN SEE YOU NOW!
[20:17:58] We need to set up CORS for wikidata.org
[20:18:06] And then ScoredRevisions will just work
[20:18:19] I think that might be in puppet somewhere.
[20:18:27] Let me bother madhuvishy :D
[20:19:34] Amir1, how does bot edit throttling work on Wikidata?
[20:19:46] Should we be concerned about sudden massive spikes in edit activity?
[20:19:51] max is 60 edits per min
[20:19:56] per bot?
[20:20:00] yes
[20:20:12] but we won't have anything that big very soon
[20:20:19] Oh?
[20:20:26] Bots becoming less popular?
[20:21:10] no, the number of tasks is shrinking
[20:21:21] there is not much bot operators can do
[20:21:32] Gotcha. Cool.
[20:25:30] * halfak digs around for a way to get CORS extended today
[20:26:45] (am still doing NFS stuff though)
[20:27:05] add_header 'Access-Control-Allow-Origin' '*';
[20:27:10] CORS enabled for all domains, theoretically
[20:27:15] doesn't need an explicit wikidata addition
[20:27:18] theoretically
[20:27:34] Interesting. Let me test more. Thanks for the additional info, YuviPanda
[20:28:31] Ooooh. Bot edits are excluded from recentchanges. That's going to make ScoredRevisions work better :)
[20:29:02] Oh! I know what's going on.
[20:29:10] I need to tell ScoredRevisions to use staging.
[20:29:27] It's giving me a CORS error in JavaScript (which is weird) but it's really 404ing on scores/wikidatawiki/
[20:31:31] ah
[20:31:40] maybe it's not dealing with the 404 code properly and thinks it is CORS instead?
[20:32:00] IT WORKS!
[20:32:11] yay
[20:32:17] glad I could help, etc
[20:32:30] <3 YuviPanda. Godspeed with the technical nonsense breaking down
[20:32:40] Shoveling the pile of hacks?
[20:32:46] yeah, we're out of the woods, just doing cleanup
[20:32:47] * halfak searches for a good metaphor
[20:32:54] yeah, we moved our pile from one dump to another
[20:32:57] ha
[20:33:01] and realized that we had forgotten to build one of the walls
[20:33:09] so shit started leaking out in one direction
[20:33:21] we've built the wall now, but in the process accidentally demolished 2 walls in our backup location
[20:33:24] so rebuilding that wall now
[20:33:57] except that our backup location isn't really a backup location and is a totally different thing, even though it has the same name, and a garbage dump is probably the wrong shape for it
[20:34:46] * halfak is thoroughly amused by these metaphors.
[20:35:19] it's like 'oh, we need this thing with walls! since we were dealing with garbage dumps earlier, let us use the same plan to build this slightly unrelated thing. Besides, this site was planned to be a garbage dump'
[20:38:42] I'm guessing NFS is the garbage dump?
[20:39:07] I'm not entirely sure
[20:39:19] but I think the hosts are the dumps and NFS itself is just the garbage?
[20:39:27] 90% log files gone out of control, 9% files that people forgot about and 1% super-mission-critical things that don't exist elsewhere.
[20:39:28] Yeah
[20:40:06] * halfak just reverted his first WikiData damage with ORES & ScoredRevisions
[20:40:08] \o/
[20:41:50] halfak: congrats
[20:42:21] Most of the credit goes to Amir for building the feature sets and extracting features for a representative training set.
[20:45:19] OK. I'm going to declare victory on the stress testing.
[20:45:27] I think I might try a deploy.
[20:45:40] \o/
[20:45:42] Awesome
[20:46:09] I want Lydia to be able to run ScoredRevisions on WikiData when she wakes up tomorrow morning.
[20:46:27] "Happy WikiData day. Here's the thing you asked us to do."
[20:50:39] http://ores-staging.wmflabs.org/scores/wikidatawiki/reverted/247292529/
[20:50:46] amazing
[20:52:47] Here we go.
[20:53:30] \o/
[20:54:05] * halfak installs updated software on the web and worker nodes.
[20:57:10] Oooh. Fun story... The long restart for uwsgi -- it's not actually restarting. The new code is running and it's still hanging.
[20:57:21] So it's not doing a soft restart
[20:57:30] halfak: one thing: the clue bot that I run on Persian Wikipedia, based on ORES, has made more than 1602 reverts
[20:57:39] Wow.
[20:57:46] What's the false positive rate?
[20:57:46] with really high accuracy
[20:57:54] :D!
[20:57:58] I need to check them
[20:58:02] Using the damaging model now?
[20:58:05] I can't say for sure
[20:58:18] but about 70-90%
[20:58:51] we are also triaging edits and sending anything scored above 80% for human review
[20:59:00] In Telegram
[20:59:06] If you want, I can show it to you
[20:59:23] "telegram"?
[20:59:57] nice to see telegram being used
[21:00:03] it's a popular messaging app
[21:00:06] similar to whatsapp
[21:00:18] americans still somehow seem to be stuck with 'texting'
[21:00:31] :P
[21:00:40] Amir1: is this using the new telegram bot framework stuff?
[21:00:58] yeah
[21:01:06] with help from ebraminio
[21:01:11] he is a god in node.js
[21:02:03] also, that telegram bot reports every revert that the anti-vandal bot makes
[21:02:14] to me
[21:02:38] nice
[21:02:45] I should check out their API too
[21:02:46] And we're deployed.
[21:02:53] \o/
[21:03:12] YES
[21:03:13] YES
[21:04:02] halfak: another thing: I used Kian to populate a database of suggestions (for adding data to wikidata) and now it's a game
[21:04:06] being used in wikidata
[21:04:16] I have about 2.6M suggestions
[21:04:24] in 17 languages
[21:04:31] halfak: congrats halfak!
[21:04:34] sorry I couldn't help
[21:04:46] https://tools.wmflabs.org/wikidata-game/distributed/
[21:05:37] YuviPanda, no worries. Went real smooth :)
[21:05:45] No piles of garbage round this side of labs.
[21:06:04] we released wb-vandalism five times in a day (or six, I lost count)
[21:06:05] Amir1, yeah. I saw that. I worked through a couple of 'em ;)
[21:06:06] :D
[21:06:13] amazing
[21:06:32] halfak: yay, NFS-free existence
[21:06:43] halfak: there was like a 30min outage in the morning (planned) that didn't affect you at all :)
[21:06:43] Amir1, not that uncommon when we're working out setup.py. If you look at the history of my releases on pypi, there are a lot of PATCH-level version changes.
[21:06:49] \o/
[21:07:22] :)
[21:08:44] I always install directly before I upload to pypi, and then uninstall/reinstall from pypi as part of my flow, just to make sure I don't troll other people.
[21:11:30] I've got the worst headache in ages
[21:11:44] I'm working on other birthday gifts too
[21:12:18] halfak: Is there a way to let people use the results of ORES easily? Like a gadget or something like that
[21:12:48] ScoredRevisions?
[21:12:58] https://github.com/he7d3r/mw-gadget-ScoredRevisions
[21:14:38] I already installed it
[21:14:52] where can I see the results?
[21:14:59] It should just work on Special:RecentChanges.
[21:15:04] I didn't see anything in recent changes
[21:15:10] It might take a little while to load. I only just started the precacher
[21:15:19] But it shouldn't take that long at all.
[21:15:29] let me check again
[21:15:32] ok, NFS stuff sorted out
[21:15:37] halfak: I guess I'm not needed atm?
[21:15:43] (will take a break otherwise)
[21:16:09] YuviPanda, we're all set. Enjoy your well deserved break. :)
[21:16:15] thanks
[21:16:24] PastPanda made sure you didn't have to help us that much :D
[21:16:34] hahah, yessss :D
[21:18:14] thank you YuviPanda :)
[21:18:24] halfak: I still don't see anything
[21:18:31] https://www.wikidata.org/w/index.php?title=Special:RecentChanges&hideliu=1
[21:18:35] Even in this
[21:23:08] Hmmm, working for me. Are you in Chrome?
[21:24:56] I'm seeing about 2/50 revisions get highlighted
[21:25:00] Amir1, ^
[21:25:24] no, Firefox
[21:26:30] ! It's CORS
[21:26:37] Chrome is OK with it and Firefox is not.
[21:27:32] Wait... now it worked.
[21:27:38] Yeah. I just refreshed and it was happy.
[21:28:15] In the developer tools, check the network tab for a request to ores.wmflabs.org
[21:29:07] Amir1, ^
[21:29:26] okay :)
[21:29:57] You can see scores in Special:Contributions too
[21:30:02] I see two highlighted edits here: https://www.wikidata.org/wiki/Special:Contributions/Jey
[21:32:06] It doesn't send anything to ores
[21:32:15] let me check again
[21:32:57] Must not be loading the JavaScript somehow.
[21:33:00] Where did you install it?
[21:33:09] https://www.wikidata.org/wiki/User:Ladsgroup/common.js
[21:33:38] :P That's not the right line
[21:33:48] You were borrowing my hack to get it to talk to staging
[21:34:11] I changed it to ores
[21:34:13] https://meta.wikimedia.org/wiki/User:EpochFail/global.js
[21:34:29] lol, looks like I load the gadget twice
[21:34:37] you want the mw.loader.load line
[21:34:43] No need for the mw.config.set line
[21:35:15] I see now
[21:35:17] sorry
[21:35:20] No worries :)
[21:38:40] YES
[21:38:43] Amazing
[21:44:32] Woot!
[21:44:44] * halfak drafts an email to Lydia
[21:44:49] Will CC you
[21:53:25] OK. Email sent. Now to backfill the phab tasks
[21:58:52] Thanks
[21:58:56] What can I do?
[22:03:00] Go to sleep :)
[22:03:04] Nurse your headache
[22:03:10] Celebrate a great victory
[22:03:16] Thanks :)
[22:03:24] I'm waiting for the next gift
[22:03:35] https://gerrit.wikimedia.org/r/#/c/227454
[22:03:54] if John or anyone merges them
[22:06:00] Holy patchset count!
[22:10:08] There is a thing with gerrit: someone gives you suggestions and suggestions until you fulfill all of their dreams, and then someone else comes and -1s the patch
[22:10:17] sometimes it's like a gang bang
[22:10:57] I have a patch with around 70 patchsets
[22:12:19] That's a scary sort of situation
[22:12:48] Monoliths aren't only hard to manage because of their code.
[22:13:00] There's also a lot of people with opinions and the -1 hammer.
[22:18:34] Can I publish your email to Lydia? In tomorrow's announcement
[22:18:38] +1
[22:18:48] I'll leave the publicity to you if that's OK :)
[22:19:11] I think you should be the face of this work. :)
[22:19:37] I will definitely credit your amazing work
[22:19:45] without you it would have been impossible
[22:20:34] Sweet, sweet open collaboration :)
[22:21:05] what I said about pywikibot also goes for mediawiki core
[22:21:22] even simple patches rack up a lot of patchsets
[22:25:24] There's a rule that they implemented for ZeroMQ to prevent this sort of delay.
[22:29:46] * halfak fails to find the rule
[22:30:21] Something to the tune of, "If it adds desired functionality, you must merge. You are allowed to do cleanup and restructuring during the merge, but you must merge."
[22:30:33] So, no style or form complaints.
[22:31:02] It seems to be working OK for them
[22:38:33] that would be helpful