[06:50:13] ORES scores are always a full percentage right? like "50%" and not "50.2%"?
[06:51:31] no.
[06:51:32] okay
[12:50:31] halfak: hey, around?
[13:00:01] o/ Amir1
[13:00:07] I'm just about to start a meeting.
[13:00:14] I'm sorry to miss your ping yesterday.
[13:00:22] np
[13:00:36] I just wanted to show you something interesting
[13:00:59] If you get Stefan's list, they categorized page id as an integer
[13:01:16] On the count of three, a headdesk
[13:01:18] 1
[13:01:20] 2
[13:01:21] 3
[14:49:22] o/ Amir1. Done with meeting and followup. Reading scrollback.
[14:49:46] Wait. Isn't page_id an integer?
[14:50:30] it's integer obviously
[14:50:43] but you don't give it to an SVM
[14:50:54] They have it as a feature!?
[14:51:01] yup
[14:51:19] * halfak wonders if it could possibly be predictive without overfitting
[14:53:23] I really wonder how someone can use page_id as an integer feature (e.g. give it to SVM and that fits a model on it)
[14:54:51] Let's say that vandalism was more common recently than before, there would be a page_id threshold that you could set.
[14:54:54] Wait. no
[14:55:01] Old page_ids will still be edited.
[14:55:10] Maybe all the vandalism is on new pages.
[16:17:14] discussion about bad words in Ukrainian: https://meta.wikimedia.org/wiki/Grants_talk:TPS/Ladsgroup/Wikimania/2015/Report
[16:33:18] Cool! I also didn't know they had you guys write reports.
[16:33:29] Fun to see all the other things you did at Wikimania :)
[16:33:57] Amir1, how are the basic datasources for Wikidata vandal detection coming along?
[16:34:30] :))))
[16:34:46] like gediz says, everyone wants a piece of Amir :D
[16:35:04] I built the first parts
[16:35:05] Just so long as they don't take my piece. >:(
[16:35:09] * halfak puts up dukes
[16:35:24] I didn't push it to github
[16:35:34] because I want an MVP first
[16:35:39] If it's okay
[16:36:21] Sure. No prob. I haven't been highly involved yet, so I was just checking. I figure that I want to start discussing it with you ASAP so that it's easier to merge later.
[16:36:25] But I'm not in a rush.
[16:37:30] awesome
[16:37:45] Can we talk about it in three hours?
[16:38:05] Yeah. looks like I have a slot then. Let me share my calendar.
[16:38:05] I want to go biking, I took my brother's bike :)
[16:38:47] I support biking. :)
[16:42:37] :)
[16:51:35] halfak: is it okay after your meeting?
[16:52:16] Amir1, yup. I think that'll work fine.
[16:52:37] I just blocked it off for us.
[17:03:24] :(
[17:03:26] :)
[18:49:20] I am happy we are growing here
[18:49:30] soon all cool people will be here :3
[18:51:17] * awight leaps out of the club window before that happens :p
[18:56:35] what does models=reverted mean in the API requests?
[18:59:09] legoktm: probability of an edit being reverted = anti-vandal part of the code
[18:59:59] ok, but what is the concept of a "model"?
[19:07:13] legoktm: the model is what, based on features of the revision, returns a number as output
[19:07:44] a model is essentially the coefficients of an equation.
[19:20:52] legoktm basically the idea is the algorithm looks at what reverts look like
[19:21:12] and predicts if the revision you are checking looks like it should be reverted
[19:21:28] it makes no distinction between vandalism, copyright violation, or newbie mistakes
[19:22:16] legoktm we trained the revert model on past reverts
[19:22:22] IIRC 20,000 of them
[19:30:45] halfak: tell me when we want to talk
[19:34:26] Hey Amir1. getting lunch
[19:34:31] Back in a few minutes
[19:35:53] sure :)
[19:36:10] take your time
[19:36:31] OK. Skype?
[19:37:14] Amir1, ^
[19:37:18] Or just IRC?
[19:37:39] * halfak forgot to schedule lunch into his day :(
[19:37:56] I have to check Skype
[19:37:59] kk
[19:38:05] I'm not sure if my connection is good enough
[19:38:15] speed is 10 KB/s
[19:38:33] (It should be about 256KB/s)
[19:39:06] Yeah. Probably not. IRC it is.
[19:41:15] So, where do I find your work?
[19:49:50] https://github.com/halfak/Wiki-Class/tree/06cae5ecef7262c290a8c5bdaf669f76b1981027/wikiclass
[19:49:54] we have this
[19:55:45] https://phabricator.wikimedia.org/T107930
[19:55:47] Amir1, ^
[19:56:00] * YuviPanda waves from chicago airport
[19:57:01] https://github.com/wiki-ai/ores-wikimedia-config
[19:57:49] https://github.com/wiki-ai/ores-wikimedia-config/blob/master/feature_lists/enwiki.py#L73
[20:33:05] BTW Amir1, I just talked to YuviPanda and madhuvishy. It looks like we're going to do CORS soon, so I'll leave that ticket be.
[20:34:24] halfak: remember that we're turning this on 'blanket' - in the future, if we allow write operations / logins to ORES on the same domain (unlikely...) we'll have to revisit
[20:35:40] YuviPanda: we don't care what the http origin is?
[20:36:04] or should we restrict it to wikipedia.org?
[20:36:06] madhuvishy: nope, purely readonly service no
[20:36:08] YuviPanda, quarry allows for oauth and creation of content.
[20:36:09] madhuvishy: let's allow *
[20:36:16] YuviPanda: okay
[20:36:23] halfak: yes, and that's why it's not at the nginx level but at the app level - it's allowed only on the JSON output URLs
[20:36:30] and not on login / submit / etc urls
[20:36:31] Gotcha.
[20:36:40] so we'll have to do the same for ORES if it ever gets those
[20:37:20] Hmm... Maybe it does make sense to put it in the Flask app then.
[20:37:23] halfak: which ticket, the one related to changing false positives?
[20:37:28] *flagging
[20:37:37] The one about RTRC
[20:38:18] hmm, okay
[20:38:32] halfak: it's ok to put it in nginx now maybe and then say 'this will always be READONLY'
[20:38:38] which I think it should be anyway
[20:39:11] Yes. This is true
[20:39:24] ORES should be readonly. Other systems that are part of revscoring won't be.
[20:39:30] E.g. we should look at wikilabels next.
[20:39:41] I'll be incorporating this into the flask app.
[20:39:49] Any protips for saying "all Wikimedia sites"?
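Editorial note on the CORS exchange above: a minimal sketch, in plain Python, of the app-level rule described at [20:36:23], where any origin (`*`) is allowed on read-only JSON output URLs and nothing is allowed on login or submit URLs. The prefix `/scores/`, the constant `JSON_OUTPUT_PREFIXES`, and the function `cors_headers` are hypothetical names chosen for illustration; the log does not give the real ORES or Quarry routes or code.

```python
# Hypothetical URL prefixes; the log does not name the real routes.
JSON_OUTPUT_PREFIXES = ("/scores/",)

# Methods treated as read-only for the purpose of this sketch.
READ_ONLY_METHODS = ("GET", "HEAD", "OPTIONS")


def cors_headers(path, method="GET"):
    """Return the CORS headers to attach to a response for `path`.

    Any origin is allowed, but only for read-only requests to JSON
    output URLs. Everything else (login, submit, ...) gets no CORS
    header, so browsers keep enforcing the same-origin policy there.
    """
    is_json_output = path.startswith(JSON_OUTPUT_PREFIXES)
    is_read_only = method in READ_ONLY_METHODS
    if is_json_output and is_read_only:
        return {"Access-Control-Allow-Origin": "*"}
    return {}


print(cors_headers("/scores/enwiki/"))
print(cors_headers("/login"))
```

In a Flask app, a function like this could plausibly be called from an `after_request` hook to update the response headers; doing it at the nginx level instead applies the header to every URL, which is only safe while the whole service stays read-only.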
[20:40:06] none unfortunately
[20:40:14] but I think allowing it to be * is ok
[20:41:40] halfak: I'm writing the code, one question. wiki-class didn't have a separate folder for the features list; it has a file named enwiki.py in the features folder, and that file is the list. What do you think? a separate folder or a file?
[20:42:37] separate folder
[20:42:43] This is the new style
[20:42:47] halfak: when celery failed do you know which worker node didn't fail?
[20:42:49] Sorry to give you a mix of them
[20:42:56] 01
[20:42:57] ok :)
[20:43:00] ok
[20:43:04] It seems that 01 is the worker that stays online
[20:43:12] It also has the highest count of processed tasks.
[20:44:18] halfak: interesting
[20:44:19] Aug 03 21:29:00 ores-worker-02 celery[24544]: RuntimeError: Failed to process : 'NoneType' object has no attribute 'groups'
[20:44:24] did we ever deploy that fix?
[20:44:27] I suspect we didn't
[20:44:55] We did. not deployed yet.
[20:45:01] Still, that doesn't make it crash
[20:45:08] I can get the service to return that error.
[20:45:13] hmm
[20:45:16] that's the last error I saw
[20:45:18] It's just sending it to sys.stderr too.
[20:45:34] Yeah. That's why I went back and got that error cleaned up. We could try a new deploy.
[20:45:37] hmm, let's deploy it so we don't lose nodes?
[20:45:38] yeah
[20:45:40] wanna do it?
[20:46:41] Sure.
[20:47:07] cool
[20:47:33] * halfak watches fab do its thing.
[20:47:45] * YuviPanda is on new laptop, doesn't have all the things set up yet
[20:48:11] YuviPanda: I can add it to the template file no?
[20:48:48] madhuvishy: ya, the nginx template no?
[20:48:49] YuviPanda, looks like revscoring needs a pip release to take advantage of it. Doing that.
[20:48:52] ok
[20:48:56] YuviPanda: yup, cool
[20:49:07] Oh say, YuviPanda, do you have the rev_id from that log line handy?
[20:49:30] halfak: trwiki:reverted:15839950:0.2.0
[20:49:48] Thanks
[20:54:03] ^ :D
[20:54:39] * halfak watches the jessie VM building numpy
[20:54:42] hmmm
[20:55:56] Looks like I'll be doing a deploy during my next meeting.
[20:55:58] :S
[20:56:14] I think you mean ":D"
[20:57:40] lol
[20:57:49] Better than doodling, I guess
[20:58:15] Hmm... Fun errors on staging.
[20:58:17] :)
[20:58:22] * halfak debugs during meeting.
[20:59:24] Looks like we broke pickling of the old models with the new language module strategies.
[20:59:27] * halfak rebuilds models.
[20:59:31] This'll take a while.
[21:02:07] halfak: madhuvishy finished up and deployed CORS now
[21:02:08] \o/
[21:56:45] Could I get someone to merge a quick and easy PR?
[21:56:46] https://github.com/wiki-ai/revscoring/pull/153
[21:56:54] ToAruShiroiNeko or Amir1 ^
[21:57:24] more than happy to do it
[21:57:33] Thanks dude.
[22:14:04] * halfak looks out for someone else that can help him merge a fix that will get us in staging.
[22:14:05] Amir1|ZzZzZ, one more if you
[22:14:15] Woops. Just saw the zzz's
[22:14:21] Have a good night!
[22:14:28] Anyone else, scope out https://github.com/wiki-ai/revscoring/pull/154
[22:15:01] halfak: sure
[22:15:17] Thanks dude.
[22:15:20] halfak: merged
[22:15:26] \o/
[22:15:31] so many releases today.
[22:15:39] BTW, we have another version issue.
[22:15:48] ores needs 'mediawiki-utilities'
[22:15:57] revscoring specifies 'mediawiki-utilities=0.4.14'
[22:16:06] So, we download and install 0.4.15 and get sad
[22:16:27] YuviPanda, do you think I should do the same version range treatment with it?
[22:16:38] I was trying to stay out of it--but the module magic is making me increasingly nervous...
[22:16:44] haha fud
[22:16:50] * awight-fud licks self
[22:16:59] oh, fudge and not fear, uncertainty and doubt?
[22:17:07] food?
[22:17:07] halfak: why do they have different version specifications?
[22:17:10] with a "u"?
[22:17:19] Gary Larson food
[22:17:23] halfak: if it's already specified in revscoring why is it in ores?
[22:17:23] YuviPanda, ores has always been less specific. it could get more specific.
[22:17:34] https://38.media.tumblr.com/tumblr_m745ykBcGt1qgw37to1_1280.gif
[22:17:36] halfak: and I think you can specify a 0.4.x value
[22:17:39] YuviPanda, they have independent need for mediawiki-utilities
[22:17:46] hahaha awight
[22:17:57] >= 0.4.14, < 0.5.0
[22:18:16] yeah looks like
[22:18:23] https://groups.google.com/forum/#!msg/python-virtualenv/DvG1InRGdR0/14b9CISO2AcJ
[22:18:35] so it should be >=0.4.14 <0.4.9999
[22:19:29] Sure. If I go do that quick, will you merge? Or should I keep pushing on staging?
[22:19:42] yeah
[22:19:47] I can merge
[22:20:07] we need a long term solution for versioning libraries tho. I haven't started thinking about it for productionizing yet
[22:24:36] YuviPanda, https://github.com/wiki-ai/revscoring/pull/155
[22:24:44] working on ORES now
[22:31:11] halfak not a sound from the storytellers yet
[22:31:33] YuviPanda, https://github.com/wiki-ai/ores/pull/77
[22:31:36] That should do it.
[22:31:45] White_Cat, yeah. I have heard nothing either.
[22:31:46] halfak: merging soon
[22:31:46] once my internet stops being a pita
[22:32:15] Thanks. Will keep working on rebuilding models. :)
[22:32:35] I'm actually regenerating features too since we're going to get more obs. for removing those errors around is_bot and stuff.
[22:32:43] you know, if I had my way I would air drop a 5m dish to both yuvi and amir
[22:32:55] 5m dish?
[22:32:56] for fast internet
[22:32:56] can't, it won't fit into carry on luggage
[22:33:06] Oh!
[22:33:07] I'm trying to fit all my life belongings into a bag that can fit in carryon luggage
[22:33:21] * halfak --> meeting
[22:33:27] halfak ISAF is linked to brussels with large dishes
[22:33:31] good meeting :)
[22:34:06] YuviPanda hmm... so it needs to be a foldable dish
[22:34:19] I imagine you having solar panels too
[22:34:39] mobile wireless internet with infinite power <3
[23:24:52] So.... with our new generalized versions, I'm downloading and compiling scipy. Yaaaaay
[23:24:55] >:(
[23:25:13] ooooh
[23:25:18] halfak: 'I am' as in?
[23:25:20] your local machine?
[23:25:23] or?
[23:25:40] well ore-compute too. It's going to download new version because of the --update directive
[23:25:59] hmm, even ores-staging?
[23:26:00] that's bad
[23:26:01] boo
[23:26:07] Yeah.
[23:26:26] :(
[23:26:28] No good way to tell it "don't pay attention to scipy. shhhh. just use your local version. it's fine"
[23:26:41] my brain's too dead to even attempt thinking of a solution right now :(
[23:26:42] sorry
[23:26:54] No worries.
[23:26:54] http://pythonwheels.com/
[23:26:56] maybe
[23:26:59] * halfak engages brain thinking
[23:27:31] * awight listens in
[23:27:40] I wonder
[23:27:50] if we can just package it up into a wheel
[23:27:52] and deploy the wheel
[23:28:06] Yeah. That's a good point. It seems like that could work.
[23:28:13] So, it would pull in local scipy?
[23:28:16] but that might or might not fly when we want to move to production
[23:28:27] I'm not fully sure how wheels work themselves yet
[23:28:28] Looks like a better version of the Anaconda packaging thing I tried to use on Travis.
[23:28:38] Some of our deps don't support wheel yet
[23:28:55] E.g. redis and docopt
[23:29:06] oauthlib!
[23:29:14] we use oauthlib?!
[23:29:15] can you mix wheel and pip packages?
[23:29:17] in ores?!
[23:29:30] https://github.com/spotify/dh-virtualenv is also an option
[23:29:58] Oh. Yeah. I suppose we only use oauthlib in wikilabels
[23:30:32] * halfak preemptively installs scipy on ore-compute
[23:30:52] Oh god. It's doing numpy too.
[23:31:14] heh
[23:31:21] was the virtualenv created with --system-packages or whatever?
[23:34:50] Oh.. no. It wasn't, but I expect that the problem would still persist because the system package would be an old version
[23:44:06] halfak: I'm doing some mind-numbing work atm, which is packages for mediawiki-utilities.
[23:44:09] debian package that is
[23:46:36] aaargh. I just made a .deb recently. Horrifying experience.
[23:50:33] awight: heh :D
[23:50:46] I got my package to pass lint, and that was the exact point at which nobody was interested in helping actually push it into the experimental distro.
[23:50:58] http://mentors.debian.net/package/photo-booth
[23:52:05] aaargh that sucks
[23:52:14] but in this case they all can just go into our own wikimedia repository
[23:52:41] +1
[23:52:48] Still, sorry about that metadata
[23:53:47] ?
[23:54:12] The FUD about all those debian/ files is frustrating...
[23:54:32] ah
[23:54:34] yes I agree
[23:55:20] Someone was telling us about using .debs for deployment, on the other hand. It sounded pretty sweet.
[23:55:38] who?
[23:55:46] cannot remember.
[23:56:51] The nice part is that you get really good logging of a heterogeneous deployment environment, and the nodes can even pull their own updates if you run a repository.
[23:57:00] yeah
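
Editorial note on the version-range exchange earlier in this log ([22:15:48] to [22:18:35]): a minimal sketch of why an exact pin on 0.4.14 in one package clashes with an unpinned requirement in another that resolves to 0.4.15, and how the proposed range `>= 0.4.14, < 0.5.0` accepts both. The helpers `parse_version` and `in_range` are hypothetical and use a simplified dotted-integer comparison; they are not pip's actual resolver and do not cover the full version syntax pip understands.

```python
def parse_version(text):
    """Turn '0.4.14' into (0, 4, 14) so versions compare numerically.

    Simplification: only plain dotted integers are handled.
    """
    return tuple(int(part) for part in text.split("."))


def in_range(version, lower="0.4.14", upper="0.5.0"):
    """Return True if lower <= version < upper (upper bound excluded)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# An exact pin accepts only 0.4.14; the range also accepts 0.4.15
# while still rejecting the next minor series.
for candidate in ("0.4.13", "0.4.14", "0.4.15", "0.5.0"):
    print(candidate, in_range(candidate))
```

Comparing tuples of integers rather than raw strings matters here: as strings, "0.4.9" sorts after "0.4.14", which would give the wrong answer for a range check.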