[00:10:39] halfak: I wasn't feeling too well this morning! Around now/later if you need anything. Sorry [04:53:17] is there a link/API endpoint from which I can get a list of damaging or reverted edits? I'm trying to look into revision scoring, so I need to analyze these manually to get an idea of how it's predicting... [05:46:18] codezee: it's not possible yet, but you can have it soon via the ORES extension [05:46:48] so it adds several tables, and one of them stores scores for edits [05:47:16] Amir1: then I suppose my best bet is to query recent changes and look for revert keywords, or sift through manually? [05:48:55] codezee: it's a little bit more complicated. you can check it out on en.wikipedia.beta.wmflabs.org [05:49:04] you need to make an account with a dummy password [05:49:10] enable it as a beta feature [05:49:28] then you can go to your RC and/or watchlist [05:49:36] click on "Hide good edits" [05:49:53] it also highlights bad edits in your RC/watchlist [05:50:22] you can loosen the threshold in your preferences as well (RC tab / ORES section) [05:50:48] tell me if you have any questions [05:55:48] Amir1: oh I see now, that's helpful :) [05:56:15] Amir1: however, most edits here look like test edits; is this also deployed on a Wikipedia as a beta feature, where we can see real edits? [05:56:36] not on any Wikipedia [05:56:51] but it'll be on several in two or three weeks [05:57:22] depends on when Ops can fix our biggest blocker (moving from ores.wmflabs.org to ores.wikimedia.org) [05:57:49] codezee: what language are you working on? [05:57:55] ohh I see, deployment is always a difficult one [05:58:21] fortunately we are in the final stages [05:58:21] Amir1: I'm working on Hindi; however, I'm studying these edits for another long-term task I'm aiming at [05:58:43] cool [05:59:07] deploying on Hindi might take a long time since we haven't deployed the reverted model yet [05:59:23] at the hackathon I had discussions with Aaron on extending the uses of revscoring for https://phabricator.wikimedia.org/T127470 and I intend to take this up as a final-year project at my university [05:59:27] and we need the damaging model (the edit quality campaign done) as well [05:59:36] awesome [05:59:51] we can move forward very fast [06:00:02] Amir1: I'm ready to help with whatever's needed, as it'll let me learn more about revscoring, which I need to move forward [06:00:33] Sure, right now we need to deploy the reverted model for Hindi; it'll happen soon [06:00:46] after that we need to launch the edit quality campaign on your wiki [06:01:04] in order to do that we need you to do the translation [06:01:25] codezee: I guess you have worked with translatewiki: https://translatewiki.net/wiki/Translating:Revision_scoring [06:02:30] Amir1: I've been volunteering for MediaWiki but sadly not with translations yet [06:02:59] that's easy [06:03:14] (much, much easier than MediaWiki) [06:04:33] Amir1: if I want to understand the revscoring model (like how edits are fed to the system, what features are being used, and how the model is trained), what would be the best place?
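For reference, a minimal sketch of pulling scores like the ones codezee asks about at the top, once they are exposed over HTTP. It uses the ORES v2 scoring endpoint whose URL pattern appears later in this log; the availability of the "reverted" model for a given wiki, and the revision IDs, are assumptions here.

```python
import requests

# ORES v2 scoring endpoint (the enwiki/wp10 example later in this log
# uses the same URL pattern); model availability per wiki is an assumption.
ORES_V2 = "https://ores.wmflabs.org/v2/scores"

def get_scores(wiki, model, rev_ids):
    """Fetch scores for a batch of revision IDs from ORES."""
    # requests will percent-encode the "|" separator; per the discussion
    # later in this log, URL-encoded pipes are accepted by the API.
    response = requests.get(
        "{0}/{1}/{2}/".format(ORES_V2, wiki, model),
        params={"revids": "|".join(str(r) for r in rev_ids)},
    )
    response.raise_for_status()
    return response.json()

# Hypothetical usage: probabilities that these edits will be reverted.
print(get_scores("enwiki", "reverted", [3456789, 5623874, 2352365]))
```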
[06:05:16] wiki-ai on github [06:05:28] github.com/wiki-ai [06:05:33] check our repos [06:05:55] the whole system is very complex, so ask any questions [06:06:47] yes, I went through it and, to be honest, I didn't understand much there; I'll look more and ask [06:09:15] I really should write documentation on how our repos are connected to each other [06:17:02] Amir1: like in any ML system, the first step is information extraction and organization, so how are edits processed here? are they tokenized? or more? [06:23:24] revscoring does the feature extraction [06:23:35] we defined a long list of features in revscoring [06:23:57] models mostly use a subset of them [06:24:17] edits are randomly sampled from the database [06:24:46] (the sampling process can be found in the editquality repo) [06:24:57] the huge Makefile [06:28:57] thanks! I'll go through them and get back later [10:17:32] Amir1: if you're here, can you tell me the difference between the reverted and damaging models? I suppose a damaging edit would be reverted, or am I missing something? [10:18:06] the reverted model is based on our detection of whether edits were reverted or not [10:18:25] (so we label edits automatically) [10:18:42] but for damaging we ask users to label around 5K edits for us [10:19:02] as to whether those edits are "damaging" or not [10:19:10] codezee: is that clear? [10:22:34] Amir1: if we could train the model using reverted/not-reverted edits automatically, what were the missing things that prompted the need for manual labelling and the damaging model? [10:23:32] the damaging model has higher accuracy, since not all reverted edits are bad [10:24:37] ohh, I understand now, thanks! [10:25:17] yw [10:37:07] Amir1: also, does "not being in good faith" mean using informal words, and would damaging mean using "bad words" in general? [10:37:27] no [10:37:48] good faith means the edit was made in good faith [10:38:00] an edit can be good faith and damaging [10:38:12] an honest mistake by a newbie, for example [10:38:24] but non-damaging edits are all good faith [10:39:00] bad words is just one feature [10:43:05] Amir1: you ran the edit labelling campaign on enwiki before training the model; do you still have that data? it would help me understand this distinction better [10:44:06] it's in the database [10:44:32] I need to check and dig through files [10:44:48] you'll have it soon [10:45:26] just send me your email. (if you don't want to disclose your email address, send an email to ladsgroup at gmail) [10:46:57] done! [10:49:05] thanks [14:27:01] halfak: hey, around? [14:27:21] sabya: hey, I fixed the bug a very long time ago :) check please [14:28:03] Amir1: thanks! will do :) [15:39:07] halfak: please tell me if you're around [15:39:16] Around! [15:39:18] Sorry to miss the ping [15:40:00] np :) [15:40:15] https://phabricator.wikimedia.org/tag/wikilabels/ [15:40:23] halfak: ^ I made some changes [15:40:45] made some notes in them [15:40:59] Woah! [15:41:11] also made this patch: https://gerrit.wikimedia.org/r/282364 [15:41:36] This is great, but I'm not sure we'll want to maintain more than one backlog board. [15:41:49] Maybe we can do this one once/month or something [15:42:09] yeah [15:42:26] I just want to do this once in a while [15:45:07] halfak: assign some stuff to me; I want to deploy reverted models on some wikis to help [15:45:19] but first I wanted to be sure that you want it too [15:45:22] afk [15:45:49] Amir1, you could look into integrating some of the language features.
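As a rough illustration of the feature-extraction step Amir1 describes above, here is a sketch along the lines of the revscoring README. Exact module paths and feature names vary across revscoring versions, so treat them as assumptions rather than the project's canonical API.

```python
import mwapi
from revscoring.extractors import api
from revscoring.features import temporal, wikitext  # names vary by version

# Feature values are pulled live from the MediaWiki API for one revision;
# the training pipelines in editquality do this in bulk for sampled edits.
session = mwapi.Session("https://en.wikipedia.org",
                        user_agent="revscoring feature demo")
extractor = api.Extractor(session)

features = [
    temporal.revision.day_of_week,   # when the edit was made
    wikitext.revision.parent.words,  # size of the page before the edit
]
values = list(extractor.extract(624577024, features))
for feature, value in zip(features, values):
    print(feature, "=", value)
```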
[15:45:58] Also sumit has a PR on revscoring that is failing [15:46:02] The one for Hindi [15:46:07] I promised I'd figure out why [15:52:42] sure [15:56:59] halfak: I want to amend patches in this PR but it is in another repo that I don't have access to (do I?) -- what should I do? [15:57:24] Amir1, if you can fix it, just do the fix along with the merge. [15:57:44] When I do that, I just use the manual merge commands and do my fix in the middle. [15:57:44] I'm not sure it would fix it though [15:58:16] Amir1, you could try taking a pass at it and committing it to a new branch that sumit can pull from [15:58:24] That's the best alternative I can think of. [15:58:29] ok [15:58:35] I'll give it a try [16:22:42] Amir1, https://github.com/wiki-ai/revscoring/blob/master/ipython/hashing_vectorizer.ipynb [16:22:51] FYI [16:23:12] That's an example of building a prediction model with a hashing vectorizer that I put together for sabya [16:23:24] awesome [16:23:45] * Amir1 is looking [16:24:24] we should use things like this at the HPI hackathon [16:25:17] wiki-ai/revscoring#630 (master - f889617 : halfak): The build was broken. https://travis-ci.org/wiki-ai/revscoring/builds/121744294 [16:27:18] halfak: have you merged the patch? [16:27:43] travis failed [16:36:10] nvm [16:36:13] I'm dumb [16:41:53] Whoops. Did I do that? [16:42:06] I thought I only touched an ipython thing [16:44:25] no, you didn't [16:44:33] travis reports in a very bad way [16:44:34] I don't think this build failure is real. Looks like something weird with libgfortran [16:44:51] I think it's related to installing Hindi [16:45:36] But that shouldn't be in master yet, right? [16:45:50] Gotta run. Will be back in 1.5 hours [16:52:28] wiki-ai/revscoring#631 (hindi - 03f1419 : amir): The build failed. https://travis-ci.org/wiki-ai/revscoring/builds/121751150 [16:52:44] https://github.com/wiki-ai/revscoring/pull/261/commits/29066f3a3f308d9eed7e2849a62cc3e97fe2d099 [16:56:36] wiki-ai/revscoring#633 (hindi - 29066f3 : amir): The build failed. https://travis-ci.org/wiki-ai/revscoring/builds/121752955 [17:23:01] halfak: for when you're back [17:23:02] https://github.com/ContinuumIO/anaconda-issues/issues/686 [17:23:19] I made a workaround; I hope it works [18:08:05] o/ Amir1 [18:08:12] hey [18:08:15] Did you end up reverting scipy 0.16? [18:08:22] not yet [18:08:31] but I'm actually doing it right now [18:08:36] kk [18:08:38] :) [18:08:41] several other tries failed [18:08:51] BTW, when you get a chance, I forgot to ask you to look at https://github.com/wiki-ai/revscoring/pull/259 [18:09:48] sure [18:10:09] but we should keep in mind we need to add these language packages to puppet as well [18:10:35] +1 good call. [18:10:38] I'll do that quick [18:12:26] awesome [18:18:40] I really wish we had merge rights on all ORES-related modules & roles in puppet [18:18:48] YuviPanda, ^ is that crazy talk? [18:36:03] https://travis-ci.org/wiki-ai/revscoring/builds/121773696 [18:36:15] this is really strange [18:36:41] it fails because hi is not installed, but it explicitly says it's there [18:36:54] line 219 [18:38:37] Amir1, maybe there isn't a dict available for precise (12.04)?
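The hashing-vectorizer notebook linked above isn't reproduced here, but the core idea can be sketched with plain scikit-learn. The toy edit texts and labels below are made up for illustration; the real notebook works from wiki revision text.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Made-up stand-ins for (edit text, was it damaging?) training rows.
texts = [
    "added a sourced paragraph about chemistry",
    "u r all idiots lol",
    "fixed a typo in the infobox",
    "blanked the whole section!!!",
]
labels = [0, 1, 0, 1]  # 1 = bad edit, 0 = fine

# A HashingVectorizer hashes tokens into a fixed-size feature space, so
# no vocabulary has to be kept in memory -- handy for large dumps.
model = make_pipeline(
    HashingVectorizer(n_features=2 ** 18, alternate_sign=False),
    SGDClassifier(loss="log_loss", random_state=0),
)
model.fit(texts, labels)
print(model.predict(["lol lol idiots"]))
```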
[18:39:06] https://github.com/travis-ci/apt-package-whitelist/blob/master/ubuntu-precise [18:39:09] Looks like there is [18:39:12] See line 127 [18:39:22] * halfak tests [18:39:24] It's there [18:39:28] I tested it [18:40:31] it's so crazy I want to restart [18:40:36] https://travis-ci.org/wiki-ai/revscoring/builds/121773696 [18:40:43] * halfak can't seem to connect to ubuntu repos [18:40:55] OK, working now [18:41:28] oooooh [18:41:30] weird [18:41:33] I have something for you [18:42:32] Amir1, https://gist.github.com/halfak/31cbac64a44c9db73a65e61ce9e54113 [18:42:44] Installing aspell-hi doesn't make it available via enchant [18:42:48] For some reason [18:42:58] I tried that on my host [18:43:00] it worked [18:43:08] should we use myspell? [18:43:09] What OS? [18:43:18] (maybe I installed it there) [18:43:27] ubuntu 14.04 [18:43:32] or 12.04 [18:43:48] no, 14.04 [18:44:11] I'm on ubuntu 14.04 as well. [18:44:12] Hmmm [18:44:31] maybe I installed it another way before [18:44:39] I install lots of things on my laptop [18:44:53] (using myspell) [18:44:55] Check out the end of the gist here: https://gist.github.com/halfak/31cbac64a44c9db73a65e61ce9e54113 [18:45:09] It's *there* -- just not available [18:45:21] travis agrees with you [18:45:22] https://travis-ci.org/wiki-ai/revscoring/builds/121773696 [18:45:31] (I restarted) [18:49:00] https://www.irccloud.com/pastebin/9RCWpyux/ [18:49:10] halfak: ^, too lazy to make a gist :D [18:52:06] Amir1, wat [18:52:07] lol [18:52:16] Why does yours work and mine/travis's doesn't! [18:52:36] :))) [18:52:41] I have no idea [18:52:59] are there any dictionaries in myspell? [18:53:20] There are, but not for travis [18:53:29] Oh wait... maybe not for hi [18:53:38] Even so, the aspell one should work! [18:54:44] * halfak looks for the failure in pyenchant [18:54:55] BTW, what version of pyenchant? [18:54:57] I have 1.6.6 [18:55:03] myspell doesn't have hi [18:55:06] halfak: trying to add ORES to SuggestBot, but I can't do wp10 predictions in bulk in v2? Is that a known bug? [18:55:07] let me check [18:55:21] Nettrom, you should be able to. [18:55:25] or maybe I'm doing something wrong [18:55:57] Eeek! [18:56:09] Oh wait, no... it seems to be working: https://ores.wmflabs.org/v2/scores/enwiki/wp10/?revids=3456789|5623874|2352365 [18:56:39] ah, the URL I get from Swagger seems to urlencode "|", maybe that's the problem [18:57:01] hm, it works when I paste it in the URL box, though [18:57:06] cool, one less problem :) [18:57:22] Hmm... even URL encoding should be OK. [18:57:32] Can you file a bug about swagger not working? [18:57:35] yep [18:57:45] will do that right now [18:59:06] Man, pyenchant is such a mess [18:59:12] Thanks Nettrom [19:01:06] halfak: 1.6.6 [19:01:24] c'mon [19:02:49] halfak: want me to assign it to someone right away? [19:02:55] the task, that is [19:03:05] Na. [19:03:11] We'll pick it up when we can [19:03:25] * halfak reads the following line in pyenchant: "from ctypes import *" [19:03:35] NEVER IMPORT * IN A LIBRARY OMG [19:03:43] ouch [19:03:43] :)))) [19:03:56] means bye-bye flake8 [19:05:13] Yeah. pyenchant is a minefield of inconsistent tabbing and bad practices [19:05:32] In the same function, even! [19:05:47] How many spaces? 2 ... no 4 ... now 2 again AHHH!
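A small pyenchant probe for the symptom being debugged here (aspell-hi installed, yet enchant can't see a "hi" dictionary). These are standard pyenchant calls; run it on the failing host to see which backends enchant actually loaded.

```python
import enchant

# Does any provider expose a Hindi dictionary?
print("hi available:", enchant.dict_exists("hi"))

broker = enchant.Broker()
# List the backends enchant loaded (aspell, myspell/hunspell, ...);
# if aspell is missing from this list, installing aspell-hi won't help.
for provider in broker.describe():
    print("provider:", provider.name, "-", provider.desc)

# All language tags visible through the loaded providers.
print("known languages:", broker.list_languages())

if enchant.dict_exists("hi"):
    print(enchant.Dict("hi").check("नमस्ते"))
```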
[19:06:13] seriously [19:06:23] wtf [19:06:24] the important thing about standards is that you need to add one [19:07:24] several weeks ago I found a python script in puppet written entirely with tab characters, ewwwwww [19:07:36] (fixed it) [19:08:25] Amir1, can you try to run this in the revscoring base directory: "enchant LICENSE -d en" [19:08:30] and then "enchant LICENSE -d hi" [19:08:48] sure [19:09:17] nothing for either of them [19:09:52] https://gist.github.com/halfak/31cbac64a44c9db73a65e61ce9e54113#gistcomment-1746107 [19:09:57] I get an error for hi [19:10:10] So I think the issue is with the enchant utility [19:11:10] yeah [19:11:24] should we file a bug upstream? [19:11:37] okay, filed the task, and awesome to find the API is working… time for lunch, and then hopefully I'll implement bulk predictions for SuggestBot later today [19:11:44] see y'all! [19:13:23] \o/ [19:13:34] halfak: in order to fix our travis [19:13:38] I made a patch [19:13:51] then we can drop my commit from the hindi PR [19:14:47] halfak: depends on how you define crazy, etc. [19:15:14] halfak: so anyone with merge rights on our puppet repo practically has root on the entire cluster. [19:15:29] this is a side effect of the way puppet works, unfortunately [19:16:09] Indeed. But what if you could register a secondary repo with a labs project [19:16:39] And roles/modules would be ... merged? ... when configuring instances [19:18:09] YuviPanda: it does, but it's much more visible to people. Let's say the government captured and tortured me and got my ssh credentials; once they made a patch, people would notice before it goes to palladium [19:18:28] I think Ops are around 24/7 because of the timezone distribution [19:19:10] also, another option would be jenkins [19:19:34] (not letting jenkins merge patches that are not related to ores/wikilabels) [19:20:16] Either way, it does seem tricky [19:20:20] Since puppet === root [19:20:44] wiki-ai/revscoring#645 (travis_again - e12c534 : amir): The build passed. https://travis-ci.org/wiki-ai/revscoring/builds/121785784 [19:20:45] https://travis-ci.org/wiki-ai/revscoring/builds/121785784 [19:20:52] (it's not hindi) [19:21:28] I think we should do this for ores as well [19:21:43] Amir1, what was the change? [19:21:49] And what's not hindi? [19:21:54] https://github.com/wiki-ai/revscoring/pull/262 [19:21:54] halfak: nope. because of the way puppet works this isn't possible. things inside module/ores aren't restricted to just running inside ores in any form or way. you can very easily make it run code anywhere else [19:22:37] YuviPanda, yeah. I hear you. I'm just imagining a merge of puppet configs *within* the ORES project by the clients. [19:22:58] indeed. you have a better imagination than the puppetlabs folks :) [19:23:11] So the custom puppet stuff would only be available to ORES stuff. [19:23:13] lol [19:23:44] yup, ores needs love [19:23:51] doing it right now [19:23:51] YuviPanda, OK, so not totally crazy -- just difficult to implement [19:24:24] Amir1, Oh! Awesome [19:25:19] halfak: the word for this is multi-tenancy. puppet has no multi-tenancy support [19:25:34] k8s does, so in the Glorious Future... [19:26:01] pushed directly to master [19:26:22] wiki-ai/revscoring#647 (master - c49908c : Amir Sarabadani): The build was fixed. https://travis-ci.org/wiki-ai/revscoring/builds/121786934 [19:26:39] * halfak whispers "Glorious Future" and looks off into the distance [19:30:27] halfak: do you want to report the hindi issue to enchant?
[19:30:41] I think we'll need to -- yeah. [19:30:59] and ores is fixed now :) [19:31:05] wait for travis [19:32:03] Amir1: halfak btw, PuppetSWAT might be a reasonable compromise for getting arbitrary puppet patches merged. https://wikitech.wikimedia.org/wiki/PuppetSWAT [19:32:23] sweet [19:33:35] I'll try that later [19:33:42] Hi all [19:34:07] halfak: I really need to sleep [19:34:13] not really [19:34:19] I just feel sleepy :D [19:34:40] Tomorrow I'll investigate the wikilabels issues [19:34:41] halfak and all, why don't I see any labeling campaign enabled for w:es? [19:34:46] Amir1, go to sleep! Thanks for your work & getting ORES travis into a good state :) [19:35:00] Oscar_, did you see it yesterday? [19:35:23] akosiaris: please keep me posted [19:35:25] o/ [19:35:28] o/ [19:36:38] Oscar_, I see a campaign appear when I go to https://es.wikipedia.org/wiki/Wikipedia:Etiquetando [19:38:51] No, I didn't see anything, just the header "campañas" (campaigns) [19:40:08] ^ [19:49:10] Oscar_, any errors appearing in your developer console after you load the page? [19:49:28] (sorry, partially AFK while in a meeting) [19:56:16] halfak: nothing new [19:56:47] How about under "network" -- are any of those requests failing? [19:58:31] btw, on https://en.wikipedia.org/wiki/Wikipedia:Labels something appears [19:58:39] WTF [19:58:41] lol [19:58:42] Edit Quality -- 2014 10k sample [19:58:42] +Edit Type -- 2015 January sample [19:58:42] +Draft notability [19:58:42] +Draft notability (raw) [19:58:50] Ohhhhh.... that's very wrong [19:59:12] That's why I was wondering what is happening [19:59:13] You should see: [19:59:14] Edit quality (20k random sample, 2015) [19:59:14] +Edit type training (50 revisions) [19:59:53] Aha! [19:59:54] https://meta.wikimedia.org/wiki/User:Oscar_./global.js [19:59:55] maybe something with my local .js? [19:59:56] This is wrong [20:00:20] https://meta.wikimedia.org/wiki/Wiki_labels#Installation [20:00:25] Oscar_, ^ see this [20:03:29] halfak: ok, solved o/ [20:03:42] \o/ [20:15:03] halfak: ping [20:15:13] o/ Krinkle [20:15:28] halfak: Noticed more people mentioning the reverted model being >50% for nlwiki mainspace anon edits. [20:15:37] Is it possible the data set genuinely contained >50% reverted? [20:15:57] I'd like to narrow it down, or help figure it out, to calm that worry [20:16:00] All anon edits? [20:16:16] At least all anon mainspace edits [20:16:24] I don't know which sample dataset you used. [20:16:41] But is there data on what the real ratio was within that set? [20:16:44] E.g. not the prediction. [20:17:12] It'd be an interesting data point either way. E.g. does nlwiki really revert the majority of mainspace anon edits? [20:18:20] 945/18950 reverted/not-reverted. [20:18:57] not sure about the anon proportion there [20:19:00] Hm.. ok [20:19:23] I think we have about 1500-2500 anon edits per day on average in the main namespace [20:19:30] It shouldn't be true that *all* anon edits get flagged. [20:19:58] yeah, that's no longer the case since you use the gradient boosting thingy [20:20:04] much better now [20:20:18] :D [20:20:49] I brought up the campaign again in the nlwiki village pump to help complete it [20:21:35] I like that the label UI doesn't show the username or IP etc. in the diff view. [20:21:47] Reduces bias, I imagine. [20:24:19] Krinkle, https://gist.github.com/halfak/bbcedf0817f9382f61a27a0a1eec0b66 [20:24:49] Krinkle, yeah, by hiding the username, I'm hoping to reduce that bias. [20:24:57] Looks like anons dominate the reverted edits.
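The 945/18950 split quoted above is roughly a 1:20 class imbalance, which motivates the re-weighting halfak mentions just below. A sketch of that idea with a plain scikit-learn gradient boosting model; the features here are random placeholders, and ORES's actual weighting scheme may differ.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

reverted, kept = 945, 18950
print("reverted share: {0:.1%}".format(reverted / (reverted + kept)))  # ~4.7%

# With ~1:20 imbalance, an unweighted learner can look accurate by always
# predicting "not reverted"; up-weighting the rare positives counters that.
rng = np.random.RandomState(0)
X = rng.rand(reverted + kept, 5)                  # placeholder features
y = np.array([1] * reverted + [0] * kept)         # 1 = reverted
weights = np.where(y == 1, kept / reverted, 1.0)  # ~20x weight on positives

clf = GradientBoostingClassifier(random_state=0)
clf.fit(X, y, sample_weight=weights)
print("P(reverted) for one edit:", clf.predict_proba(X[:1])[0, 1])
```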
[20:25:10] I bet this is true of most wikis [20:25:38] how about the reverted/non-reverted ratio within anon edits? [20:29:52] Krinkle, working on it -- waiting for the query [20:33:13] halfak: Cool. Thanks :) [20:33:37] https://gist.github.com/halfak/e4fb4643c6ba3b76a330b3861d6f9af0 [20:33:40] ^ Krinkle [20:33:42] halfak: noticed the label campaign includes null edits relating to page moves. "No difference" but the edit summary shows the rename. That's an interesting one. [20:33:59] Oh yeah. Those should be rare [20:34:03] * halfak crosses fingers [20:34:57] Ah, so about a third of anon edits were reverted in that data set. [20:35:21] 44% [20:35:27] Also, we do some re-weighting so that positives (reverted) are weighted more heavily. [20:35:41] This is due to the lack of positive observations. [20:35:54] halfak: What is the data set based on? [20:36:09] Is it the same data set wikilabels is serving to users for labelling? [20:36:32] Krinkle: yes, this is the same revision set [20:36:35] cool [20:36:37] Well... sort of [20:36:45] We give nlwiki a subset [20:36:59] After filtering out sysop edits and bot edits. [20:37:15] (unless they were reverted) [20:38:00] ah, is that why the number changed recently from 4400 to 4100-something? [20:38:17] It shouldn't have changed recently... hmmm [20:38:23] Where did you see a change? [20:40:11] halfak: You made the edit :) [20:40:17] https://nl.wikipedia.org/w/index.php?title=Wikipedia:Labels/Kwaliteit_van_bewerkingen&diff=prev&oldid=46463780 going forward from there [20:40:21] the total changes 2 or 3 times [20:41:02] 4000, 4161, 4600 [20:41:04] actually, not sure. [20:41:08] someone mentioned it in the village pump [20:42:18] Don't worry about it :) [20:42:20] I must have got the number wrong at first :) [20:42:35] I blame jet lag for my lack of memory :) [20:54:43] Krinkle, I'll be picking up some analysis of anti-anon bias in the models in the near future. I'll make sure to take a look at nlwiki and will report what I learn. [21:19:47] Good evening [21:20:15] I'm interested in the edit labeling project, but I have some questions about the sign-up process [21:20:24] Hi Hans_Topo1993 [21:20:38] Tell us [21:20:57] The code that has to be pasted into global.js [21:21:06] Do I just paste it in there? [21:21:26] That is, in my case it would be (User:Hans Topo1993/global.js)? [21:21:35] I paste the code and save the page [21:21:58] Yes, but be very careful -- I pasted one that wasn't the right one [21:22:29] It's https://meta.wikimedia.org/wiki/User:Oscar_./global.js [21:22:51] Then you go back to Wikipedia:Etiquetando [21:25:41] You click on connect to the server, a page will come up that takes you through an OAuth authorization, and then the campaign should load [21:26:43] Is the campaign under the "iniciativas" (initiatives) tab? [21:26:57] No, under home [21:27:33] Well, it doesn't show up for me [21:28:03] It should appear where it "previously" said "install the gadget"; I still get that [21:28:42] OK [21:28:44] Got it now [21:28:59] I closed and reopened the browser and that was it [21:29:03] You should see "Editar calidad (20k muestra aleatoria, 2015)" (Edit quality, 20k random sample, 2015) [21:29:28] Yes! [21:29:48] Is it all set now? [21:30:28] You click on request a workset, diffs of anonymous edits will appear, and our job is to identify whether they are damaging and whether they were made in good faith [21:30:35] In theory, yes, Hans_Topo1993 :) [21:31:13] Can I start now? [21:31:20] Thanks a lot for the help!!
[21:32:43] Yes, of course [21:32:47] You're welcome -- just a few hours ago I finally managed to get the application activated myself (it gave me far more trouble than it gave you, to be honest) [21:37:41] Well then, I'm going to see if I can label a few [21:37:45] Thanks a lot again [21:55:08] \o/ [21:55:14] Thanks for being here to help, Oscar_ :D