[06:27:52] Revision-Scoring-As-A-Service, wikilabels: [Investigate] Intermittent performance issues with wikilabels - https://phabricator.wikimedia.org/T130872#2202163 (akosiaris) Hello, On the DB I see these errors repeatedly ``` 2016-04-06 09:37:37 GMT ERROR: duplicate key value violates unique constraint "lab...
[09:29:39] Revision-Scoring-As-A-Service, rsaas-editquality: Train/test `damaging` and `goodfaith` model for ruwiki - https://phabricator.wikimedia.org/T131999#2202490 (Ladsgroup) Extracting features of the 20K edits for damaging right now
[09:56:18] Revision-Scoring-As-A-Service, wikilabels: [Investigate] Intermittent performance issues with wikilabels - https://phabricator.wikimedia.org/T130872#2202528 (Ladsgroup) Hey, Thank you for taking a look at this >>! In T130872#2202163, @akosiaris wrote: > Hello, > > On the DB I see these errors repeatedly...
[10:30:28] Revision-Scoring-As-A-Service, wikilabels: [Investigate] Intermittent performance issues with wikilabels - https://phabricator.wikimedia.org/T130872#2202645 (Ladsgroup) https://github.com/wiki-ai/wikilabels/pull/108
[10:39:03] Revision-Scoring-As-A-Service, rsaas-editquality: Train/test `damaging` and `goodfaith` model for ruwiki - https://phabricator.wikimedia.org/T131999#2202660 (Ladsgroup) Damaging: ``` ScikitLearnClassifier - type: GradientBoosting - params: center=true, verbose=0, min_samples_split=2, random_state=null...
[11:07:40] Revision-Scoring-As-A-Service, wikilabels: [Investigate] Intermittent performance issues with wikilabels - https://phabricator.wikimedia.org/T130872#2202690 (akosiaris) >>! In T130872#2202528, @Ladsgroup wrote: > Hey, > Thank you for taking a look at this >>>! In T130872#2202163, @akosiaris wrote: >> Hel...
[11:28:07] Revision-Scoring-As-A-Service, wikilabels: [Investigate] Intermittent performance issues with wikilabels - https://phabricator.wikimedia.org/T130872#2202724 (Ladsgroup) I think there are several upserts in wikilabels right now, I take a look at the articles, and try to implement a robust/efficient method...
[11:51:10] Revision-Scoring-As-A-Service, rsaas-editquality: Train/test `damaging` and `goodfaith` model for ruwiki - https://phabricator.wikimedia.org/T131999#2202764 (Ladsgroup) Goodfaith: ``` ScikitLearnClassifier - type: GradientBoosting - params: scale=true, warm_start=false, loss="deviance", center=true, n...
[11:52:03] Revision-Scoring-As-A-Service, wikilabels: [Investigate] Intermittent performance issues with wikilabels - https://phabricator.wikimedia.org/T130872#2202771 (akosiaris) >>! In T130872#2202724, @Ladsgroup wrote: > I think there are several upserts in wikilabels right now, I take a look at the articles, an...
[12:05:19] Revision-Scoring-As-A-Service, rsaas-editquality: Train/test `damaging` and `goodfaith` model for ruwiki - https://phabricator.wikimedia.org/T131999#2202785 (Ladsgroup) https://github.com/wiki-ai/editquality/pull/26
[12:18:35] Revision-Scoring-As-A-Service, wikilabels: [Investigate] Intermittent performance issues with wikilabels - https://phabricator.wikimedia.org/T130872#2202865 (Ladsgroup) Based on request URL I can say it was an upsert: The insert query: ``` INSERT INTO label VALUES (300456, %(u...
[13:11:06] o/
[13:19:51] halfak: o/
[13:20:10] hey, I did some stuff you need to check :)
[13:20:39] 1- I finished ru damaging and goodfaith
[13:21:15] I must note while I was extracting features for damaging my revscoring wasn't updated but I updated it once I realized you made a breaking change
[13:21:33] and models worked perfectly fine
[13:21:41] do I need to re-extract features?
[13:22:15] Amir1, probably OK with feature extraction.
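For readers following the upsert thread in T130872: the duplicate-key errors quoted above are what a plain INSERT produces when a row for the same key already exists. Below is a minimal sketch of a single-statement upsert, not the change made in wikilabels PR 108 (the log does not show that code). It uses Python's built-in `sqlite3` as a stand-in for the PostgreSQL database in the discussion, and the `label` columns (`task_id`, `user_id`, `data`) plus all helper names are hypothetical, since the real query is truncated in the log. `ON CONFLICT ... DO UPDATE` needs SQLite 3.24 or later; PostgreSQL 9.5 and later accept a similar clause.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical `label` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE label ("
        "task_id INTEGER, user_id INTEGER, data TEXT, "
        "PRIMARY KEY (task_id, user_id))"
    )
    return conn


def upsert_label(conn, task_id, user_id, data):
    """Insert a label, or overwrite the existing one for (task_id, user_id)."""
    # `with conn` commits on success and rolls back if the statement raises.
    with conn:
        conn.execute(
            """
            INSERT INTO label (task_id, user_id, data)
            VALUES (:task_id, :user_id, :data)
            ON CONFLICT (task_id, user_id) DO UPDATE SET data = excluded.data
            """,
            {"task_id": task_id, "user_id": user_id, "data": data},
        )


def get_label(conn, task_id, user_id):
    """Return the stored label data, or None if no row exists."""
    row = conn.execute(
        "SELECT data FROM label WHERE task_id = ? AND user_id = ?",
        (task_id, user_id),
    ).fetchone()
    return row[0] if row else None
```

Calling `upsert_label` twice with the same `(task_id, user_id)` leaves one row holding the second value, so the unique constraint is never violated and no transaction has to be rolled back.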
[13:22:27] 2- lots of discussions in T130872
[13:22:31] awesome
[13:22:35] I made the PR
[13:23:20] Hey :)
[13:26:18] o/ putnik
[13:27:31] halfak: 3- I'm adding qqq for wikilabels too, one message left "Body Copy", what do you mean by that?
[13:28:22] the main article or text that writers are responsible for, is contrasted with "display copy", accompanying material such as headlines and captions, which are usually written by copy editors or sub-editors.
[13:28:30] https://en.wikipedia.org/wiki/Copy_(written)
[13:28:46] Wow
[13:28:49] awesome
[13:29:15] Done
[13:29:17] thanks :)
[13:30:28] Revision-Scoring-As-A-Service, I18n: Complete the message documentation (qqq) for Revision Scoring - https://phabricator.wikimedia.org/T132208#2191475 (Ladsgroup) @halfak made most of documentation, I finished it off :)
[13:31:04] halfak: I need to go right now, but I'll be back very soon :)
[13:31:45] OK. o/
[13:31:55] Will have notes on PR
[13:32:03] awesome
[13:32:05] thanks
[14:09:44] Revision-Scoring-As-A-Service, rsaas-articlequality: Fix WP 1.0 label extraction process for English Wikipedia - https://phabricator.wikimedia.org/T130312#2203192 (Halfak) https://meta.wikimedia.org/wiki/Research_talk:Automated_classification_of_article_quality/Work_log/2016-04-08 Looks like we get Acc...
[14:10:21] Revision-Scoring-As-A-Service, I18n: Complete the message documentation (qqq) for Revision Scoring - https://phabricator.wikimedia.org/T132208#2203193 (Halfak) \o/
[14:32:11] Revision-Scoring-As-A-Service, wikilabels: [Investigate] Intermittent performance issues with wikilabels - https://phabricator.wikimedia.org/T130872#2203274 (akosiaris) Based on the discussion above, and the rate of error logging during the period of bad database performance, I am starting to think that...
[14:34:05] Revision-Scoring-As-A-Service, wikilabels: [Investigate] Intermittent performance issues with wikilabels - https://phabricator.wikimedia.org/T130872#2203279 (Halfak) Makes sense. Thanks.
[14:39:03] Revision-Scoring-As-A-Service-Backlog, revscoring: Implement abstraction for Sparse Feature Vectors - https://phabricator.wikimedia.org/T132580#2203308 (Halfak)
[14:42:42] Revision-Scoring-As-A-Service-Backlog, rsaas-editquality: [Spike] Proof of concept damage detection with hash vectors - https://phabricator.wikimedia.org/T132581#2203333 (Halfak)
[14:51:26] Amir1, let me know when you get to update https://github.com/wiki-ai/wikilabels/pull/108 based on akosiaris' thoughts.
[14:51:41] It all makes sense to me now and I'll be happy to merge.
[14:51:53] yeah
[14:52:01] halfak: I'm trying to add it right now
[14:52:06] first thing on my to-do list
[14:53:03] https://imgur.com/XZai0D9
[14:53:31] ^ me when I see akosiaris talk about the transaction getting rolled-back.
[14:53:47] loool
[14:54:55] :)
[14:55:08] :))))
[14:59:39] halfak: do you have some time to check the ru PR?
[15:07:12] halfak: https://github.com/wiki-ai/wikilabels/commit/4d9484629972e5327cda22427524af7bb03b764a
[15:07:33] It's impossible to test how it might affect DB
[15:07:40] in my pc
[15:07:51] let me try using API
[15:09:17] Revision-Scoring-As-A-Service, rsaas-articlequality: Fix WP 1.0 label extraction process for English Wikipedia - https://phabricator.wikimedia.org/T130312#2203502 (Nettrom) Hmmm… these are interesting results. I'm wondering if my approach results in a less noisy dataset, but also not sure how we would g...
[15:12:15] halfak: can I add it to the staging server?
[15:12:22] Yeah.
[15:12:39] We can deploy your branch to the staging server
[15:12:57] great
[15:13:03] let me do that real quick
[15:13:05] Amir1, actually, what do you think of setting up labels-experiments.wmflabs.org?
[15:13:15] ++++++1
[15:13:18] :)
[15:13:22] +2 (maybe?)
[15:14:28] I want to also check how fast it works with index added
[15:14:39] halfak: Can I do it?
[15:14:52] *by "do" I mean setting up the server
[15:14:57] Yes please. Also you should have the perms
[15:15:03] yup
[15:16:20] Revision-Scoring-As-A-Service, wikilabels: Setup labels-experiment.wmflabs.org - https://phabricator.wikimedia.org/T132588#2203511 (Ladsgroup)
[15:18:37] I chose staging role
[15:18:48] I don't think they differ at all
[15:23:59] Amir1, +1
[15:33:11] initialize the server
[15:45:58] Revision-Scoring-As-A-Service, rsaas-editquality, wikilabels: Load 200 & 5k samples into wikilabels - https://phabricator.wikimedia.org/T132593#2203663 (Halfak)
[15:46:41] Revision-Scoring-As-A-Service, rsaas-editquality, wikilabels: Load 200 & 5k samples into wikilabels - https://phabricator.wikimedia.org/T132593#2203678 (Halfak) Samples are here: https://github.com/wiki-ai/edittypes/tree/master/datasets Notes on loading wikilabels here: https://meta.wikimedia.org/wik...
[15:46:58] Revision-Scoring-As-A-Service, rsaas-edittypes, wikilabels: Load 200 & 5k samples into wikilabels - https://phabricator.wikimedia.org/T132593#2203679 (Halfak)
[15:55:46] halfak: I'm trying to initialize the server but it gives me an error while trying to connect to the db
[15:56:03] I think that's because it's in wmnet
[15:56:19] do you know how I can fix it?
[15:56:49] Hmm... you should be able to connect to the staging DB.
[15:56:51] * halfak looks
[15:57:05] Amir1, hostname?
[15:57:17] wikilabels-experiment
[15:57:29] in wikilabels
[15:57:51] I made some modifications to the wikilabels config repo too, I need to make a PR for them
[15:58:09] oh! There's a line in the fabfile for uploading staging creds. I'll do that.
[15:58:34] thanks :)
[16:00:03] Amir1, should be good
[16:01:07] \o/
[16:01:09] thanks
[16:25:33] Revision-Scoring-As-A-Service, Research-and-Data-Backlog, RESTBase-API: Public API endpoints for new services - https://phabricator.wikimedia.org/T103811#2203844 (GWicke) There are some new project-global content entry points (picture / article of the day, trending articles, "in the news") that need...
[16:30:51] halfak: http://labels-experiment.wmflabs.org/
[16:31:02] I need to make lots and lots of changes though
[16:31:03] :D
[16:31:15] Revision-Scoring-As-A-Service, wikilabels: Setup labels-experiment.wmflabs.org - https://phabricator.wikimedia.org/T132588#2203860 (Ladsgroup) http://labels-experiment.wmflabs.org/
[16:47:55] Amir1, \o/ glad to see it working :)
[16:58:07] I'm hoping to finish up the anon bias analysis and some editquality bits from the hackathon today
[16:58:20] ^(Hungarian & Swedish prelabeled sets for wiki labels)
[16:58:23] awesome
[16:58:29] Running Hungarian prelabeler now
[16:58:47] I'm making PRs for my first wikilabel server initialization
[17:03:36] Ooh I should look at Norwegian too.
[17:03:42] We should be able to merge that language asset.
[17:14:34] Revision-Scoring-As-A-Service-Backlog, revscoring: [Spike] investigate using compression ratios in revscoring - https://phabricator.wikimedia.org/T132375#2204025 (Halfak)
[17:14:59] Revision-Scoring-As-A-Service, wikilabels: Edit quality campaign for Hungarian Wikipedia - https://phabricator.wikimedia.org/T131446#2204029 (Halfak)
[17:15:22] Revision-Scoring-As-A-Service, wikilabels: Edit quality campaign for Hungarian Wikipedia - https://phabricator.wikimedia.org/T131446#2167392 (Halfak) Looks like we are good with data. We just need these translations. See my notes here: https://meta.wikimedia.org/wiki/Research_talk:Automated_classifica...
[17:15:54] Revision-Scoring-As-A-Service-Backlog, wikilabels: Edit quality campaign for Swedish Wikipedia - https://phabricator.wikimedia.org/T131451#2204031 (Halfak)
[17:16:13] Revision-Scoring-As-A-Service-Backlog, wikilabels: Edit quality campaign for Swedish Wikipedia - https://phabricator.wikimedia.org/T131451#2167492 (Halfak) Looks like we are good on data. We just need translations. See my notes here: https://meta.wikimedia.org/wiki/Research_talk:Automated_classificat...
[17:16:25] Revision-Scoring-As-A-Service, wikilabels: Edit quality campaign for Swedish Wikipedia - https://phabricator.wikimedia.org/T131451#2204036 (Halfak)
[17:18:48] halfak: https://github.com/wiki-ai/wikilabels/pull/108/files
[17:18:59] Is there anything that I need to do?
[17:19:18] Amir1, just make sure that it still works on the experimental
[17:19:34] Once you are convinced that you can create new labels and overwrite old labels, I'll merge :)
[17:19:51] that's hard ;)
[17:20:01] but vital
[17:20:45] :)
[17:20:51] Would be great if we had a nice way to test this.
[17:21:04] I'd like to get advice on setting up better integration testing for both ORES and wiki labels.
[17:21:11] Right now, I'm happy with revscoring.
[17:21:33] * halfak --> lunch
[17:44:08] Revision-Scoring-As-A-Service, wikilabels: Edit quality campaign for Hungarian Wikipedia - https://phabricator.wikimedia.org/T131446#2204144 (Tgr) a:Tgr Sorry for being unresponsive. I have been busy but will wrap this up over the weekend.
[17:45:22] Revision-Scoring-As-A-Service, wikilabels: Edit quality campaign for Swedish Wikipedia - https://phabricator.wikimedia.org/T131451#2204147 (Josve05a) Does "Edit quality" mean "[to] edit quality" or "[the] edit[s] quality"?
[17:57:52] Revision-Scoring-As-A-Service, wikilabels: Edit quality campaign for Swedish Wikipedia - https://phabricator.wikimedia.org/T131451#2204204 (Halfak) The edit's quality. As in, "is it damaging?"
and "was it saved in good-faith?"
[17:58:26] Revision-Scoring-As-A-Service, wikilabels: Edit quality campaign for Hungarian Wikipedia - https://phabricator.wikimedia.org/T131446#2204205 (Halfak) Thanks @tgr :)!
[17:58:52] back.
[18:26:33] halfak: It took a very long time because of an error I made in enabling logging
[18:26:41] but finally I fixed it, and tested it
[18:26:56] I was able to add a label, and/or rewrite one
[18:27:36] FWIW I ran the uwsgi python file directly instead of daemon, their logging is totally useless
[18:32:58] * Amir1 goes away to celebrate
[19:01:40] Revision-Scoring-As-A-Service-Backlog, MediaWiki-extensions-Translate, translatewiki.net: qqq for a wiki-ai message cannot be loaded - https://phabricator.wikimedia.org/T132197#2204494 (Ladsgroup)
[19:09:06] Revision-Scoring-As-A-Service, wikilabels: Edit quality campaign for Swedish Wikipedia - https://phabricator.wikimedia.org/T131451#2204537 (Josve05a) Redigeringskvalité (20 000 stickprov) or Redigeringskvalité (20 tusen stickprov) or something similar. There isn't really a great translation for these...
[19:45:22] halfak: around?
[19:45:28] Amir1, yeah. What's up?
[19:45:53] Please read the above notes regarding wikilabels
[19:47:32] Amir1, https://github.com/wiki-ai/wikilabels/commit/2c23a23ed41c166177e706d951de040e112fc7cf#commitcomment-17091471
[19:48:04] I think we want to not store a logger inside of DB.
[19:48:39] This doesn't conform to the normal (common) use of the logging module.
[19:49:25] okay
[19:49:30] I'm on it
[19:51:58] halfak: Should I add it to __init__ function or top of the code
[19:52:47] I think the top of the code.
[19:52:55] okay :)
[19:53:11] If we want a specialized event logger, let's build the code up for that when we are ready to implement it.
[19:53:24] Right now, events in the logs are great.
[19:53:31] I think that, eventually, we'll want an event table.
[19:59:19] halfak: ^
[19:59:47] Solid.
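The logger placement agreed on above ("the top of the code" rather than inside `DB`) follows the common `logging` idiom of one module-level logger per file. Here is a minimal sketch of that idiom; the `DB` class body and the `save_label` method are hypothetical illustrations, not the wikilabels code.

```python
import logging

# Module-level logger, created once at import time and named after the module.
# This is the usual pattern instead of storing a logger on each object.
logger = logging.getLogger(__name__)


class DB:
    """Hypothetical stand-in for a database wrapper class."""

    def __init__(self, name):
        self.name = name  # no logger attribute stored on the instance

    def save_label(self, task_id, user_id):
        # Methods log through the shared module-level logger.
        logger.info("Saved label: task_id=%s user_id=%s", task_id, user_id)
        return (task_id, user_id)
```

With this layout the application decides where log records go (for example via `logging.basicConfig(level=logging.INFO)` at startup), and the module only emits them.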
[20:00:19] \o/
[21:15:17] wiki-ai/revscoring#661 (d_for_deleted - d9d9a10 : halfak): The build failed. https://travis-ci.org/wiki-ai/revscoring/builds/122900999
[21:18:39] halfak: I wanted to work on abandoning tasks but I feel sleepy, I will continue working on it tomorrow, you would have several PRs by then :)
[21:19:07] Is there anything else you want me to do?
[21:23:48] I found a new task, precaching :D
[21:23:53] o/
[22:49:00] hm