[11:24:43] Revision-Scoring-As-A-Service, wikilabels: [Investigate] Intermittent performance issues with wikilabels - https://phabricator.wikimedia.org/T130872#2198404 (Ladsgroup) Okay, I downloaded logs for the last week. It stopped after April 7th, or got really under control (number of timed out requests went to a...
[12:13:31] Revision-Scoring-As-A-Service, wikilabels: [Investigate] Intermittent performance issues with wikilabels - https://phabricator.wikimedia.org/T130872#2149118 (Chase) Please don't add me to this. I am not the Chase you're looking for.
[12:15:18] Revision-Scoring-As-A-Service, wikilabels: DB performance improvements on wikilabels - https://phabricator.wikimedia.org/T132436#2198510 (Ladsgroup)
[12:18:40] Revision-Scoring-As-A-Service, wikilabels: [Investigate] Intermittent performance issues with wikilabels - https://phabricator.wikimedia.org/T130872#2149118 (Ladsgroup) Sorry :(
[12:21:47] Revision-Scoring-As-A-Service, wikilabels: DB performance improvements on wikilabels - https://phabricator.wikimedia.org/T132436#2198510 (Ladsgroup) I checked and the second one actually generates 7886127 bytes (=7.5 MB) which means I definitely need to fix this.
[12:37:41] Revision-Scoring-As-A-Service-Backlog: [document] Models in progress - https://phabricator.wikimedia.org/T132438#2198562 (Halfak)
[12:39:47] Revision-Scoring-As-A-Service-Backlog: [Discuss] Opportunities for crowdsourcing around ORES - https://phabricator.wikimedia.org/T132440#2198595 (Halfak)
[12:41:56] o/
[12:42:52] This morning, I'm going to make a multiprocessing-based scorer pool in revscoring.
[12:42:59] This will help make analyses easier.
[12:43:11] And would act as a supplement to the ORES api
[12:43:17] (and I need it for stuff)
[12:52:31] halfak: hey
[12:52:38] Hey Amir1
[12:53:04] Today I looked into this bug: https://phabricator.wikimedia.org/T130872
[12:53:18] I realized several bugs in wikilabels
[12:53:25] I'm trying to fix them :)
[12:53:40] :)
[12:58:26] halfak: I have some issues to discuss with you about the wikilabels
[12:58:29] do you have a min?
[12:58:35] yup
[12:58:57] Revision-Scoring-As-A-Service, wikilabels: DB performance improvements on wikilabels - https://phabricator.wikimedia.org/T132436#2198673 (Ladsgroup) Okay the second one is actually being requested by editquality fetch label utility. So, Not a very big deal :D I might need to compress it or remove pretty...
[12:59:04] https://phabricator.wikimedia.org/T132436
[12:59:12] first starts with this
[12:59:28] Revision-Scoring-As-A-Service, wikilabels: [Investigate] Intermittent performance issues with wikilabels - https://phabricator.wikimedia.org/T130872#2198675 (Chase) >>! In T130872#2198534, @Ladsgroup wrote: > Sorry :( NP
[12:59:53] the second request is 7.5 MB
[13:00:10] takes about 3 seconds of a worker just to generate, not to mention I/O
[13:00:40] Amir1, this request does not assign a workset: https://labels.wmflabs.org/campaigns/enwiki/4/?tasks=
[13:00:52] we only request this via edit quality fetch label utility (at first I thought it's assign workset)
[13:01:23] because the request is being made it invokes assign_workset function
[13:01:26] it confused me
[13:01:32] Indeed. this is our primary way of providing access to all data related to an entire campaign.
[13:01:52] yeah
[13:02:04] I was thinking if we can remove the pretty printing
[13:02:09] or at least gzip it
[13:02:22] assign_workset only gets called when "assign" is in the args.
https://github.com/wiki-ai/wikilabels/blob/master/wikilabels/wsgi/routes/campaigns.py#L56
[13:02:25] (which is a common practice, mediawiki gzips js)
[13:02:50] Amir1, ^ this will increase the time that our worker spends.
[13:03:05] but we can do it https://flask-compress.readthedocs.org/en/latest/
[13:04:04] hmm
[13:04:22] what do you think of using "not pretty printing"?
[13:04:58] Amir1, I'm not sure it will get us much. We'll want a nice way to allow the user to turn pretty printing back on
[13:05:11] It certainly won't save us CPU time.
[13:05:16] Why the concern for bandwidth?
[13:05:50] It's easy to turn it back on
[13:06:00] but I haven't found a way to j
[13:06:06] *fix it
[13:06:15] but it's not a big deal
[13:06:21] since we do it once in a while
[13:06:29] it's not part of our UI
[13:07:04] https://phabricator.wikimedia.org/T130872
[13:07:06] halfak: ^
[13:07:14] this is the second one
[13:07:35] second what?
[13:07:36] It seems our issue was happening due to the shared db being overloaded
[13:07:41] Oh yeah. Saw that
[13:07:44] second issue to talk about
[13:07:49] That's what I suspected.
[13:08:09] So, it'll be good to do an analysis of the stat-related queries.
[13:08:23] so my guess is we can move to our own db
[13:08:31] but I don't like it
[13:08:50] or cross our fingers and hope labs people get to this soon
[13:09:07] I think our own DB will be a good probing proposal.
[13:09:24] Would be nice to raise the issue on -labs and see what chase & Yuvi think.
[13:09:43] sure
[13:10:41] the last issue I need to talk about is that all of the JS requests (such as changing labels, etc.) are being made using the GET method
[13:10:51] All requests in logs are GET
[13:11:18] but in the source code some parts are actually POST
[13:11:35] Amir1, probably because the JS on wikipedia uses JSONp
[13:11:36] e.g.
label_task
[13:12:30] I disagree, I think it's because we haven't changed method in jquery.Ajax in server.js
[13:12:39] http://api.jquery.com/jquery.ajax/
[13:13:13] the default method is GET and if we want a different method we have to set it explicitly
[13:14:41] obviously I can be wrong
[13:15:00] https://github.com/wiki-ai/wikilabels/blob/master/wikilabels/wsgi/static/js/wikiLabels/server.js#L10
[13:15:06] ^ jsonp
[13:15:25] You can't POST with jsonp
[13:15:36] hmm
[13:15:37] We need CORS set up to use regular AJAX
[13:15:43] Okay
[13:15:57] so we need to remove POST from our uwsgi files
[13:16:28] Negative, I think we should go in the other direction
[13:16:34] jsonp was meant to be temporary
[13:17:19] https://github.com/wikimedia/operations-puppet/blob/production/modules/role/templates/ores/lb.nginx.erb#L22
[13:17:23] ^ CORS for ORES
[13:18:12] OK. So 1- we need to enable CORS
[13:18:27] 2- Remove jsonp and use POST method?
[13:18:37] +1
[13:19:22] OK
[13:20:47] I probably need to work on stats. https://labels.wmflabs.org/campaigns/fawiki/?campaigns=stats this is a little bit slow compared to other requests
[13:21:53] but not a big deal
[13:33:14] halfak: About ru in damaging what do you think if I use bash commands to merge them?
[13:33:20] like paste
[13:33:25] +1
[13:33:31] That's what I'd do
[15:38:29] * YuviPanda waves
[15:43:00] o/
[15:45:18] YuviPanda, having DB issues with wikilabels and the shared postgres
[15:45:28] We're considering running a postgres within the wikilabels project
[15:45:35] Wanted your sanity check.
[15:45:56] c.f.
https://phabricator.wikimedia.org/T130872#2198404
[15:46:02] BRB dog needs to go out
[15:46:29] Revision-Scoring-As-A-Service, wikilabels: [Investigate] Intermittent performance issues with wikilabels - https://phabricator.wikimedia.org/T130872#2199287 (yuvipanda) Adding @akosiaris who does our postgres setup
[15:47:09] halfak: the problem with doing your own postgres would be that you're then responsible for backups and restores and stuff. It does provide you more isolation though. I've added akosiaris to the ticket, who takes care of our postgres installs. let's see what happens
[15:47:37] halfak: unfortunate as it might sound, in our infrastructure mysql has far more support in terms of people who know what's going on and how to fix things, so that's another option maybe.
[15:56:28] so in order of least long-term headaches: 1. move to mysql, use toolsdb, 2. Fixup the shared postgres db we have, figure out who is hammering it, 3. setup and maintain your own per-project postgres setup. (3) is probably harder than it looks long term.
[15:56:49] * YuviPanda goes afk to shower and food
[15:59:28] Thanks for the thoughts YuviPanda
[16:03:47] Amir1, ^
[16:27:45] hey
[16:27:49] I was heading home
[16:27:52] I just got back
[16:28:07] halfak: ^
[16:28:46] See yuvi's notes above. I've been assuming we get no backups from the current postgres.
[16:29:02] Either way, we should probably have a robust strategy for backups in wikilabels
[16:29:27] thanks YuviPanda :)
[16:29:35] MySQL won't be *bad* since we can store JSON in a text blob, but it would be a bit sad to go that direction.
[16:29:55] halfak: A question that might sound silly
[16:30:00] It's nice that we can run queries with JSON fields in postgres, but I don't use that much.
[16:30:01] indeed, I'm saddened every time I have to use mysql rather than postgres. But I'm sad a lot so it's ok :)
[16:30:04] why store JSON in a text blob?
[16:30:24] it might be expensive performance-wise
[16:30:38] We have JSON to store.
Postgres understands JSON, but MySQL does not.
[16:30:46] Oh! Why are labels arbitrary JSON?
[16:30:52] yeah
[16:30:57] Because that's a really nice bit of flexibility.
[16:30:59] using other stuff
[16:31:21] E.g. the edit types label is a complex set of objects with sub-objects
[16:31:22] yeah I can see but it has its own downsides
[16:33:15] anyway, we can use an index
[16:33:35] I need to study how to reduce its downsides
[16:33:50] halfak: https://mariadb.com/kb/en/mariadb/column_json/ :)
[16:34:13] it would be quite an interesting challenge
[16:34:35] afk for a min
[16:34:49] dynamic column...
[16:35:15] I don't actually know what it all means, just have heard that mariadb is getting nice json support
[16:35:56] Gotcha
[17:02:05] Revision-Scoring-As-A-Service, Beta-Cluster-Infrastructure, Puppet, Scap3: deployment-((sca|aqs)01|ores-web) fails due to scap3 errors - https://phabricator.wikimedia.org/T132267#2199660 (Ladsgroup)
[17:02:11] halfak: ^
[17:10:46] * halfak feels like this setup is fragile
[17:14:46] * halfak --> lunch
[17:22:00] Revision-Scoring-As-A-Service, Beta-Cluster-Infrastructure, Puppet, Scap3: Puppet on deployment-((sca|aqs)01|ores-web) fails due to scap3 errors - https://phabricator.wikimedia.org/T132267#2199720 (Krenair)
[19:06:47] halfak: got a fix for this: https://gerrit.wikimedia.org/r/#/c/282992
[19:06:55] it will work for us
[19:08:08] It was just a matter of getting out of that directory?
[19:08:47] yup
[19:09:04] puppet agent in your home folder doesn't work
[19:09:10] but everywhere else it works
[19:09:14] isn't it crazy
[19:10:09] back to the wikilabels work
[19:12:42] Ahh! A change that I made for hindi broke all of our old models and I have been chasing this bug all day!
[19:13:06] * halfak hates pickle
[19:15:05] :(
[19:15:36] I was actually working on building ruwiki damaging
[19:15:50] one thing halfak, do you need to rebuild all models again?
[19:16:09] Yes.
But I'm going to kill the linear SVCs :D
[19:17:31] * halfak increments revscoring to 1.2.0
[19:22:48] wiki-ai/revscoring#653 (fast_scoring - a0fe282 : halfak): The build failed. https://travis-ci.org/wiki-ai/revscoring/builds/122602059
[19:23:02] Shuddap
[20:12:44] Revision-Scoring-As-A-Service, wikilabels: DB performance improvements on wikilabels - https://phabricator.wikimedia.org/T132436#2200705 (Ladsgroup) [[https://github.com/wiki-ai/wikilabels/pull/107|This PR]] would solve the first performance issue, I must note that first request is about 100 times slower...
[20:12:45] [1] https://meta.wikimedia.org/wiki/https://github.com/wiki%2Dai/wikilabels/pull/107
[20:14:51] halfak: I wanted to build ru damaging but since you're regenerating all the models again, I thought I'd wait until you're done and then continue my work
[20:15:01] so I made this ^
[20:17:01] Amir1, would be nice to see if this changes the query plan.
[20:17:22] Revision-Scoring-As-A-Service, Beta-Cluster-Infrastructure, Patch-For-Review, Puppet, Scap3: Puppet on deployment-((sca|aqs)01|ores-web) fails due to scap3 errors - https://phabricator.wikimedia.org/T132267#2200743 (Ladsgroup) I cherry-picked this patch into the beta puppetmaster. So the issu...
[20:18:04] halfak: do you mean response time?
[20:18:55] No. Query plan. Postgres will tell you what indexes it uses or not
[20:19:24] never heard of it
[20:19:29] let me do some googling
[20:29:34] Amir1, look for "Explain"
[20:29:41] oh
[20:29:47] I got the query plan already
[20:29:48] You run it like this "EXPLAIN "
[20:29:51] :D
[20:29:57] Sorry was distracted
[20:30:00] I'm checking if the index is there or not
[20:30:12] no problem, it was mine to learn
[20:30:16] https://www.irccloud.com/pastebin/oycfrkc0/
[20:32:18] that's without index
[20:32:23] let me add index
[20:33:15] Oh! That query is going to help a lot :)
[20:33:18] *index
[20:33:19] lol
[20:33:30] I didn't realize it was this query.
[20:33:40] yeah
[20:33:41] I suppose we still only have < 100 campaigns
[20:33:45] So scanning them is fast
[20:33:52] but let me get that
[20:33:56] +1
[20:34:00] It seems it is problematic
[20:36:45] that's strange
[20:36:53] it gives me a syntax error
[20:38:55] halfak: CREATE INDEX IF NOT EXISTS does not work in postgresql prior to 9.5
[20:39:06] mine is 9.3
[20:39:16] Arg!
[20:41:11] https://github.com/wikimedia/operations-puppet/blob/3218df65dcc4c9d42ce6deef0e130db817613f58/modules/postgresql/manifests/postgis.pp
[20:41:27] c'mon
[20:41:35] I need to use another option
[20:42:51] http://dba.stackexchange.com/questions/35616/create-index-if-it-does-not-exist
[20:43:46] \o/ fast scoring works!
[20:45:41] halfak: ^
[20:45:45] what do you think of this?
[20:46:01] I followed all other index making parts
[20:46:08] (I tested it, worked)
[20:48:43] halfak: when I review your code I still want to say, why aren't you using staticmethods, then I remember the pickle issue :D
[20:49:26] :P
[20:49:37] * halfak curses pickle
[20:52:51] Amir1, this score utility now has two types of concurrency built-in
[20:53:19] there's a threaded executor for querying the API and a process-based executor for doing feature extraction and model scoring.
[20:53:47] It'll help for when we want to experiment with models without standing up an ORES.
[20:54:13] It essentially implements a pocket version of ORES that you run on your local machine -- that'll take advantage of available resources. :)
[20:55:00] \o/
[20:55:44] * halfak is responding to your notes while you merge :P
[20:56:26] \o/
[21:00:48] {{merged}}
[21:00:48] [2] https://meta.wikimedia.org/wiki/Template:merged
[21:00:54] I ran tests on my local postgres.
[21:07:05] \o/
[21:07:07] thanks
[21:15:24] o/
[21:42:40] o/
[23:49:39] Revision-Scoring-As-A-Service-Backlog, rsaas-articlequality: NLP for article quality models. - https://phabricator.wikimedia.org/T132533#2201702 (Halfak)
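Note on the 20:38:55 to 20:46:08 exchange about indexes: CREATE INDEX IF NOT EXISTS is a syntax error on PostgreSQL before 9.5, so on 9.3 one common workaround is to wrap a plain CREATE INDEX in a DO block that first checks the pg_indexes view. The sketch below only builds that SQL text. It is an illustration under assumptions, not the code that was merged: the helper name and every index, table and column name are hypothetical and not taken from the wikilabels schema.

```python
import re

# Conservative pattern for unquoted SQL identifiers (an assumption of this
# sketch; it rejects anything that would need quoting).
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def create_index_if_missing_sql(index, table, column, schema="public"):
    """Build a DO block that creates an index only when it is absent.

    Hypothetical helper, not part of wikilabels. DO blocks are available
    from PostgreSQL 9.0 onward, so the generated statement runs on 9.3.
    """
    for name in (index, table, column, schema):
        if not _IDENT.fullmatch(name):
            raise ValueError("unsafe identifier: %r" % (name,))
    return (
        "DO $$\n"
        "BEGIN\n"
        "    IF NOT EXISTS (\n"
        "        SELECT 1 FROM pg_indexes\n"
        "        WHERE schemaname = '{schema}' AND indexname = '{index}'\n"
        "    ) THEN\n"
        "        CREATE INDEX {index} ON {schema}.{table} ({column});\n"
        "    END IF;\n"
        "END\n"
        "$$;"
    ).format(schema=schema, index=index, table=table, column=column)
```

Calling create_index_if_missing_sql("example_idx", "example_table", "example_column") returns a statement that can be pasted into psql or a schema file. The existence check and the CREATE INDEX are not atomic, so two sessions running it at the same moment could still collide; for a one-off schema migration that is usually acceptable.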