[00:55:18] * halfak is having a lot of fun with this refactor.
[00:55:22] I think I just got traction :)
[00:56:30] Revision-Scoring-As-A-Service, ORES, Patch-For-Review: [Investigate] Precaching goes down sometimes - https://phabricator.wikimedia.org/T135444#2303665 (Halfak) I don't think that is right. I am pretty sure that precaching is going down since we get all of the performance issues related to precachin...
[03:10:40] Revision-Scoring-As-A-Service-Backlog, rsaas-editquality: [Spike] Proof of concept damage detection with hash vectors - https://phabricator.wikimedia.org/T132581#2303817 (Sabya) @JustinOrmont, @Halfak: I have a few questions: **Stacking:** Any idea how to make progress on this? **Pick hand created featur...
[10:22:14] Revision-Scoring-As-A-Service, bwds, revscoring: Gather language assets for Swedish - https://phabricator.wikimedia.org/T131450#2304349 (Ladsgroup)
[10:31:24] Revision-Scoring-As-A-Service, rsaas-articlequality: Train a `reverted` model for svwiki - https://phabricator.wikimedia.org/T135604#2304363 (Ladsgroup)
[10:32:50] Revision-Scoring-As-A-Service, ORES, Patch-For-Review: [Investigate] Precaching goes down sometimes - https://phabricator.wikimedia.org/T135444#2304379 (Ladsgroup) hmm, strangely I can find requests being made and completed in logs (both in uwsgi and precaching logs) during those times. Maybe my timi...
[13:02:23] Revision-Scoring-As-A-Service, ORES: Inconsistent JSON structure in ORES responses - https://phabricator.wikimedia.org/T131615#2304829 (Halfak)
[13:02:41] Revision-Scoring-As-A-Service, ORES: Inconsistent JSON structure in ORES responses - https://phabricator.wikimedia.org/T131615#2172680 (Halfak) Open>Resolved
[13:03:53] Revision-Scoring-As-A-Service-Backlog, ORES: [discuss] What to do with vagrant? - https://phabricator.wikimedia.org/T135623#2304834 (Halfak)
[13:06:16] halfak: o/
[13:06:31] o/ Amir1
[13:06:39] hey
[13:07:00] I'm investigating the logs
[13:07:53] for two things: 1) the hourly spike of errored scores, 2) precaching going down
[13:08:24] I couldn't find anything and it has already taken two days of my time
[13:08:31] I think I need to check the statsd logs
[13:10:03] at the very least I can see what the model, revision, etc. are
[13:17:06] Revision-Scoring-As-A-Service-Backlog, ORES: [spike] Find out if we can still get health check warnings after lb rebalance - https://phabricator.wikimedia.org/T134782#2304876 (schana) a: schana
[13:18:01] Amir1, I replied to your thoughts re. precaching.
[13:18:15] I don't think it is a metrics collection issue.
[13:18:25] halfak: and I answered there
[13:18:38] logs show precaching requests
[13:18:44] you can find them in the uwsgi logs
[13:18:53] (and the precaching logs themselves)
[13:18:54] Weird. That doesn't make sense.
[13:19:04] FWIW, the precacher would periodically go down before it was a service
[13:19:05] I thought maybe my timing is not right
[13:19:14] And it was clear that it did in fact go down.
[13:19:29] halfak: any error?
[13:19:42] Nope. Logs from precached just stopped showing up in stderr
[13:19:45] maybe I can fix the underlying issue
[13:19:51] Like the event look just got locked or something.
[13:20:34] hmm, okay
[13:20:44] it's super weird
[13:21:43] *event loop
[13:23:04] Revision-Scoring-As-A-Service, Research-and-Data, Research-management, WMF-NDA-Requests: NDA for Amir Sarabadani - https://phabricator.wikimedia.org/T134651#2304881 (Halfak) @Aklapper, just checking in. Is there anything that I need to do to help this request move forward?
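(Not from the log: a minimal, hypothetical sketch of a heartbeat watchdog for a long-running precaching loop. The symptom described above is that precached's stderr output simply stops with no error, as if the event loop locked up; a thread like this would at least make the stall visible and timestamped. All names here are illustrative and this is not the actual precached code.)

```python
import logging
import threading
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("precached.watchdog")

last_beat = time.time()

def beat():
    """Called from the main precaching loop every time an event is handled."""
    global last_beat
    last_beat = time.time()

def watchdog(max_silence=300):
    """Complain loudly (to stderr) if the loop stops making progress."""
    while True:
        time.sleep(max_silence / 2)
        silence = time.time() - last_beat
        if silence > max_silence:
            logger.error("No precaching activity for %.0fs -- event loop may be stuck",
                         silence)

# Start the watchdog alongside the (hypothetical) main event loop.
threading.Thread(target=watchdog, daemon=True).start()
```

If the daemon really hangs rather than crashing, the watchdog's error lines would appear in the same stderr stream that went quiet, which narrows down whether the loop is stuck or the process died silently.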
[13:37:47] Revision-Scoring-As-A-Service, Wikimania-Hackathon-2016, bwds: Generate bad words for all languages more than 100K articles - https://phabricator.wikimedia.org/T134629#2304936 (Ladsgroup)
[13:48:15] brb
[14:23:40] * halfak continues ORES refactor
[14:24:10] I just realized that, when processing multiple models at the same time, we need to be clever with how we access the cache.
[14:24:56] I hate needing to be clever.
[14:25:06] Clever smells a lot like complicated
[14:43:50] halfak: Amir1: https://gerrit.wikimedia.org/r/#/c/288618/2 can I get a review of this?
[14:44:58] akosiaris, not clear how the # of processes will be set
[14:45:35] (I just commented the same thing)
[14:45:40] halfak: it defaults to $::processorcount
[14:45:46] akosiaris, that's not right
[14:45:53] We need to be higher than that by a lot
[14:45:53] but we can pass $no_workers to service::uwsgi
[14:46:15] halfak: that actually depends on the underlying hardware, no?
[14:46:34] akosiaris, well, yes, but uwsgi will be a bottleneck if the process count isn't very high.
[14:46:43] akosiaris: I think for labs we tested it and decided on this a few months back
[14:46:43] Since the worker count in the celery cluster *is* high
[14:47:09] anyway we can pass the value you got there to $no_workers as is
[14:47:10] the uwsgi processes are also all IO-bound, so it makes sense I think
[14:47:28] I'm looking at this right now
[14:47:33] https://gerrit.wikimedia.org/r/#/c/288613/5/modules/service/manifests/uwsgi.pp
[14:49:31] I will also need to pass one more change btw. That is moving to /srv/deployment/ores from /srv/ores, if that's ok guys. Working on finalizing this now.
[14:50:46] akosiaris: per the scap configs, this should be /srv/ores/deploy
[14:50:54] or I'm wrong
[14:51:10] Amir1: that is the current state, yes
[14:51:20] in tin, it is /srv/deployment/ores/deploy
[14:51:20] but I want it /srv/deployment/ores/deploy
[14:51:25] I just did that
[14:51:35] okay
[14:51:48] I think that would be possible in the scap.cfg files
[14:54:43] ok, I'll need changes in a couple of places but I'll post a patch soon
[14:55:39] cool
[14:55:58] akosiaris: tell me or halfak if you need anything
[14:56:02] thanks :)
[14:56:34] thanks as well
[16:01:33] Revision-Scoring-As-A-Service-Backlog, rsaas-articlequality: [Explore] Spam and Vandalism new page creation - https://phabricator.wikimedia.org/T135644#2305525 (Halfak)
[16:21:04] akosiaris: I haven't tested it https://gerrit.wikimedia.org/r/#/c/289445/
[16:21:15] I need to find an nginx test setup
[16:21:36] + we need to fix this in the staging setups as well
[16:27:39] Amir1: commented
[16:27:53] thanks :)
[16:30:46] YuviPanda: made the new changes
[16:31:58] Amir1: let's take out the permanent right now so we can test without polluting too many caches, and once we are confident it works fine we can put it back in
[16:41:02] YuviPanda: sure
[16:41:33] (sorry, I have trouble connecting to anything non-http right now)
[16:41:51] I hope people in charge of the Iranian firewall will burn in hell
[16:44:19] * YuviPanda provides hugs to Amir1
[16:44:30] thanks :)
[16:44:58] If it's any consolation, I regularly had to fight with UMN IT about keeping IRC ports open
[16:45:30] YuviPanda: now permanent is out: https://gerrit.wikimedia.org/r/#/c/289445/3/modules/role/templates/ores/lb.nginx.erb
[16:46:36] Amir1: ok, I'm going to merge now, can you run puppet on the lb and verify?
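(Not part of the log: a rough, hypothetical Celery sketch of the reasoning behind setting the uwsgi process count well above $::processorcount — each web worker mostly enqueues a scoring task and then blocks on IO waiting for the result from the celery cluster, so the process count should track concurrent requests rather than cores. Broker/backend URLs, task names, and the timeout are assumptions, not the real ORES configuration.)

```python
from celery import Celery

# Hypothetical broker/backend; the real deployment has its own Redis settings.
app = Celery("ores_sketch",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/1")

@app.task
def score(context, model, rev_id):
    # The CPU-heavy model scoring happens here, in a celery worker process.
    return {"rev_id": rev_id, "prediction": None}

def handle_request(context, model, rev_id):
    # The uwsgi worker only enqueues the task and then waits for the result,
    # so it spends almost all of its time blocked on IO rather than using a core.
    async_result = score.delay(context, model, rev_id)
    return async_result.get(timeout=15)
```

Because a blocked uwsgi process holds no CPU, running many more of them than there are cores is what keeps uwsgi from becoming the bottleneck while the celery cluster's high worker count does the actual scoring.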
[16:46:46] yup
[16:46:55] ok merging
[16:47:38] Amir1: I merged, you can run puppet in like a minute
[16:47:45] sure
[16:47:47] thanks
[16:47:53] right now, I'm trying to connect
[16:47:59] yup connected
[16:49:32] YuviPanda: it went smoothly
[16:51:19] the redirect works as expected but I can't say whether it's because of nginx or uwsgi
[16:51:40] Amir1: if it continues to work we're all good I think
[16:51:53] \o/
[16:51:55] thanks YuviPanda :)
[16:52:16] halfak: also, we lose the SSL redirect functionality for the staging setups, do you need this to be fixed?
[16:52:28] Not sure I understand
[16:53:01] What should happen when I go to http://ores-staging.wmflabs.org?
[16:53:08] And https://ores-staging.wmflabs.org?
[16:54:20] https:// works as expected
[16:54:26] but http:// won't
[16:54:42] Oh! That's fine, I think.
[16:55:14] Amir1: if you make a patch with permanent on, I can merge that too whenever. no hurry tho
[16:55:30] Amir1: actually we can test easily if it is nginx or uwsgi - nginx provides non-permanent redirects, uwsgi provides permanent ones
[16:55:46] neat
[16:56:27] YuviPanda: http://www.redirect-checker.org/index.php
[16:56:38] nice :)
[16:56:40] https://www.irccloud.com/pastebin/Pjhj1iIo/
[16:56:42] you can also just do curl -I
[16:56:54] nice Amir1
[16:56:56] so we know nginx works
[16:57:11] \o/
[16:57:27] YuviPanda: should we move on to 301 now?
[16:57:36] or wait for a while
[16:57:48] Amir1: yeah, adding the permanent moves it to 301
[16:58:32] yeah
[16:58:33] okay
[17:02:24] YuviPanda: https://gerrit.wikimedia.org/r/289455
[17:03:02] Revision-Scoring-As-A-Service, ORES: Move SSL redirect from uwsgi to nginx - https://phabricator.wikimedia.org/T135655#2305882 (Ladsgroup)
[17:13:36] Amir1, something I've been considering when refactoring... See line 83 here: https://etherpad.wikimedia.org/p/ores_response_structure
[17:14:06] When we re-use a dependency cache between models for the same rev_id, it makes sense to put models /inside of/ a rev_id
[17:15:34] This would imply a different URL structure too
[17:15:53] e.g. /scores/<context>/<rev_id>/<model>
[17:16:18] As opposed to the current /scores/<context>/<model>/<rev_id>
[17:17:10] Seems like a substantial change, I know, but I'm just doing so much backward work to replicate our current structures *and* handle multiple model-scorings for the same rev_id.
[17:17:39] BRB
[17:22:33] I was on a coffee break
[17:24:20] halfak: "/scores/<context>/<rev_id>/<model>/" sounds reasonable
[17:24:37] plus we can do v3
[17:43:35] Was thinking the same.
[17:43:42] Will start experimenting with that.
[17:55:15] halfak: Should we deploy wikilabels and wait for a while, OR should I remove all jsonp and GET write actions first?
[17:56:49] Amir1: back, I'm merging https://gerrit.wikimedia.org/r/#/c/289455/ soon
[17:57:04] YuviPanda: thanks \o/
[17:57:11] Amir1: merged
[17:57:23] yay
[17:57:35] should I run puppet agent for this too
[17:57:42] and check if it's working?
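(Not in the log: along the lines of the redirect-checker / curl -I approach above, a small `requests` snippet can confirm both that plain HTTP is redirected to HTTPS and whether the redirect is the temporary 302 that nginx issues without the `permanent` flag or the 301 it issues with it. The hostname is only an example.)

```python
import requests

# Example host; substitute whichever ORES endpoint is being tested.
url = "http://ores-staging.wmflabs.org/"

# Don't follow the redirect -- we want to inspect the redirect response itself.
resp = requests.get(url, allow_redirects=False)
print(resp.status_code, resp.headers.get("Location"))

# 302 = temporary (nginx without `permanent`), 301 = permanent (after the follow-up patch).
assert resp.status_code in (301, 302)
assert resp.headers["Location"].startswith("https://")
```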
[18:00:51] Amir1: might be best yeah
[18:01:01] kk
[18:01:08] wait a sec
[18:03:32] https://www.irccloud.com/pastebin/VQwXnxdH/
[18:03:35] YuviPanda: ^
[18:05:12] Amir1: \o/
[18:05:15] Amir1: cool then :D
[18:05:49] so now we can move on to the rest of the ores settings
[18:06:52] Amir1: yup
[18:07:24] YuviPanda: right now I'm working on the scap3 settings
[18:07:28] we need to do more now
[19:59:53] Revision-Scoring-As-A-Service, Beta-Cluster-Infrastructure: ores-beta is down - https://phabricator.wikimedia.org/T135677#2306754 (Ladsgroup)
[20:00:25] Revision-Scoring-As-A-Service, Beta-Cluster-Infrastructure: ores-beta is down - https://phabricator.wikimedia.org/T135677#2306768 (Ladsgroup) Super sleepy now but I'll take a look at it soon-ish
[20:01:14] akosiaris: if you're around: every path config is now changed to /srv/deployment/ores/deploy/
[20:01:29] I was able to deploy successfully via scap3
[20:01:49] and now I'm looking into the issue in uwsgi itself
[20:02:11] Amir1: wow, you're quick
[20:02:12] thanks
[20:03:22] akosiaris: the last two commits
[20:03:22] https://github.com/wiki-ai/ores-wikimedia-config/commits?author=Ladsgroup
[20:03:52] also the latest patch set in https://gerrit.wikimedia.org/r/#/c/280403/
[20:04:07] ok, I'll update the phab deploy repo then
[20:04:42] * akosiaris is not convinced yet that phab was the best path, but anyway. Hopefully we will be future proof
[20:05:09] awesome
[20:05:12] thanks :)
[20:06:40] halfak: do you have any idea why this is happening? https://phabricator.wikimedia.org/T135677
[20:06:47] Amir1: ok updated. https://phabricator.wikimedia.org/diffusion/1880/
[20:06:59] \o/
[20:07:07] thanks akosiaris!
[20:07:47] Amir1, where is the real venv that should be being used?
[20:08:18] halfak: /srv/ores/venv
[20:08:40] OK. Seems to have the right version of revscoring installed
[20:08:57] Where is ores_web being executed from?
[20:09:23] /srv/deployment/ores/deploy/
[20:09:37] Revision-Scoring-As-A-Service, Beta-Cluster-Infrastructure: ores-beta is down - https://phabricator.wikimedia.org/T135677#2306754 (Halfak) ``` halfak@deployment-ores-web:/srv$ source ores/venv/bin/activate (venv)halfak@deployment-ores-web:/srv$ python Python 3.4.2 (default, Oct 8 2014, 10:45:20) [GCC 4...
[20:12:11] Revision-Scoring-As-A-Service, Beta-Cluster-Infrastructure: ores-beta is down - https://phabricator.wikimedia.org/T135677#2306792 (Halfak) ``` (venv)halfak@deployment-ores-web:/srv/deployment/ores/deploy$ python Python 3.4.2 (default, Oct 8 2014, 10:45:20) [GCC 4.9.1] on linux Type "help", "copyright",...
[20:13:21] Amir1, my guess is it's not using the right venv.
[20:13:34] let me check
[20:15:23] halfak: in ores-web.ini
[20:15:29] venv=/srv/ores/venv
[20:17:15] halfak: a service restart seems to solve the issue
[20:17:15] Amir1, are you sure you are looking at the most recent error?
[20:17:23] yup
[20:17:33] Hmm... That shouldn't happen.
[20:17:33] they have timestamps
[20:17:44] Something must have changed.
[20:17:46] and it's 19:50
[20:18:30] halfak: it's up now
[20:18:31] https://ores-beta.wmflabs.org/
[20:18:36] only a service restart
[20:18:48] I think that's because scap doesn't restart the service properly
[20:18:54] I will dig into this later
[20:19:53] (or restarts it too soon, before the deploy gets completed)
[20:20:01] That could be it.
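(Not from the log: one possible shape for a post-deploy smoke check, sketched because the failure described above — the service not being restarted properly, or restarted before the deploy finished — was only noticed when ores-beta went down. The URL and the idea of wiring this into the deploy are assumptions, not an existing scap check.)

```python
import sys

import requests

def smoke_check(base_url="https://ores-beta.wmflabs.org"):
    """Fail loudly if the freshly (re)started service is not answering."""
    try:
        resp = requests.get(base_url + "/", timeout=10)
    except requests.RequestException as exc:
        sys.exit("ORES did not respond after deploy: %s" % exc)
    if resp.status_code != 200:
        sys.exit("ORES returned HTTP %d after deploy" % resp.status_code)
    print("Service is up:", base_url)

if __name__ == "__main__":
    smoke_check()
```

Run right after the restart step, a check like this turns "the service silently kept running stale code" into an immediate, visible deploy failure instead of a down site discovered hours later.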
[20:43:25] Revision-Scoring-As-A-Service-Backlog: Generate recent article quality scores for English Wikipedia - https://phabricator.wikimedia.org/T135684#2306938 (Halfak)
[20:43:36] Revision-Scoring-As-A-Service-Backlog, rsaas-articlequality: Generate recent article quality scores for English Wikipedia - https://phabricator.wikimedia.org/T135684#2306951 (Halfak)
[21:17:32] going to sleep
[21:17:56] halfak: I'll be back in a few hours, then I start working on the to-do list you gave me
[21:19:35] o/
[22:07:59] Great Amir1. I'm just leaving for Chicago
[22:08:15] I'll be offline for the rest of the day (most likely). Tomorrow is a talk-day for me, so I'll be totally offline.
[22:08:43] I fly back on Friday and will be back to normal at ~1930 UTC
[22:09:24] halfak, have a nice trip (: