[14:01:25] o/
[14:01:27] Hey folks
[14:13:25] Revision-Scoring-As-A-Service, ORES: ORES production blog post - https://phabricator.wikimedia.org/T141275#2511549 (Halfak)
[14:13:37] Revision-Scoring-As-A-Service, ORES: ORES vision blog post - https://phabricator.wikimedia.org/T141275#2492654 (Halfak)
[14:14:36] hey halfak, do you happen to know of any resources that detail getting all the plumbing working with enchant and adding dictionaries (specifically on OSX)?
[14:16:21] Revision-Scoring-As-A-Service, ORES: ORES vision blog post - https://phabricator.wikimedia.org/T141275#2511558 (Halfak) Talked to @jeffelder about this. We're looking at a higher-level framing. I pitched #ores as the internally facing sensor network that allows Wikipedia to adapt. I think I can write an...
[14:17:10] I don't know about OSX.
[14:17:26] how about configuring enchant?
[14:18:13] I've never done that
[14:19:56] http://eigenjoy.com/2009/11/11/install-enchant-dictionaries/
[14:19:57] ?
[14:20:07] * halfak tries to google what schana is talking about
[14:20:25] yeah, I've been down the google path
[14:20:32] Whoops. I have to run to a doc. appointment. Back in ~1 hour
[14:42:10] halfak: o/
[14:58:01] I think I figured it out! The wheel for pyenchant on OSX bundles the enchant binary, which bypasses the one I installed via homebrew
[15:30:00] o/ Amir1
[15:58:49] Revision-Scoring-As-A-Service-Backlog, revscoring, rsaas-editquality: Write proof-of-concept good-faith newcomer detection - https://phabricator.wikimedia.org/T141779#2511851 (Halfak)
[16:10:24] Revision-Scoring-As-A-Service-Backlog, ORES: Review changes to graphite/grafana for ORES refactor - https://phabricator.wikimedia.org/T141781#2511905 (Halfak)
[16:40:34] Revision-Scoring-As-A-Service, ORES, Patch-For-Review: Move from mediawiki/services/ores/deploy to research/ores/deploy or research/ores/deploy-prod - https://phabricator.wikimedia.org/T139008#2512071 (Halfak) @akosiaris, can you take a look at this?
[16:52:13] Revision-Scoring-As-A-Service, revscoring: Tamil language utilities - https://phabricator.wikimedia.org/T134105#2512168 (Halfak)
[16:52:29] Hey schana. So, I see you have been working on the language utilities
[16:52:40] Any progress on the wikilabels info field?
[16:52:54] And I guess more importantly, would you like to continue work on these tasks?
[16:53:31] I haven't gotten to those tasks quite yet - still working on getting the revscoring environment up
[16:53:41] (which is almost done, btw)
[16:53:49] halfak: I need to go home, be back in one hour. Is that okay?
[16:54:10] great schana.
[16:54:10] Did you get a new machine then?
[16:54:16] yes
[16:54:36] If you're not there by then, we can do it right now after your talk with schana
[16:55:18] Amir1, I'm just about to head to lunch
[16:55:23] And then I'll likely bike home
[16:55:31] I can be around in 1.5 hours or so
[16:55:40] okay, see you then :)
[16:55:51] o/
[18:21:32] Amir1, getting on my bike right now, so it'll be another 40 mins.
[18:21:40] I got caught up writing email after lunch :/
[18:21:57] it's okay
[18:22:06] :)
[19:25:57] o/ Amir1
[19:26:12] hey, I'll be around in 5 min.
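(The pyenchant issue schana figured out at 14:58 above, the wheel bundling its own enchant binary, can plausibly be worked around by forcing pip to build pyenchant from source so it links the Homebrew libenchant instead. This is an unverified sketch, not a recipe from the log; the exact flags may vary by pip version.)

```
# Sketch of a workaround on OSX (assumption: enchant came from Homebrew):
brew install enchant                          # libenchant + hooks for dictionaries
pip uninstall -y pyenchant                    # drop the wheel with the bundled binary
pip install --no-binary pyenchant pyenchant   # source build; finds the system libenchant
```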
[19:26:16] Just got to my desk. Got stuck talking about https://www.google.com/url?q=https%3A%2F%2Fmeta.wikimedia.org%2Fwiki%2FGrants%3AIdeaLab%2FProtect_user_space_by_default&sa=D&sntz=1&usg=AFQjCNEwRPXsj3KFG_gnUrxRd6PsWAMo6g
[19:26:19] WTF google
[19:26:25] https://meta.wikimedia.org/wiki/Grants:IdeaLab/Protect_user_space_by_default
[19:30:38] halfak: hey
[19:30:44] I'm back. What's up?
[19:32:19] halfak: this change is technically super simple, but it requires community consensus and that's hard work
[19:33:34] +1
[19:33:39] My assessment as well.
[19:37:28] halfak: I wanted to know what I should do this week
[19:40:21] Gotcha. Looking
[19:40:45] Looks like you have a lot in Active already
[19:40:54] What are our blockers for deploying to enwiki and plwiki?
[19:41:17] The ores refactor
[19:41:48] that's actually blocked by the number of workers
[19:41:55] which is blocked by the ores refactor
[19:42:26] but deploying to plwiki and enwiki is half an hour of work
[19:42:32] Gotcha.
[19:42:40] OK, let's get that blocker out of the way.
[19:42:54] for today, let's investigate the beta redis issue
[19:42:56] Can you look at graphite/grafana in labs to make sure it's working with metrics collection?
[19:43:07] We'll want to make sure changes get propagated to prod.
[19:43:09] Oh yeah. That too
[19:43:11] https://ores-beta.wmflabs.org/v1/scores/enwiki/damaging/?revids=113|114|115|116|117
[19:44:55] * halfak waits for timeout
[19:45:14] What's the hostname of that server?
[19:45:26] graphite.wmflabs.org
[19:45:30] which redirects
[19:46:27] huh
[19:46:37] ores-beta host
[19:46:59] for that?
[19:47:09] deployment-tin.eqiad.wmflabs
[19:47:19] hmm, good question. It should be in graphite itself
[19:47:30] I already deployed it, there is no need to deploy
[19:47:36] we should check redis
[19:47:38] I want to look at the redis issue
[19:47:40] Yes
[19:47:41] :P
[19:47:58] https://www.irccloud.com/pastebin/i5iVrYCR/
[19:48:05] halfak: after a very long time it returns this
[19:48:10] "BROKER_URL: redis://ores-redis-01:6379"
[19:48:15] That's probably wrong.
[19:48:22] Is the redis server localhost on deployment-tin?
[19:48:23] it should be deployment-ores-redis
[19:48:39] Gotcha. We need a custom config there. Looks like it was deleted or never made it.
[19:48:40] we have a dedicated redis node there
[19:49:01] maybe my change to the config was not correct
[19:49:25] probably we just need to make a config file there
[19:49:27] On it
[19:49:51] halfak: I will get the beta issue fixed
[19:52:26] oh, I found the issue
[19:52:31] It's in puppet
[19:52:36] fixing it right now
[19:53:21] \o/
[20:02:57] halfak: okay, since the change is in prod, we need to deploy to prod immediately after the change is merged, otherwise everything will break
[20:03:12] https://gerrit.wikimedia.org/r/302303
[20:03:33] Oh... I guess we're ready to go with that, right?
[20:03:45] yeah
[20:03:55] OK. I'm in if you are.
[20:04:08] I can stick around to monitor for the next 3 hours at least
[20:04:10] yes, let me get someone from ops
[20:10:16] * halfak starts poking at the "Feature Vector" problem
[20:42:38] PROBLEM - ORES worker production on ores.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 2212 bytes in 0.044 second response time
[20:42:49] halfak: https://ores.wikimedia.org/v2/scores/enwiki/?models=damaging&revids=724030089
[20:42:56] why?
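(The beta fix being discussed at 19:48 above amounts to a config override pointing the Celery broker at the dedicated redis node. A hypothetical sketch of such a `/etc/ores/99-main.yaml`; the section and key names are illustrative, not the actual ORES config schema, and only the `deployment-ores-redis` host and port 6379 come from the log.)

```
# Hypothetical beta-cluster override sketch -- key names are illustrative.
score_processors:
  ores_celery:
    BROKER_URL: redis://deployment-ores-redis:6379
    CELERY_RESULT_BACKEND: redis://deployment-ores-redis:6379
score_caches:
  ores_redis:
    host: deployment-ores-redis
    port: 6379
```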
[20:43:26] the puppet configs are there
[20:43:30] but it's not working
[20:43:36] looking
[20:44:08] I checked it in beta and it worked
[20:44:22] There's no custom config on scb1001
[20:44:47] it's in /etc/ores
[20:44:51] Oh yeah
[20:44:53] Looking at that now
[20:45:18] Can you revert while I start digging?
[20:45:42] Revert doesn't work. I need someone to revert the puppet change
[20:45:54] it might take some time
[20:46:03] Crap
[20:46:14] I'm here to work at full capacity
[20:46:58] OK. I've confirmed that we *should* be reading from /etc/ores/
[20:49:47] And now I've confirmed that it isn't working, so "should" has failed.
[20:49:48] okay, it's time for plan b
[20:49:57] what?
[20:50:04] Copy-paste 99-main.yaml into config/ and restart
[20:50:07] For now
[20:50:52] I don't have access
[20:51:34] halfak: do you have access to do it?
[20:52:20] Nope
[20:52:24] OK, working hard on ORES
[20:54:43] https://ores-beta.wmflabs.org/v1/scores/enwiki/damaging/?revids=113|114|115|116|117
[20:54:50] ores in beta works
[20:54:52] Hmm... We definitely *are* reading from /etc/ores. I was mistaken
[20:54:55] yeah.
[20:55:00] it should give me a clue
[20:55:06] everything in beta is the same
[20:55:39] I wonder if it's the celery queue check specifically.
[20:55:44] Do we have a password on redis in beta?
[20:56:03] nope
[20:56:11] Aha!
[20:56:15] OK, looking into that
[20:56:16] that's one of the differences
[20:58:57] Oh! I can see it.
[20:59:03] I thought we fixed this issue.
[20:59:15] There's a colon before the password.
[20:59:49] wtf
[21:00:47] https://gerrit.wikimedia.org/r/#/c/302303/3/modules/ores/manifests/web.pp
[21:01:00] it has been there and I think it's the way alex wanted
[21:01:13] halfak: https://gerrit.wikimedia.org/r/#/c/302303/3/modules/ores/manifests/web.pp
[21:01:16] ORES is updated
[21:01:27] I just pushed a commit to master
[21:02:01] Looks like I probably messed that up while rebasing. :\
[21:02:02] Okay, fixing it and deploying
[21:03:12] this diffusion is not updated
[21:03:18] Arg
[21:03:20] A delay?
[21:03:43] yup
[21:04:10] two minutes have passed
[21:04:17] what the
[21:07:11] https://phabricator.wikimedia.org/diffusion/1912/browse/master/;ebe44624769533363310bb94b848a3c249dc4491
[21:07:59] okay, halfak. I'll make a patch
[21:22:56] RECOVERY - ORES worker production on ores.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 730 bytes in 2.585 second response time
[21:23:25] Nice! That got it
[21:23:30] * halfak punches self in face
[21:23:44] ^ that literally happened too. I had it coming.
[21:43:14] https://github.com/wiki-ai/ores/pull/161
[21:55:16] PROBLEM - ORES worker production on ores.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 4012 bytes in 0.058 second response time
[21:55:39] uh oh
[21:56:18] this one is either super late or a clusterfuck
[21:56:29] I can't take that anymore
[21:56:31] :D
[21:56:40] I think it's late
[21:56:43] Seems to be OK now
[21:57:24] I see the error now
[21:57:44] one of the nodes is down
[21:58:17] https://ores.wikimedia.org/v1/scores/enwiki/damaging/?revids=113|114|115|116|117
[21:58:23] try that several times
[21:58:49] can't get an error
[21:59:17] RECOVERY - ORES worker production on ores.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 685 bytes in 0.578 second response time
[21:59:25] Must have been a late puppet run?
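(The "colon before the password" bug at 20:59 above concerns the redis broker URL syntax used by Celery, where the well-formed shape is `redis://:password@host:port`, with the leading colon marking an empty username. A minimal stdlib sketch, using a made-up password and a plausible but hypothetical malformed URL, of how a misplaced colon breaks parsing:)

```python
from urllib.parse import urlparse

# Well-formed Celery-style broker URL: empty username, then ":password@".
good = urlparse("redis://:s3cret@ores-redis-01:6379")
print(good.password, good.hostname, good.port)  # s3cret ores-redis-01 6379

# One hypothetical mis-templated form: the stray colon makes the tail of
# the netloc look like a port, which cannot be parsed as an integer.
bad = urlparse("redis://ores-redis-01:6379:s3cret")
try:
    bad.port
except ValueError:
    print("port is unparseable")
```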
[21:59:45] I just restarted uwsgi on scb1002
[21:59:48] that might be it
[22:01:00] halfak: it's okay now
[22:01:07] it was scb1001
[22:02:28] halfak: https://grafana.wikimedia.org/dashboard/db/ores-extension
[22:02:39] this time the number of failed jobs really went down
[22:07:24] * halfak whistles
[22:30:30] Revision-Scoring-As-A-Service-Backlog, ORES: Set up password on ORES Beta redis server - https://phabricator.wikimedia.org/T141823#2513386 (Halfak)
[22:31:52] Revision-Scoring-As-A-Service-Backlog, MediaWiki-extensions-ORES, ORES: Config beta ORES extension to use the beta ORES service - https://phabricator.wikimedia.org/T141825#2513412 (Halfak)
[22:32:06] Revision-Scoring-As-A-Service-Backlog, MediaWiki-extensions-ORES, ORES: Config beta ORES extension to use the beta ORES service - https://phabricator.wikimedia.org/T141825#2513426 (Halfak) p: Triage>High
[22:52:07] Revision-Scoring-As-A-Service-Backlog, ORES, Wikimedia-Incident: Set up password on ORES Beta redis server - https://phabricator.wikimedia.org/T141823#2513514 (greg)
[22:52:30] Revision-Scoring-As-A-Service-Backlog, MediaWiki-extensions-ORES, ORES, Wikimedia-Incident: Config beta ORES extension to use the beta ORES service - https://phabricator.wikimedia.org/T141825#2513515 (greg)
[22:52:37] Revision-Scoring-As-A-Service-Backlog, ORES, Wikimedia-Incident: Set up password on ORES Beta redis server - https://phabricator.wikimedia.org/T141823#2513386 (greg) p: High>Triage
[22:53:13] Revision-Scoring-As-A-Service-Backlog, ORES, Wikimedia-Incident: Set up password on ORES Beta redis server - https://phabricator.wikimedia.org/T141823#2513386 (greg) p: Triage>High (bad drag)
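(The score-check URLs pasted throughout this log follow the ORES v1 pattern `https://<host>/v1/scores/<wiki>/<model>/?revids=...` with pipe-separated revision IDs. A small sketch for building them; `ores_scores_url` is a hypothetical helper, not part of any ORES client library. Note that `urlencode` percent-encodes the pipes, which is equivalent to the literal `|` form used in the chat.)

```python
from urllib.parse import urlencode

def ores_scores_url(host, wiki, model, revids):
    """Build an ORES v1 scores URL for a batch of revision IDs."""
    query = urlencode({"revids": "|".join(str(r) for r in revids)})
    return "https://%s/v1/scores/%s/%s/?%s" % (host, wiki, model, query)

print(ores_scores_url("ores.wikimedia.org", "enwiki", "damaging",
                      [113, 114, 115, 116, 117]))
# -> https://ores.wikimedia.org/v1/scores/enwiki/damaging/?revids=113%7C114%7C115%7C116%7C117
```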