[17:29:42] o/ Amir1
[17:29:52] still babysitting model building. So many mistakes in the Makefile :(
[17:30:33] This was good cause to clean them up.
[17:41:42] Darn. it looks like we might miss this deployment window.
[17:41:52] I want this issue resolved. I think we should look into clearing the cache.
[17:41:56] akosiaris, are you around?
[17:47:21] halfak: hey, we don't have deployment window in Friday
[17:47:29] Arg. Gotcha.
[17:47:31] (sorry, I was afk coming from university to home)
[17:47:41] This is painful, but I don't think there's much we can do without a window
[17:47:46] Looks like users are noticing the issue
[17:47:50] that's one of the things that bothers me a little
[17:48:18] I was hoping that whatever cache prevented us from noticing it in beta was also going to prevent people from noticing in prod
[17:48:32] Looks like our key UI (ORES review tool) is doing just fine.
[17:48:35] halfak: we can do a fast deployment if it's urgent outside of window
[17:49:48] we can do a deployment just to fix this and then later get everything working
[17:49:52] what do you think?
[17:51:53] Amir1, I'd like to do that. How much social capital would we burn?
[17:51:54] https://gerrit.wikimedia.org/r/#/c/304489/
[17:52:01] ^ implements cache storage fix
[17:52:20] if we make ores down a lot, otherwise none
[17:53:13] {{merged}}
[17:54:08] Hmm... I'm going to test on labs quick
[17:54:12] Can you check out beta
[17:54:21] We really need someone to blow out our cache in prod
[17:54:36] halfak: sure (on beta)
[17:54:57] on blowing out cache in prod. We need alex (or one of the Ops)
[17:55:16] OK. Looks good on ores-staging in labs
[17:58:58] Confirmed that the cache issue no longer exists. Moving to main labs deplou
[17:58:59] *
[18:01:19] halfak: do we have a phab card for the bug?
[18:01:31] No. Good Q. We should. I'll make it right now
[18:03:34] awesome
[18:29:08] halfak: it's live in beta. Test there please :)
[18:39:28] ORES extension jobs work there
[18:41:26] Great. Still haven't got to testing. I have some distractions. I'll examine the situation there in a moment.
[18:41:34] Did you clear the score cache?
[18:41:57] nope
[18:42:04] let me do it
[18:45:56] halfak: Have you made the phab card. I want to send an email to ai-l explaining we will have a deployment today
[18:48:24] Revision-Scoring-As-A-Service, ORES: ORES format issue (whole score document is cached) - https://phabricator.wikimedia.org/T142857#2548655 (Halfak)
[18:48:29] https://phabricator.wikimedia.org/T142857
[18:48:32] Amir1, ^
[18:48:34] Adding notes
[18:49:00] Revision-Scoring-As-A-Service, ORES: ORES format issue (whole score document is cached) - https://phabricator.wikimedia.org/T142857#2548672 (Halfak) Fix for ores: https://github.com/wiki-ai/ores/pull/164
[18:49:20] Revision-Scoring-As-A-Service, ORES: ORES format issue (whole score document is cached) - https://phabricator.wikimedia.org/T142857#2548673 (Halfak) This is now deployed to labs. The cache has been cleared to clean this up and the fix has been confirmed.
[18:49:36] Revision-Scoring-As-A-Service, ORES: ORES format issue (whole score document is cached) - https://phabricator.wikimedia.org/T142857#2548674 (Halfak) p:Triage>Unbreak!
[18:50:45] thanks. Sending the email right now
[18:51:22] (CR) Jforrester: "Maybe caused T142858 ?" [extensions/ORES] - https://gerrit.wikimedia.org/r/264608 (https://phabricator.wikimedia.org/T122537) (owner: Awight)
[18:59:33] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES, Beta-Cluster-reproducible: ORES Beta Feature causes fatal on Special:Contributions in Beta Cluster - https://phabricator.wikimedia.org/T142858#2548722 (Ladsgroup) a:Ladsgroup
[19:00:52] (CR) Ladsgroup: "Yup, Working on it. It didn't happen in mw-revscoring.wmflabs.org. Strange." [extensions/ORES] - https://gerrit.wikimedia.org/r/264608 (https://phabricator.wikimedia.org/T122537) (owner: Awight)
[19:18:28] (PS1) Ladsgroup: Fix internal error when score doesn't exist in the table in SpecialContribs [extensions/ORES] - https://gerrit.wikimedia.org/r/304498 (https://phabricator.wikimedia.org/T142858)
[19:20:37] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES, Beta-Cluster-reproducible, Patch-For-Review: ORES Beta Feature causes fatal on Special:Contributions in Beta Cluster - https://phabricator.wikimedia.org/T142858#2548756 (Ladsgroup) First glance, It happens when ores score doesn't exist for...
[19:26:08] afk for a fast lunch
[19:26:30] kk Amir1
[19:35:17] back
[19:35:25] 9 minutes :D
[19:35:36] Wow
[19:35:44] Do you want to try the deployment now?
[19:35:50] I'll head out to lunch in 55 mins
[19:35:57] Oh! We still need someone from ops
[19:36:01] yup
[19:36:22] Do you want to deploy?
[19:36:36] So, I think we should do the deployment and verify that nothing is broken. Then we should clear the ORES caches.
[19:36:42] OK.
[19:36:47] I'm fixing a UBN! task in the ores extension. I can supervise
[19:36:59] UBN! task?
[19:37:06] https://phabricator.wikimedia.org/T142858
[19:37:37] i think the extension has some issues with the flow extension
[19:37:49] I made the patch but I need to confirm it in mw-revscoring.wmflabs.org
[19:37:50] Amir1, is oresdb.eqiad.wmnet our only redis node?
[19:38:15] we have two nodes if I'm correct. I remember the hardware request
[19:38:20] but we can be sure
[19:38:25] Yeah... This is the only listed one
[19:38:47] oh it's oresrdb
[19:38:50] not oresdb
[19:41:00] sudo redis-cli -h deployment-ores-redis.deployment-prep.eqiad.wmflabs -p 6379 -a areallysecretpassword flushall
[19:41:03] Oh.... Hmm.. It looks like we might be able to do this ourselves
[19:41:09] No sudo necessary
[19:41:11] that's how we clean cash in beta
[19:41:24] yeah
[19:41:27] it might be
[19:41:36] can you login to that node?
[19:41:59] (I'm waiting for vagrant to do the provision)
[19:42:04] I don't need to log into the node with redis running
[19:42:15] Looks like we have oresrdb1001 and oresrdb1002
[19:42:44] it might be that one of them is broker and one of them is cache
[19:43:27] Seems to me there's replication happening here
[19:43:39] E.g. I can connect to a redis server on 6380 and 6379 on both machines
[20:19:37] halfak: I can't test that patch in mw-revscoring. Flow enables lots of dependencies that one of them doesn't work here :(
[20:19:55] the really stupid way to test is to +2 and wait for beta
[20:20:43] Not really all that crazy
[20:20:44] Hmmm
[20:21:53] I'm looking for James, if he can test it in an environment beforehand, it would be great
[20:33:23] Amir1, check this out: https://grafana-admin.wikimedia.org/dashboard/db/ores?panelId=12&fullscreen
[20:34:04] that looks normal
[20:34:14] Are we sure we cleared the cache
[20:35:17] Yup
[20:35:20] I'm confident
[20:35:34] Our most useful cache is immediately after a score is generated
[20:36:03] You can see the dip when I cleared it at 20:12
[20:36:32] Goes to 0.006!
[20:36:56] That's an order of magnitude lower than the last major dip
[20:36:58] 0.04
[20:37:46] OK Time for me to get lunch
[20:37:48] o/
[20:38:43] o/
[20:38:51] (CR) Ladsgroup: [C: 2] Fix internal error when score doesn't exist in the table in SpecialContribs [extensions/ORES] - https://gerrit.wikimedia.org/r/304498 (https://phabricator.wikimedia.org/T142858) (owner: Ladsgroup)
[20:39:48] (Merged) jenkins-bot: Fix internal error when score doesn't exist in the table in SpecialContribs [extensions/ORES] - https://gerrit.wikimedia.org/r/304498 (https://phabricator.wikimedia.org/T142858) (owner: Ladsgroup)
[20:56:58] (CR) Ladsgroup: "I tested it using this: https://gist.github.com/Ladsgroup/3f3efe2b6493b92b9a0b6bd76b5e1b18" [extensions/ORES] - https://gerrit.wikimedia.org/r/304498 (https://phabricator.wikimedia.org/T142858) (owner: Ladsgroup)
[20:59:12] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES, Beta-Cluster-reproducible, Patch-For-Review, User-Ladsgroup: ORES Beta Feature causes fatal on Special:Contributions in Beta Cluster - https://phabricator.wikimedia.org/T142858#2549361 (Ladsgroup) With 304498 merged and deployed in bet...
[20:59:28] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES, Beta-Cluster-reproducible, Patch-For-Review, User-Ladsgroup: ORES Beta Feature causes fatal on Special:Contributions in Beta Cluster - https://phabricator.wikimedia.org/T142858#2549362 (Ladsgroup) Open>Resolved
[21:02:32] Revision-Scoring-As-A-Service, ORES: ORES format issue (whole score document is cached) - https://phabricator.wikimedia.org/T142857#2548655 (Ladsgroup) Open>Resolved
[21:04:00] Revision-Scoring-As-A-Service, ORES: ORES format issue (whole score document is cached) - https://phabricator.wikimedia.org/T142857#2548655 (Ladsgroup) Resolving, We should break the habit of not closing phab tasks in case it's UBN!
[22:07:33] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES, Beta-Cluster-reproducible, User-Ladsgroup, and 2 others: ORES Beta Feature causes fatal on Special:Contributions in Beta Cluster - https://phabricator.wikimedia.org/T142858#2549494 (Jdforrester-WMF) Yup, thank you!
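
Note on the cache flush discussed around 19:41: the log shows a single redis-cli flushall invocation against the beta Redis host, and later mentions two production machines (oresrdb1001, oresrdb1002) each answering on ports 6379 and 6380. The log does not settle which instance is the score cache and which might be a broker, and flushall wipes every database on the instance it hits. The following is only a rough Python sketch of how one might assemble such commands for review before running anything. The function names build_flush_command and plan_flush are hypothetical, the target list is an assumption taken from host names in the chat, and nothing here is confirmed to be how the flush was actually performed.

```python
import shlex


def build_flush_command(host, port, password=None, command="flushall"):
    """Return the redis-cli argv that would clear one Redis instance.

    Only the -h, -p and -a options seen in the chat log are used.
    Nothing is executed here; the caller decides whether to run it.
    """
    argv = ["redis-cli", "-h", host, "-p", str(port)]
    if password is not None:
        argv += ["-a", password]
    argv.append(command)
    return argv


def plan_flush(targets, password=None):
    """Build one command per (host, port) pair without running anything."""
    return [build_flush_command(host, port, password) for host, port in targets]


if __name__ == "__main__":
    # Assumption: which host/port holds the score cache is NOT established
    # in the log, so this list is purely illustrative and must be verified.
    targets = [("oresrdb1001", 6379)]
    for argv in plan_flush(targets, password="areallysecretpassword"):
        # Print a shell-quoted command line for a human to inspect first.
        print(shlex.join(argv))
```

Printing the commands instead of executing them leaves room to confirm that a target is the cache and not a queue before anything is wiped.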