[05:52:28] Revision-Scoring-As-A-Service, Wikimania-Hackathon-2016, bwds: Generate bad words for all languages more than 100K articles - https://phabricator.wikimedia.org/T134629#2329026 (Psychoslave) Ok, I'll see that. Probably not this week though. Just glancing it, I see that there are (as far as I can tell)...
[07:15:23] Revision-Scoring-As-A-Service, Scap3: Scap3 checks should act based on alphabetical order of checks - https://phabricator.wikimedia.org/T136253#2328523 (mobrovac) I'm strongly opposing this. Having Scap3 reorder the checks in whichever way means that its users must have intimate knowledge of the system i...
[08:09:15] Revision-Scoring-As-A-Service, Scap3: Scap3 checks should act based on alphabetical order of checks - https://phabricator.wikimedia.org/T136253#2329179 (Ladsgroup) >>! In T136253#2329091, @mobrovac wrote: > I'm strongly opposing this. Having Scap3 reorder the checks in whichever way means that its users...
[08:20:07] Revision-Scoring-As-A-Service, Scap3: Scap3 checks should act based on alphabetical order of checks - https://phabricator.wikimedia.org/T136253#2329211 (mobrovac) >>! In T136253#2329179, @Ladsgroup wrote: > Scap3 reorder checks randomly (since it turn the yaml file to a dictionary and order in dictionari...
[09:17:09] o/ Amir1
[09:17:11] We have a problem
[09:17:20] halfak_: hey
[09:17:21] ORES fails when requesting multiple revision IDs
[09:17:23] what's up
[09:17:27] Cannot score more than one at a time
[09:17:30] right now?
[09:17:33] e.g. https://ores.wmflabs.org/v2/scores/enwiki/damaging/?revids=216123|32423423|3243242|234324
[09:17:37] yes
[09:17:39] 500 error
[09:17:43] I think it's an old bug in revscoring.
[09:17:50] So if we deploy 1.2.6 I think it'll be fixed.
[09:17:57] I'm looking into it now. Trying to get the error reported.
[09:18:27] okay
[09:18:38] halfak_: if you're busy I can do it
[09:19:02] What wheel do we have deployed right now for `revscoring`?
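The failing request at 09:17:33 batches four revision IDs into a single `revids` parameter, separated by `|`. A minimal Python sketch that rebuilds that URL; `ores_scores_url` is a hypothetical helper, and the host and `/v2/scores/<wiki>/<model>/` layout are copied from the log (the 2016 Labs deployment), not checked against any later version of the API.

```python
def ores_scores_url(host, wiki, model, rev_ids, version="v2"):
    """Build an ORES scores URL that asks for several revisions at once.

    Hypothetical helper: it only reproduces the URL shape seen in the log,
    e.g. https://ores.wmflabs.org/v2/scores/enwiki/damaging/?revids=1|2|3
    """
    if not rev_ids:
        raise ValueError("at least one revision ID is required")
    # Multiple IDs travel in one query parameter, joined with "|".
    revids = "|".join(str(int(rev_id)) for rev_id in rev_ids)
    return f"https://{host}/{version}/scores/{wiki}/{model}/?revids={revids}"


url = ores_scores_url("ores.wmflabs.org", "enwiki", "damaging",
                      [216123, 32423423, 3243242, 234324])
print(url)
# https://ores.wmflabs.org/v2/scores/enwiki/damaging/?revids=216123|32423423|3243242|234324
```

Sending this URL with any HTTP client reproduces the report: with revscoring 1.2.4 on the web nodes, the batched form returned a 500 error. Since the symptom was "cannot score more than one at a time", requesting one ID per call is a plausible stopgap until 1.2.6 is deployed, at the cost of one round trip per revision.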
[09:20:04] 1.2.4
[09:20:49] halfak_: ^
[09:21:40] Looks like 1.2.2 in ores-wikimedia-config master
[09:22:23] Ack. I take that back
[09:22:25] It is 1.2.4
[09:23:14] Yup. Problem exists in 1.2.4
[09:24:09] okay
[09:24:36] Revision-Scoring-As-A-Service, ORES: ORES cannot score multiple revisions - https://phabricator.wikimedia.org/T136278#2329344 (Halfak)
[09:24:39] https://phabricator.wikimedia.org/T136278
[09:24:50] Looks like an update to revscoring 1.2.6 fixes the issue
[09:25:04] halfak_: do you want me to fix it and you go enjoy the conference?
[09:25:32] right now I'm checking the spike of hourly errored scores
[09:25:35] it can wait
[09:25:45] halfak_: I have some good news for you
[09:26:07] Revision-Scoring-As-A-Service, ORES: ORES cannot score multiple revisions - https://phabricator.wikimedia.org/T136278#2329357 (Halfak) Issue exists in revscoring==1.2.4, but not revscoring==1.2.6
[09:26:30] I'll get the wheel for 1.2.6 pushed. If you can merge and deploy, that would be great
[09:26:40] okay
[09:28:56] halfak_: 1- tell me once you're done
[09:29:24] 2- watchdog for precaching is up and running (we did a deploy with Alex yesterday)
[09:29:41] 3- the new uwsgi is in place now (thanks to Alex!)
[09:30:45] 4- I'm finishing some scap configs and we are likely to finish it today
[09:33:01] https://gerrit.wikimedia.org/r/290902
[09:33:04] Amir1, ^
[09:33:11] \o/ for watchdog
[09:33:34] I was monitoring it yesterday, works like a charm
[09:33:35] new uwsgi means the service name is now "ores-web"?
[09:34:21] Revision-Scoring-As-A-Service, ORES: ORES cannot score multiple revisions - https://phabricator.wikimedia.org/T136278#2329385 (Halfak) I just pushed https://gerrit.wikimedia.org/r/290902. This switches our wheels repo to use 1.2.6
[09:35:04] halfak_: yup
[09:35:21] going to staging
[09:35:46] Amir1, fyi see https://github.com/wiki-ai/ores-wikimedia-config/commit/f6e638cea87829d9fbccb3c0009d21bf0b1dfa49
[09:35:53] Shouldn't be an issue
[09:35:55] :)
[09:36:08] yeah
[09:36:12] I always forget
[09:36:58] Revision-Scoring-As-A-Service, ORES: ORES cannot score multiple revisions - https://phabricator.wikimedia.org/T136278#2329387 (Halfak) @Ladsgroup merged the wheel and updated the wheels submodule: https://github.com/wiki-ai/ores-wikimedia-config/commit/90e8e05c807de24abe0f4ed05919485087a6a1db I updated...
[09:37:09] Revision-Scoring-As-A-Service, ORES: ORES cannot score multiple revisions - https://phabricator.wikimedia.org/T136278#2329389 (Halfak) Going to staging now.
[09:39:45] Amir1, staging is still not doing great
[09:39:54] https://ores-staging.wmflabs.org/v2/scores/enwiki/damaging/?revids=216123|32423423|3243242|234324
[09:41:11] halfak_: there is a thing with staging
[09:41:14] I need to fix it
[09:41:16] OK
[09:41:17] not related to this
[09:45:10] Still... doesn't look like the staging deploy happened
[09:45:34] Amir1, shall I push to staging?
[09:46:28] Going to staging
[09:48:41] Failed to restart uwsgi-ores.service: Unit uwsgi-ores.service failed to load: No such file or directory.
[09:48:47] halfak_: yup
[09:48:48] Amir1, ^
[09:48:51] I'm fixing it
[09:48:53] Needs a puppet run?
[09:48:54] kk
[09:49:51] yeah
[09:49:54] I'm trying to do it
[09:50:06] it seems it gets error due to stupid stuff
[09:51:14] Darn. Going to happen in prod?
[09:51:26] nope
[09:51:29] I could do a quick hotfix to the web nodes
[09:51:32] we check prod very carefully
[09:51:39] no
[09:51:45] I fix it ASAP
[09:52:29] halfak_: don't worry. I've got it
[09:52:43] even if it's something. I fix it sooner than you think
[09:54:36] halfak_: around?
[09:54:36] https://ores.wmflabs.org/v1/scores/enwiki/damaging/?revids=216123|32423423|3243242|234324
[09:54:49] it gets results sometimes, error sometimes
[09:54:55] because the web nodes are being deployed
[09:55:00] great
[09:55:03] not finished but it's working
[09:55:15] the web nodes are the only ones affected by this issue.
[09:55:27] So the problem should be fixed before the change makes it to the workers.
[09:55:38] (We should still deploy to the workers)
[09:55:40] yeah
[09:55:46] I did deploy to workers
[09:56:00] but it's obvious it's the web nodes, errors too fast
[10:02:55] halfak_: probably ores-web-04 is corrupted somehow. Let me investigate
[10:14:33] halfak_: around?
[10:24:11] Revision-Scoring-As-A-Service, ORES: ORES cannot score multiple revisions - https://phabricator.wikimedia.org/T136278#2329344 (Ladsgroup) Everything is okay now :) Deployed.
[10:24:26] afk for a while
[10:24:27] o/
[10:40:46] hey Amir1
[10:40:50] Sorry stepped away
[10:40:54] Looking at web-04 now
[10:43:18] Looks like 04 is working as expected
[10:44:16] 03 and 05 are doing good too.
[10:45:19] halfak_: yup. I fixed it by reboot
[10:45:24] Great
[10:45:26] I took it out of lb
[10:45:31] then checked everything
[10:45:35] fixed, deployed
[10:45:45] then brought it back to the lb
[10:47:15] halfak_: I talked with RelEng and I'm working on the scap configs
[10:47:31] probably we are good to go, I need to talk to akosiaris
[10:47:36] around akosiaris/
[10:47:38] *?
[10:48:15] Amir1, what's the process for telling the lb that a web node is back online?
[10:48:38] It understands but the process is slow
[10:48:53] Gotcha. So we just wait and the lb will recover
[10:48:55] it asks for results and when it doesn't answer
[10:48:56] ?
[10:49:13] it refuses to send requests to that node for 60 seconds
[10:49:22] yup I am around
[10:49:22] but what I did was different
[10:49:25] wassup ?
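The automatic recovery described between 10:48 and 10:49 (the lb asks a node for results and, when it does not answer, sends it no requests for 60 seconds) is a temporary-ejection health check, while taking ores-web-04 out of the lb by hand is a manual depool. A minimal Python sketch of both behaviors, assuming a plain round-robin pool; `EjectingPool` and its methods are hypothetical and do not describe the load balancer actually in front of ORES. The injectable clock only exists to make the 60-second window testable.

```python
import itertools
import time


class EjectingPool:
    """Round-robin pool that skips a node for `eject_seconds` after a failure.

    Hypothetical sketch of the behavior described in the log, not the
    real ORES load balancer configuration.
    """

    def __init__(self, nodes, eject_seconds=60.0, clock=time.monotonic):
        self.nodes = list(nodes)
        self.eject_seconds = eject_seconds
        self.clock = clock
        self._ejected_until = {}  # node -> clock time it may receive traffic again
        self._cycle = itertools.cycle(self.nodes)

    def mark_failed(self, node):
        # Health probe got no answer: stop routing to the node for a while.
        self._ejected_until[node] = self.clock() + self.eject_seconds

    def depool(self, node):
        # Manual depool: the node stays out until someone repools it.
        self._ejected_until[node] = float("inf")

    def repool(self, node):
        self._ejected_until.pop(node, None)

    def pick(self):
        # Try each node at most once per call, skipping ejected ones.
        now = self.clock()
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if self._ejected_until.get(node, 0.0) <= now:
                return node
        raise RuntimeError("no healthy nodes available")


fake_now = [0.0]
pool = EjectingPool(["ores-web-03", "ores-web-04", "ores-web-05"],
                    clock=lambda: fake_now[0])
pool.mark_failed("ores-web-04")           # probe failed at t=0
during = {pool.pick() for _ in range(6)}  # ores-web-04 is skipped
fake_now[0] = 61.0                        # the 60-second window has passed
after = {pool.pick() for _ in range(3)}   # ores-web-04 gets traffic again
```

The manual path needs no waiting, which matches the remark that depooling by hand is faster than the automatic recovery.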
[10:49:40] Revision-Scoring-As-A-Service, ORES: ORES cannot score multiple revisions - https://phabricator.wikimedia.org/T136278#2329537 (Halfak) Deployed! And it looks like we are OK
[10:49:47] akosiaris: I asked a question in the patch for scap
[10:49:53] it would be great if you could take a look
[10:50:27] halfak_: what I did in this case was different. https://wikitech.wikimedia.org/w/index.php?title=Hiera:Ores&diff=572835&oldid=569365 and a puppet agent run on the lb
[10:50:44] Gotcha.
[10:50:47] Manual depooling
[10:51:14] ok
[10:52:23] halfak_: yeah, it is faster than the automatic one
[10:53:58] akosiaris: "Error: Could not retrieve catalog from remote server: Error 400 on SERVER: secret(): invalid secret keyholder/deploy-service.pub at /etc/puppet/modules/scap/manifests/target.pp:83 on node deployment-ores-web.deployment-prep.eqiad.wmflabs"
[10:54:18] I get this in deployment-ores
[10:54:54] thanks :)
[10:56:17] yeah, expected with all the keyholder changes
[10:56:46] probably needs some more love today. Last I heard it was supposed to be deploy_service
[10:56:54] not deploy-service... let's see what's going on over there
[10:57:40] akosiaris: I think the key name should be "servicedeploy_rsa" per https://wikitech.wikimedia.org/wiki/Keyholder
[10:58:00] they added a parameter to scap::target called "key_name"
[10:58:49] I want to modify the declaration of scap::target in the ores patch but I can't find any explicit declaration. Is it in the yaml file, akosiaris?
[10:59:01] https://gerrit.wikimedia.org/r/#/c/280403/51/hieradata/common/role/deployment.yaml
[10:59:57] it was servicedeploy yesterday. If you look at the comments in https://gerrit.wikimedia.org/r/#/c/289236/ it was actually renamed yesterday
[11:00:52] oh
[11:01:02] Amir1: not sure I understand the yaml file question
[11:01:30] akosiaris: https://gerrit.wikimedia.org/r/#/c/280403/51
[11:01:39] in this patch we don't have any scap::target
[11:02:04] so where did you define scap::target?
[11:02:11] service::uwsgi
[11:02:29] okay, I get it now
[11:02:57] so with this, web nodes in our setups do have scap::target too. Am I right?
[11:03:07] (the fabric setup :D)
[11:04:26] yes web nodes have scap::target everywhere
[11:05:06] okay
[11:05:19] thanks akosiaris, now I get it :)
[11:05:30] you're welcome
[11:26:42] halfak_: if it helps, it seems these spikes of hourly errors are coming from the wp10 model in enwiki
[11:31:16] but I can't get logs of them anywhere
[11:58:12] Revision-Scoring-As-A-Service, ORES: [Investigate] ORES spike of errored requests every hour - https://phabricator.wikimedia.org/T134109#2329618 (Ladsgroup) Thanks to lots of changes and some luck I find out what is causing the problem: It seems someone with user agent "Ruby" is requesting scores of del...
[12:00:50] Revision-Scoring-As-A-Service, ORES: [Investigate] ORES spike of errored requests every hour - https://phabricator.wikimedia.org/T134109#2329624 (Ladsgroup) here is a sample of what have been requested: "https://ores.wmflabs.org/v1/scores/enwiki/wp10/?revids=715787136|715899393|711597825|711576195|700871...
[12:44:35] Revision-Scoring-As-A-Service, Scap3: Scap3 checks should act based on alphabetical order of checks - https://phabricator.wikimedia.org/T136253#2329739 (Ladsgroup) >>! In T136253#2329211, @mobrovac wrote: >>>! In T136253#2329179, @Ladsgroup wrote: >> Scap3 reorder checks randomly (since it turn the yaml...
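The ordering problem in T136253 comes down to how the checks are loaded: as quoted in the log, Scap3 turned the YAML file into a dictionary, and a plain dictionary's iteration order is not guaranteed on Python 2 (assumed here to be what Scap3 ran on at the time). The sketch below contrasts alphabetical ordering, the original proposal, with keeping file order, which is what the retitled task asks for, and shows why the 01_foo, 02_bar naming workaround makes the two agree. The check names and commands are hypothetical, and this is not Scap3's own code.

```python
from collections import OrderedDict

# Hypothetical checks, listed top to bottom as they might appear in a
# checks file. Names and commands are illustrative only.
DECLARED = [
    ("depool", "depool-command"),
    ("restart_service", "restart-command"),
    ("smoke_test", "test-command"),
    ("repool", "repool-command"),
]


def in_declared_order(pairs):
    """Run checks top to bottom. OrderedDict keeps file order on Python 2
    as well as 3; a plain dict only guarantees it from Python 3.7."""
    return list(OrderedDict(pairs))


def in_alphabetical_order(pairs):
    """Sort checks by name, as the original T136253 title proposed."""
    return sorted(name for name, _ in pairs)


print(in_declared_order(DECLARED))
# ['depool', 'restart_service', 'smoke_test', 'repool']
print(in_alphabetical_order(DECLARED))
# ['depool', 'repool', 'restart_service', 'smoke_test']  (repool too early)

# Numeric prefixes make alphabetical order coincide with file order.
PREFIXED = [(f"{i:02d}_{name}", cmd) for i, (name, cmd) in enumerate(DECLARED, 1)]
print(in_alphabetical_order(PREFIXED))
# ['01_depool', '02_restart_service', '03_smoke_test', '04_repool']
```

The prefix trick works, but, as mobrovac argues, it pushes knowledge of the ordering rule onto every user; preserving declaration order needs no naming convention at all.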
[13:46:50] Revision-Scoring-As-A-Service, Scap3: Scap3 checks should act based on alphabetical order of checks - https://phabricator.wikimedia.org/T136253#2330045 (thcipriani) The problem we found is that, rather than executing from top to bottom, Ores promote checks were running in an unknown order. Random orderi...
[13:48:08] Revision-Scoring-As-A-Service, Scap3: Scap3 checks should not be random - https://phabricator.wikimedia.org/T136253#2330046 (thcipriani)
[15:33:44] Revision-Scoring-As-A-Service, Scap3: Scap3 checks should not be random - https://phabricator.wikimedia.org/T136253#2330492 (Ladsgroup) Hey, @thcipriani You suggested my to change name of checks to 01_foo, 02_bar, etc. The first implication is "alphabetical is our goal". Sorry if I misunderstood you. Yu...
[15:50:31] Revision-Scoring-As-A-Service, Scap3: Scap3 checks should not be random - https://phabricator.wikimedia.org/T136253#2330581 (thcipriani) >>! In T136253#2330492, @Ladsgroup wrote: > Hey, > @thcipriani You suggested me to change name of checks to 01_foo, 02_bar, etc. The first implication is "alphabetical...
[16:22:07] Revision-Scoring-As-A-Service, Wikimania-Hackathon-2016, bwds: Generate bad words for all languages more than 100K articles - https://phabricator.wikimedia.org/T134629#2330770 (Ladsgroup) >>! In T134629#2329026, @Psychoslave wrote: > Ok, I'll see that. Probably not this week though. Just glancing it,...
[16:58:01] akosiaris: hey, I just deployed via scap3 in beta and it works like a charm
[19:45:23] halfak_: doing it is very easy
[19:45:39] I think so too. We can use deletion reasons on enwiki
[19:45:44] I bet other wikis have good standards there too
[19:45:51] now I think of it, not very much but running some stats
[19:46:13] yeah, CSD is pretty much standard everywhere