[13:07:44] OK looks like icinga noise is still on, but it's no longer constant.
[13:08:39] halfak: i havent even been awake long enough to break anything... so what gives
[13:08:47] :P
[13:09:00] I wonder where I can read icinga logs.
[13:11:13] halfak: do you guys no longer use icinga2?
[13:11:29] we ask pala.dox to move it to #wikimedia-ores
[13:11:37] Ah
[13:11:48] I wondered why the alerts werent being sent here
[14:33:00] ORES, Scoring-platform-team (Current), revscoring, artificial-intelligence: Use documentation builds as part of CI testing - https://phabricator.wikimedia.org/T231212 (Halfak) I wonder if we should be using `-nW` ` -n nit-picky mode, warn about all missing references [...]...
[15:52:27] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 0.015 second response time https://wikitech.wikimedia.org/wiki/ORES
[15:54:05] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 981 bytes in 1.884 second response time https://wikitech.wikimedia.org/wiki/ORES
[15:58:30] ORES, Scoring-platform-team (Current), revscoring, artificial-intelligence: Use documentation builds as part of CI testing - https://phabricator.wikimedia.org/T231212 (Halfak) I just updated the `wikimedia/revscoring` PR with `-anW` and did the work to get it passing.
[16:48:42] Scoring-platform-team (Current): Build tool to guess what tool was used to make reverts on Wikimedia wikis - https://phabricator.wikimedia.org/T226426 (Halfak) @Groceryheist, is this bit done for now?
[17:52:04] wikimedia/editquality#667 (docs-test-ci - 2e6c8bb : halfak): The build was broken. https://travis-ci.org/wikimedia/editquality/builds/582802029
[18:05:45] Oh man. This sphinx stuff is gonna make me light my hair on fire.
[18:06:00] We're trying to do something clever and sphinx can't figure it out.
[18:06:31] uh oh
[18:08:27] are you trying to "mock" the docstring for sklearn still?
[18:23:20] Na. I have revscoring done. I think editquality is working too.
[18:23:59] I was stuck on how we define extractors in articlequality. I overwrote the module itself to make imports a bit more memory efficient. I'm squashing that now. Probably doesn't matter and too clever for its own good.
[18:24:36] Now I'm stuck trying to get a specific automodule to work.
[18:24:49] Can't figure out why it does nothing and throws no warning.
[18:28:05] i ran into that a while ago and recall it had something to do with the module not being included in PATH
[18:29:21] this might be incorrect: https://github.com/wikimedia/editquality/blob/master/docs/conf.py#L25
[18:30:36] whoops wrong repo
[18:31:34] Scoring-platform-team (Current): Build tool to guess what tool was used to make reverts on Wikimedia wikis - https://phabricator.wikimedia.org/T226426 (Groceryheist) Yeah I would say so. There's always room to improve it i.e. to support more tools and wikis. I also haven't done quality checks for wikis that...
[18:32:07] https://github.com/wikimedia/articlequality/blob/master/doc/conf.py#L18
[18:33:39] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/ORES
[18:34:03] accraze, seems like that achieves the same goal to me.
[18:35:05] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 977 bytes in 0.783 second response time https://wikitech.wikimedia.org/wiki/ORES
[18:35:13] ahh yeah nvm
[18:43:51] OK I finished revscoring, editquality, and articlequality. I'm heading to lunch
[18:48:55] awesome!
[18:59:00] session-orientation PR is reviewed, looks great! just a couple of failing tests due to refactor
[19:46:53] working on the docs PRs again.
[19:47:05] I see the articlequality one is failing.
[19:48:39] Aha I see you fixed the revscoring one accraze
[19:49:50] haha yeah just saw an easy fix so I went for it
[19:50:09] good call on standardizing to "docs/"
[19:50:32] Yeah. Figured it couldn't hurt anything to switch it at this point.
[19:51:09] working through the others now.
[20:14:40] ORES, Scoring-platform-team (Current): Feature injection doesn't work when using "?revids=" param - https://phabricator.wikimedia.org/T232143 (ACraze) merged!
[20:35:53] (CR) MaxSem: "recheck" [extensions/ORES] - https://gerrit.wikimedia.org/r/510166 (owner: Kosta Harlan)
[20:39:38] PROBLEM - ORES web node labs ores-web-01 on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/ORES
[20:39:45] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/ORES
[20:40:16] (CR) jerkins-bot: [V: -1] Inject Config to ORESService, convert test to unit test [extensions/ORES] - https://gerrit.wikimedia.org/r/510166 (owner: Kosta Harlan)
[20:40:45] PROBLEM - ORES web node labs ores-web-02 on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/ORES
[20:47:01] RECOVERY - ORES web node labs ores-web-02 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 1009 bytes in 0.956 second response time https://wikitech.wikimedia.org/wiki/ORES
[20:51:57] PROBLEM - ORES web node labs ores-web-02 on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/ORES
[20:52:17] RECOVERY - ORES web node labs ores-web-01 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 979 bytes in 0.167 second response time https://wikitech.wikimedia.org/wiki/ORES
[20:52:23] curses
[20:52:41] I'll be looking into this more later today.
[20:54:13] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 981 bytes in 9.687 second response time https://wikitech.wikimedia.org/wiki/ORES
[20:54:57] RECOVERY - ORES web node labs ores-web-02 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 1009 bytes in 0.068 second response time https://wikitech.wikimedia.org/wiki/ORES
[20:58:45] accraze, OK! All of the docs PRs should be in a good state.
[20:58:58] I trimmed some garbage I found along the way :)
[21:19:28] ooooh. I don't see this problem on production but I do see it in the beta cluster.
[21:19:33] WHAT IS GOING ON?
[21:23:03] Yup. Confirmed. It's in the beta cluster but not production.
[21:23:18] But I can't replicate on staging -- which should be just like the beta node.
[21:27:41] On beta, I see no OOM, no queue full, and no statsd warnings.
[21:29:03] Hmm.. I'm struggling to replicate it again.
[21:32:16] ORES, Scoring-platform-team (Current), revscoring, artificial-intelligence: Use documentation builds as part of CI testing - https://phabricator.wikimedia.org/T231212 (Halfak) OK all the repos should be good to go now. @ACraze please review my work and feel free to merge.
[21:32:59] accraze, would you agree that it's possible that most of the icinga noise started on Sept 1st?
[21:33:13] Looks like that is when our request rate went way up.
[21:33:27] It looks like it doubled.
[21:34:01] yeah I would say it started getting REALLY noisy around then
[21:34:23] Hmm. Well, let me bump the capacity a bit.
[21:34:29] I'll add another worker to the pool.
[21:34:45] I don't expect this to solve the problem, but we should be scaled to another worker anyway.
[21:50:19] * halfak does the puppet role dance
[22:03:56] (PS1) MaxSem: Unbreak tests [extensions/ORES] - https://gerrit.wikimedia.org/r/535299
[22:09:13] OK well, I just found out that cloud is doing some maintenance and I had to give up on creating another worker right now.
[22:13:27] halfak: just found out too
[22:13:36] not long before you
[22:14:30] Aha. Oh well. Good stopping point for me.
[22:14:35] Let me update the ticket and hit the road.
[22:17:12] ORES, Scoring-platform-team: Investigate intermittent delay for basic uwsgi requests. - https://phabricator.wikimedia.org/T232228 (Halfak) I confirmed that I do not see the problem: * in Production (tested ores1001.eqiad.wmnet) * in the Beta cluster (tested deployment-ores01.eqiad.wmflabs) * in the Stagi...
[22:18:18] ORES, Scoring-platform-team: Investigate intermittent delay for basic uwsgi requests. - https://phabricator.wikimedia.org/T232228 (Halfak) Recently, we've been getting hit with a lot more requests which could be related. See https://grafana-labs-admin.wikimedia.org/d/000000006/ores-labs?orgId=1&panelId=...
[22:18:24] OK I'm out.
[22:18:31] Have a good one folks!
[22:19:10] later halAFK
[23:12:21] (CR) Mooeypoo: [C: +2] Unbreak tests [extensions/ORES] - https://gerrit.wikimedia.org/r/535299 (owner: MaxSem)
[23:26:21] (Merged) jenkins-bot: Unbreak tests [extensions/ORES] - https://gerrit.wikimedia.org/r/535299 (owner: MaxSem)
[23:39:17] (CR) jenkins-bot: Unbreak tests [extensions/ORES] - https://gerrit.wikimedia.org/r/535299 (owner: MaxSem)
[23:56:39] (CR) jenkins-bot: Unbreak tests [extensions/ORES] - https://gerrit.wikimedia.org/r/535299 (owner: MaxSem)