[00:36:47] PROBLEM - ssh on ORES-redis02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:37:48] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:38:03] PROBLEM - ssh on ORES-web02.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:38:07] PROBLEM - ssh on ORES-worker02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:38:17] PROBLEM - check http on Experimental ORES Website is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:38:18] PROBLEM - check users on ORES-worker02.experimental is UNKNOWN: [00:38:22] PROBLEM - check disk on ORES-redis02.experimental is UNKNOWN: [00:38:28] PROBLEM - ssh on ORES-worker01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:38:28] PROBLEM - check load on ORES-web01.Experimental is UNKNOWN: [00:38:38] PROBLEM - puppet on ORES-redis02.experimental is UNKNOWN: [00:38:42] PROBLEM - check disk on ORES-worker01.experimental is UNKNOWN: [00:38:53] PROBLEM - check disk on ORES-web02.Experimental is UNKNOWN: [00:38:57] PROBLEM - check load on ORES-worker01.experimental is UNKNOWN: [00:39:02] PROBLEM - check load on ORES-redis02.experimental is UNKNOWN: [00:39:12] PROBLEM - check load on ORES-worker02.experimental is UNKNOWN: [00:39:29] Aⅼlаh ⅰѕ ԁⲟing [00:40:17] PROBLEM - check users on ORES-redis02.experimental is UNKNOWN: [00:40:37] PROBLEM - check users on ORES-worker01.experimental is UNKNOWN: [00:40:43] PROBLEM - check load on ORES-web02.Experimental is UNKNOWN: [00:40:43] PROBLEM - check users on ORES-web02.Experimental is UNKNOWN: [00:40:43] PROBLEM - check users on ORES-web01.Experimental is UNKNOWN: [00:40:57] PROBLEM - check disk on ORES-worker02.experimental is UNKNOWN: [00:41:03] PROBLEM - puppet on ORES-web01.Experimental is UNKNOWN: [00:41:07] PROBLEM - puppet on ORES-web02.Experimental is UNKNOWN: [00:41:08] PROBLEM - puppet on ORES-worker01.experimental is UNKNOWN: [00:41:08] 
PROBLEM - puppet on ORES-worker02.experimental is UNKNOWN: [00:41:12] PROBLEM - check disk on ORES-web01.Experimental is UNKNOWN: [01:36:52] PROBLEM - ssh on ORES-redis02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:37:52] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:38:07] PROBLEM - ssh on ORES-web02.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:38:12] PROBLEM - ssh on ORES-worker02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:38:22] PROBLEM - check http on Experimental ORES Website is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:38:22] PROBLEM - check users on ORES-worker02.experimental is UNKNOWN: [01:38:27] PROBLEM - check disk on ORES-redis02.experimental is UNKNOWN: [01:38:32] PROBLEM - ssh on ORES-worker01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:38:32] PROBLEM - check load on ORES-web01.Experimental is UNKNOWN: [01:38:42] PROBLEM - puppet on ORES-redis02.experimental is UNKNOWN: [01:38:47] PROBLEM - check disk on ORES-worker01.experimental is UNKNOWN: [01:38:57] PROBLEM - check disk on ORES-web02.Experimental is UNKNOWN: [01:39:02] PROBLEM - check load on ORES-worker01.experimental is UNKNOWN: [01:39:07] PROBLEM - check load on ORES-redis02.experimental is UNKNOWN: [01:39:17] PROBLEM - check load on ORES-worker02.experimental is UNKNOWN: [01:40:22] PROBLEM - check users on ORES-redis02.experimental is UNKNOWN: [01:40:42] PROBLEM - check users on ORES-worker01.experimental is UNKNOWN: [01:40:47] PROBLEM - check load on ORES-web02.Experimental is UNKNOWN: [01:40:48] PROBLEM - check users on ORES-web02.Experimental is UNKNOWN: [01:40:48] PROBLEM - check users on ORES-web01.Experimental is UNKNOWN: [01:41:02] PROBLEM - check disk on ORES-worker02.experimental is UNKNOWN: [01:41:07] PROBLEM - puppet on ORES-web01.Experimental is UNKNOWN: [01:41:12] PROBLEM - puppet on 
ORES-worker02.experimental is UNKNOWN: [01:41:12] PROBLEM - puppet on ORES-web02.Experimental is UNKNOWN: [01:41:13] PROBLEM - puppet on ORES-worker01.experimental is UNKNOWN: [01:41:17] PROBLEM - check disk on ORES-web01.Experimental is UNKNOWN: [02:36:56] PROBLEM - ssh on ORES-redis02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:37:56] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:38:11] PROBLEM - ssh on ORES-web02.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:38:16] PROBLEM - ssh on ORES-worker02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:38:26] PROBLEM - check http on Experimental ORES Website is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:38:26] PROBLEM - check users on ORES-worker02.experimental is UNKNOWN: [02:38:31] PROBLEM - check disk on ORES-redis02.experimental is UNKNOWN: [02:38:36] PROBLEM - check load on ORES-web01.Experimental is UNKNOWN: [02:38:36] PROBLEM - ssh on ORES-worker01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:38:46] PROBLEM - puppet on ORES-redis02.experimental is UNKNOWN: [02:38:51] PROBLEM - check disk on ORES-worker01.experimental is UNKNOWN: [02:39:01] PROBLEM - check disk on ORES-web02.Experimental is UNKNOWN: [02:39:06] PROBLEM - check load on ORES-worker01.experimental is UNKNOWN: [02:39:11] PROBLEM - check load on ORES-redis02.experimental is UNKNOWN: [02:39:21] PROBLEM - check load on ORES-worker02.experimental is UNKNOWN: [02:40:26] PROBLEM - check users on ORES-redis02.experimental is UNKNOWN: [02:40:46] PROBLEM - check users on ORES-worker01.experimental is UNKNOWN: [02:40:51] PROBLEM - check users on ORES-web01.Experimental is UNKNOWN: [02:40:51] PROBLEM - check users on ORES-web02.Experimental is UNKNOWN: [02:40:51] PROBLEM - check load on ORES-web02.Experimental is UNKNOWN: [02:41:06] PROBLEM - check disk on ORES-worker02.experimental is 
UNKNOWN: [02:41:11] PROBLEM - puppet on ORES-web01.Experimental is UNKNOWN: [02:41:16] PROBLEM - puppet on ORES-worker02.experimental is UNKNOWN: [02:41:16] PROBLEM - puppet on ORES-web02.Experimental is UNKNOWN: [02:41:17] PROBLEM - puppet on ORES-worker01.experimental is UNKNOWN: [02:41:21] PROBLEM - check disk on ORES-web01.Experimental is UNKNOWN: [03:06:48] Αllah iѕ dοinɡ [03:06:48] s∪ᥒ is ᥒⲟt dⲟing Aⅼlаh іѕ doing [03:17:57] Aⅼⅼah iѕ doⅰᥒg [03:20:42] Alⅼah is doing [03:37:01] PROBLEM - ssh on ORES-redis02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:38:01] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:38:16] PROBLEM - ssh on ORES-web02.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:38:21] PROBLEM - ssh on ORES-worker02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:38:31] PROBLEM - check http on Experimental ORES Website is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:38:31] PROBLEM - check users on ORES-worker02.experimental is UNKNOWN: [03:38:36] PROBLEM - check disk on ORES-redis02.experimental is UNKNOWN: [03:38:41] PROBLEM - check load on ORES-web01.Experimental is UNKNOWN: [03:38:41] PROBLEM - ssh on ORES-worker01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:38:51] PROBLEM - puppet on ORES-redis02.experimental is UNKNOWN: [03:38:56] PROBLEM - check disk on ORES-worker01.experimental is UNKNOWN: [03:39:06] PROBLEM - check disk on ORES-web02.Experimental is UNKNOWN: [03:39:11] PROBLEM - check load on ORES-worker01.experimental is UNKNOWN: [03:39:16] PROBLEM - check load on ORES-redis02.experimental is UNKNOWN: [03:39:26] PROBLEM - check load on ORES-worker02.experimental is UNKNOWN: [03:40:31] PROBLEM - check users on ORES-redis02.experimental is UNKNOWN: [03:40:51] PROBLEM - check users on ORES-worker01.experimental is UNKNOWN: [03:40:56] PROBLEM - check users on 
ORES-web02.Experimental is UNKNOWN: [03:40:56] PROBLEM - check load on ORES-web02.Experimental is UNKNOWN: [03:40:56] PROBLEM - check users on ORES-web01.Experimental is UNKNOWN: [03:41:11] PROBLEM - check disk on ORES-worker02.experimental is UNKNOWN: [03:41:16] PROBLEM - puppet on ORES-web01.Experimental is UNKNOWN: [03:41:21] PROBLEM - puppet on ORES-web02.Experimental is UNKNOWN: [03:41:21] PROBLEM - puppet on ORES-worker01.experimental is UNKNOWN: [03:41:21] PROBLEM - puppet on ORES-worker02.experimental is UNKNOWN: [03:41:26] PROBLEM - check disk on ORES-web01.Experimental is UNKNOWN: [04:37:05] PROBLEM - ssh on ORES-redis02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:38:05] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:38:20] PROBLEM - ssh on ORES-web02.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:38:25] PROBLEM - ssh on ORES-worker02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:38:35] PROBLEM - check http on Experimental ORES Website is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:38:35] PROBLEM - check users on ORES-worker02.experimental is UNKNOWN: [04:38:40] PROBLEM - check disk on ORES-redis02.experimental is UNKNOWN: [04:38:45] PROBLEM - check load on ORES-web01.Experimental is UNKNOWN: [04:38:45] PROBLEM - ssh on ORES-worker01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:38:55] PROBLEM - puppet on ORES-redis02.experimental is UNKNOWN: [04:39:00] PROBLEM - check disk on ORES-worker01.experimental is UNKNOWN: [04:39:10] PROBLEM - check disk on ORES-web02.Experimental is UNKNOWN: [04:39:15] PROBLEM - check load on ORES-worker01.experimental is UNKNOWN: [04:39:20] PROBLEM - check load on ORES-redis02.experimental is UNKNOWN: [04:39:30] PROBLEM - check load on ORES-worker02.experimental is UNKNOWN: [04:40:35] PROBLEM - check users on ORES-redis02.experimental is UNKNOWN: 
[04:40:55] PROBLEM - check users on ORES-worker01.experimental is UNKNOWN: [04:41:00] PROBLEM - check users on ORES-web02.Experimental is UNKNOWN: [04:41:00] PROBLEM - check users on ORES-web01.Experimental is UNKNOWN: [04:41:00] PROBLEM - check load on ORES-web02.Experimental is UNKNOWN: [04:41:15] PROBLEM - check disk on ORES-worker02.experimental is UNKNOWN: [04:41:20] PROBLEM - puppet on ORES-web01.Experimental is UNKNOWN: [04:41:25] PROBLEM - puppet on ORES-worker02.experimental is UNKNOWN: [04:41:25] PROBLEM - puppet on ORES-web02.Experimental is UNKNOWN: [04:41:25] PROBLEM - puppet on ORES-worker01.experimental is UNKNOWN: [04:41:30] PROBLEM - check disk on ORES-web01.Experimental is UNKNOWN: [05:32:49] Αllaһ is dοiᥒg [05:32:49] ѕun is not ԁοing Ꭺⅼⅼah ⅰѕ ⅾoinɡ [05:32:49] ⅿoon ⅰs ᥒഠt ⅾoing Allɑh іs doing [05:33:36] Ꭺlⅼɑh іs doⅰng [05:37:10] PROBLEM - ssh on ORES-redis02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:38:09] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:38:25] PROBLEM - ssh on ORES-web02.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:38:30] PROBLEM - ssh on ORES-worker02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:38:39] PROBLEM - check http on Experimental ORES Website is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:38:40] PROBLEM - check users on ORES-worker02.experimental is UNKNOWN: [05:38:44] PROBLEM - check disk on ORES-redis02.experimental is UNKNOWN: [05:38:49] PROBLEM - check load on ORES-web01.Experimental is UNKNOWN: [05:38:50] PROBLEM - ssh on ORES-worker01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:38:59] PROBLEM - puppet on ORES-redis02.experimental is UNKNOWN: [05:39:05] PROBLEM - check disk on ORES-worker01.experimental is UNKNOWN: [05:39:14] PROBLEM - check disk on ORES-web02.Experimental is UNKNOWN: [05:39:19] PROBLEM - check load on 
ORES-worker01.experimental is UNKNOWN:
[05:39:25] PROBLEM - check load on ORES-redis02.experimental is UNKNOWN:
[05:39:34] PROBLEM - check load on ORES-worker02.experimental is UNKNOWN:
[05:40:40] PROBLEM - check users on ORES-redis02.experimental is UNKNOWN:
[05:40:59] PROBLEM - check users on ORES-worker01.experimental is UNKNOWN:
[05:41:04] PROBLEM - check users on ORES-web02.Experimental is UNKNOWN:
[05:41:05] PROBLEM - check load on ORES-web02.Experimental is UNKNOWN:
[05:41:05] PROBLEM - check users on ORES-web01.Experimental is UNKNOWN:
[05:41:19] PROBLEM - check disk on ORES-worker02.experimental is UNKNOWN:
[05:41:24] PROBLEM - puppet on ORES-web01.Experimental is UNKNOWN:
[05:41:29] PROBLEM - puppet on ORES-worker01.experimental is UNKNOWN:
[05:41:30] PROBLEM - puppet on ORES-worker02.experimental is UNKNOWN:
[05:41:30] PROBLEM - puppet on ORES-web02.Experimental is UNKNOWN:
[05:41:34] PROBLEM - check disk on ORES-web01.Experimental is UNKNOWN:
[05:45:00] (CR) Awight: Introduce ext.ores.api (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/459549 (https://phabricator.wikimedia.org/T201691) (owner: Ladsgroup)
[06:37:14] PROBLEM - ssh on ORES-redis02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:38:14] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:38:29] PROBLEM - ssh on ORES-web02.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:38:34] PROBLEM - ssh on ORES-worker02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:38:44] PROBLEM - check users on ORES-worker02.experimental is UNKNOWN:
[06:38:44] PROBLEM - check http on Experimental ORES Website is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:38:49] PROBLEM - check disk on ORES-redis02.experimental is UNKNOWN:
[06:38:54] PROBLEM - check load on ORES-web01.Experimental is UNKNOWN:
[06:38:54] PROBLEM - ssh on ORES-worker01.experimental is CRITICAL: 
CRITICAL - Socket timeout after 10 seconds [06:39:04] PROBLEM - puppet on ORES-redis02.experimental is UNKNOWN: [06:39:09] PROBLEM - check disk on ORES-worker01.experimental is UNKNOWN: [06:39:19] PROBLEM - check disk on ORES-web02.Experimental is UNKNOWN: [06:39:24] PROBLEM - check load on ORES-worker01.experimental is UNKNOWN: [06:39:29] PROBLEM - check load on ORES-redis02.experimental is UNKNOWN: [06:39:39] PROBLEM - check load on ORES-worker02.experimental is UNKNOWN: [06:40:44] PROBLEM - check users on ORES-redis02.experimental is UNKNOWN: [06:41:04] PROBLEM - check users on ORES-worker01.experimental is UNKNOWN: [06:41:09] PROBLEM - check users on ORES-web02.Experimental is UNKNOWN: [06:41:09] PROBLEM - check load on ORES-web02.Experimental is UNKNOWN: [06:41:09] PROBLEM - check users on ORES-web01.Experimental is UNKNOWN: [06:41:24] PROBLEM - check disk on ORES-worker02.experimental is UNKNOWN: [06:41:29] PROBLEM - puppet on ORES-web01.Experimental is UNKNOWN: [06:41:34] PROBLEM - puppet on ORES-worker01.experimental is UNKNOWN: [06:41:34] PROBLEM - puppet on ORES-worker02.experimental is UNKNOWN: [06:41:34] PROBLEM - puppet on ORES-web02.Experimental is UNKNOWN: [06:41:39] PROBLEM - check disk on ORES-web01.Experimental is UNKNOWN: [07:37:18] PROBLEM - ssh on ORES-redis02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:38:18] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:38:33] PROBLEM - ssh on ORES-web02.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:38:38] PROBLEM - ssh on ORES-worker02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:38:48] PROBLEM - check users on ORES-worker02.experimental is UNKNOWN: [07:38:48] PROBLEM - check http on Experimental ORES Website is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:38:53] PROBLEM - check disk on ORES-redis02.experimental is UNKNOWN: [07:38:58] PROBLEM - ssh on 
ORES-worker01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:38:58] PROBLEM - check load on ORES-web01.Experimental is UNKNOWN: [07:39:08] PROBLEM - puppet on ORES-redis02.experimental is UNKNOWN: [07:39:13] PROBLEM - check disk on ORES-worker01.experimental is UNKNOWN: [07:39:23] PROBLEM - check disk on ORES-web02.Experimental is UNKNOWN: [07:39:28] PROBLEM - check load on ORES-worker01.experimental is UNKNOWN: [07:39:33] PROBLEM - check load on ORES-redis02.experimental is UNKNOWN: [07:39:43] PROBLEM - check load on ORES-worker02.experimental is UNKNOWN: [07:40:48] PROBLEM - check users on ORES-redis02.experimental is UNKNOWN: [07:41:08] PROBLEM - check users on ORES-worker01.experimental is UNKNOWN: [07:41:13] PROBLEM - check users on ORES-web01.Experimental is UNKNOWN: [07:41:13] PROBLEM - check users on ORES-web02.Experimental is UNKNOWN: [07:41:14] PROBLEM - check load on ORES-web02.Experimental is UNKNOWN: [07:41:28] PROBLEM - check disk on ORES-worker02.experimental is UNKNOWN: [07:41:33] PROBLEM - puppet on ORES-web01.Experimental is UNKNOWN: [07:41:38] PROBLEM - puppet on ORES-worker02.experimental is UNKNOWN: [07:41:38] PROBLEM - puppet on ORES-worker01.experimental is UNKNOWN: [07:41:38] PROBLEM - puppet on ORES-web02.Experimental is UNKNOWN: [07:41:43] PROBLEM - check disk on ORES-web01.Experimental is UNKNOWN: [08:37:22] PROBLEM - ssh on ORES-redis02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:38:22] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:38:37] PROBLEM - ssh on ORES-web02.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:38:42] PROBLEM - ssh on ORES-worker02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:38:52] PROBLEM - check users on ORES-worker02.experimental is UNKNOWN: [08:38:52] PROBLEM - check http on Experimental ORES Website is CRITICAL: CRITICAL - Socket timeout after 10 
seconds
[08:38:58] PROBLEM - check disk on ORES-redis02.experimental is UNKNOWN:
[08:39:03] PROBLEM - check load on ORES-web01.Experimental is UNKNOWN:
[08:39:03] PROBLEM - ssh on ORES-worker01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:39:13] PROBLEM - puppet on ORES-redis02.experimental is UNKNOWN:
[08:39:18] PROBLEM - check disk on ORES-worker01.experimental is UNKNOWN:
[08:39:27] PROBLEM - check disk on ORES-web02.Experimental is UNKNOWN:
[08:39:32] PROBLEM - check load on ORES-worker01.experimental is UNKNOWN:
[08:39:37] PROBLEM - check load on ORES-redis02.experimental is UNKNOWN:
[08:39:48] PROBLEM - check load on ORES-worker02.experimental is UNKNOWN:
[08:40:52] PROBLEM - check users on ORES-redis02.experimental is UNKNOWN:
[08:41:12] PROBLEM - check users on ORES-worker01.experimental is UNKNOWN:
[08:41:17] PROBLEM - check users on ORES-web01.Experimental is UNKNOWN:
[08:41:17] PROBLEM - check users on ORES-web02.Experimental is UNKNOWN:
[08:41:17] PROBLEM - check load on ORES-web02.Experimental is UNKNOWN:
[08:41:32] PROBLEM - check disk on ORES-worker02.experimental is UNKNOWN:
[08:41:37] PROBLEM - puppet on ORES-web01.Experimental is UNKNOWN:
[08:41:42] PROBLEM - puppet on ORES-worker02.experimental is UNKNOWN:
[08:41:42] PROBLEM - puppet on ORES-worker01.experimental is UNKNOWN:
[08:41:42] PROBLEM - puppet on ORES-web02.Experimental is UNKNOWN:
[08:41:47] PROBLEM - check disk on ORES-web01.Experimental is UNKNOWN:
[08:45:20] Aⅼⅼah іs ԁoing
[08:45:20] ѕun іs not dοiᥒg Aⅼⅼaһ iѕ doiᥒg
[08:52:04] Alⅼah is ԁoіng
[09:34:17] Scoring-platform-team, ORES, Operations, Traffic: Pass on name of the node serving ORES requests as response header to the user - https://phabricator.wikimedia.org/T204600 (ema) p:Triage>Normal
[09:37:26] PROBLEM - ssh on ORES-redis02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:38:26] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: CRITICAL - Socket timeout 
after 10 seconds
[09:38:42] PROBLEM - ssh on ORES-web02.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:38:47] PROBLEM - ssh on ORES-worker02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:38:56] PROBLEM - check http on Experimental ORES Website is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:38:57] PROBLEM - check users on ORES-worker02.experimental is UNKNOWN:
[09:39:01] PROBLEM - check disk on ORES-redis02.experimental is UNKNOWN:
[09:39:07] PROBLEM - ssh on ORES-worker01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:39:07] PROBLEM - check load on ORES-web01.Experimental is UNKNOWN:
[09:39:17] PROBLEM - puppet on ORES-redis02.experimental is UNKNOWN:
[09:39:21] PROBLEM - check disk on ORES-worker01.experimental is UNKNOWN:
[09:39:31] PROBLEM - check disk on ORES-web02.Experimental is UNKNOWN:
[09:39:36] PROBLEM - check load on ORES-worker01.experimental is UNKNOWN:
[09:39:41] PROBLEM - check load on ORES-redis02.experimental is UNKNOWN:
[09:39:51] PROBLEM - check load on ORES-worker02.experimental is UNKNOWN:
[09:40:56] PROBLEM - check users on ORES-redis02.experimental is UNKNOWN:
[09:41:00] hmm
[09:41:16] PROBLEM - check users on ORES-worker01.experimental is UNKNOWN:
[09:41:22] PROBLEM - check users on ORES-web01.Experimental is UNKNOWN:
[09:41:22] PROBLEM - check load on ORES-web02.Experimental is UNKNOWN:
[09:41:22] PROBLEM - check users on ORES-web02.Experimental is UNKNOWN:
[09:41:23] who maintains this bot ? it does seem like it's not having a good time
[09:41:36] PROBLEM - check disk on ORES-worker02.experimental is UNKNOWN:
[09:41:39] Amir1: how is poolcounter in production holding up ?
[09:41:41] PROBLEM - puppet on ORES-web01.Experimental is UNKNOWN:
[09:41:45] Amir1: o/
[09:41:47] PROBLEM - puppet on ORES-worker01.experimental is UNKNOWN:
[09:41:47] PROBLEM - puppet on ORES-web02.Experimental is UNKNOWN:
[09:41:47] PROBLEM - puppet on ORES-worker02.experimental is UNKNOWN:
[09:41:51] PROBLEM - check disk on ORES-web01.Experimental is UNKNOWN:
[09:41:54] * akosiaris forgot to be polite :(
[09:41:54] akosiaris: hey there
[09:42:09] it's fine. The only thing is that eqiad is not getting any traffic
[09:42:20] that worries me a little
[09:42:47] yeah that's expected for the next 3 weeks
[09:43:11] I am worried too, not just for ORES, but the entire infrastructure but up to now we've been doing ok
[09:43:27] I had estimates anyway about the possible max load we might see with eqiad fully depooled
[09:44:08] ores was in the mild greens if that helps
[09:45:31] rough aggregate cpu usage was calculated at 12.5%
[09:45:47] * akosiaris wonders if that guesstimation was correct, looking
[09:46:47] heh, it's actually below that. around 10%
[09:47:21] but close enough so /me happy
[09:47:57] akosiaris: also, there is no impact on response time, the median of lock time is 0.5ms and the 75th percentile is 0.6ms
[09:48:11] nice
[09:48:19] got graphs already ?
[09:48:23] yup
[09:48:47] :)
[09:48:53] https://grafana.wikimedia.org/dashboard/db/ores?refresh=1m&orgId=1
[09:49:01] akosiaris: The last row
[09:49:59] maxes for the 99% up to only 40ms .. that's impressive
[09:50:02] good job!
[09:50:33] one other thing: I think we used the wrong locking command, right now I was able to make lots of requests at the same time and all got answered. This didn't happen on beta or localhost
[09:50:54] I used ACQ4ME but probably we need to change it to ACQ4ANY
[09:52:23] the diff IIRC is the number of waiting processes that will be woken up
[09:52:30] when a lock is released
[09:53:22] at a first look, ACQ4ME does sound ok for ORES. Keeping the populations of workers on a specific key stable
[09:53:36] what is the key btw ? The IP of the requesting host I guess ?
[09:53:42] yup
[09:54:24] so why switch to ACQ4ANY ? wouldn't that mean that all processes waiting on the lock will be woken up and allowed to continue working ?
[09:54:50] that is the "offender" will spike cpu usage for a bit ?
[09:58:07] on an unrelated note, I should probably see if creating a prometheus exporter for poolcounter makes senes
[09:58:10] sense*
[10:37:31] PROBLEM - ssh on ORES-redis02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:38:31] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:38:46] PROBLEM - ssh on ORES-web02.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:38:51] PROBLEM - ssh on ORES-worker02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:39:01] PROBLEM - check http on Experimental ORES Website is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:39:01] PROBLEM - check users on ORES-worker02.experimental is UNKNOWN:
[10:39:06] PROBLEM - check disk on ORES-redis02.experimental is UNKNOWN:
[10:39:11] PROBLEM - check load on ORES-web01.Experimental is UNKNOWN:
[10:39:11] PROBLEM - ssh on ORES-worker01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:39:21] PROBLEM - puppet on ORES-redis02.experimental is UNKNOWN:
[10:39:26] PROBLEM - check disk on ORES-worker01.experimental is UNKNOWN:
[10:39:36] PROBLEM - check disk on ORES-web02.Experimental is UNKNOWN:
[10:39:41] PROBLEM - check load on ORES-worker01.experimental is UNKNOWN:
[10:39:46] PROBLEM - check load on ORES-redis02.experimental is UNKNOWN:
[10:39:56] PROBLEM - check load on ORES-worker02.experimental is UNKNOWN:
[10:41:01] PROBLEM - check users on ORES-redis02.experimental is UNKNOWN:
[10:41:21] PROBLEM - check users on ORES-worker01.experimental is UNKNOWN:
[10:41:26] PROBLEM - check users on 
ORES-web02.Experimental is UNKNOWN:
[10:41:26] PROBLEM - check users on ORES-web01.Experimental is UNKNOWN:
[10:41:26] PROBLEM - check load on ORES-web02.Experimental is UNKNOWN:
[10:41:41] PROBLEM - check disk on ORES-worker02.experimental is UNKNOWN:
[10:41:46] PROBLEM - puppet on ORES-web01.Experimental is UNKNOWN:
[10:41:51] PROBLEM - puppet on ORES-worker01.experimental is UNKNOWN:
[10:41:51] PROBLEM - puppet on ORES-worker02.experimental is UNKNOWN:
[10:41:51] PROBLEM - puppet on ORES-web02.Experimental is UNKNOWN:
[10:41:56] PROBLEM - check disk on ORES-web01.Experimental is UNKNOWN:
[11:20:42] Αlⅼаh is ⅾoⅰᥒɡ
[11:21:48] akosiaris: okay, that makes sense but the only thing is that I was able to run more than ten connections at the same time and none of them got a timeout error
[11:22:02] while I was able to get a timeout error on localhost or the beta cluster
[11:22:07] something is off here
[11:26:00] Scoring-platform-team, ORES, Operations, Traffic, Patch-For-Review: Pass on name of the node serving ORES requests as response header to the user - https://phabricator.wikimedia.org/T204600 (ema) Open>Resolved a:ema Done: ``` $ curl -v https://ores.wikimedia.org/v3/scores/wikidat...
[11:35:32] Scoring-platform-team, ORES, Operations, Traffic, Patch-For-Review: Pass on name of the node serving ORES requests as response header to the user - https://phabricator.wikimedia.org/T204600 (Ladsgroup) Thank you!
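The ACQ4ME / ACQ4ANY question raised above can be made concrete with a toy model. This is only a sketch of the behaviour as described in the conversation ("the number of waiting processes that will be woken up when a lock is released"), not the PoolCounter daemon's code. The `ToyPool` class, the `demo` helper, and the `"WAITING"` return value are hypothetical names introduced for illustration; timeouts are not modelled, and the wake-up rules in `release` are assumptions.

```python
from collections import deque


class ToyPool:
    """Toy in-memory model of one lock key (hypothetical, not the real daemon)."""

    def __init__(self, workers, maxqueue):
        self.workers = workers      # max concurrent lock holders for this key
        self.maxqueue = maxqueue    # max clients allowed to wait for this key
        self.holders = set()
        self.waiting = deque()      # (client, mode) pairs in arrival order

    def acquire(self, client, mode):
        """mode is "ACQ4ME" or "ACQ4ANY"."""
        if len(self.holders) < self.workers:
            self.holders.add(client)
            return "LOCKED"
        if len(self.waiting) >= self.maxqueue:
            return "QUEUE_FULL"
        self.waiting.append((client, mode))
        # Stand-in value: a real client would block here until woken or timed out.
        return "WAITING"

    def release(self, client):
        """Free a slot and return {client: response} for every waiter woken."""
        self.holders.discard(client)
        woken = {}
        remaining = deque()
        for waiter, mode in self.waiting:
            if mode == "ACQ4ANY":
                # Assumption: every ACQ4ANY waiter is woken at once, without a lock.
                woken[waiter] = "DONE"
            else:
                remaining.append((waiter, mode))
        self.waiting = remaining
        while self.waiting and len(self.holders) < self.workers:
            # Assumption: ACQ4ME waiters receive freed slots one at a time, in order.
            waiter, _ = self.waiting.popleft()
            self.holders.add(waiter)
            woken[waiter] = "LOCKED"
        return woken


def demo(mode):
    """Three clients contend for one slot; return who wakes when the holder releases."""
    pool = ToyPool(workers=1, maxqueue=5)
    for client in ("a", "b", "c"):
        pool.acquire(client, mode)
    return pool.release("a")
```

In this model `demo("ACQ4ME")` wakes only `b` (with `LOCKED`) and leaves `c` queued, while `demo("ACQ4ANY")` wakes both `b` and `c` with `DONE`, which is the burst of simultaneously resumed processes the conversation worries about.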
[11:37:35] PROBLEM - ssh on ORES-redis02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:38:35] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:38:50] PROBLEM - ssh on ORES-web02.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:38:55] PROBLEM - ssh on ORES-worker02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:38:58] Amir1: I know how to fix ^^
[11:39:05] PROBLEM - check http on Experimental ORES Website is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:39:05] PROBLEM - check users on ORES-worker02.experimental is UNKNOWN:
[11:39:10] PROBLEM - check disk on ORES-redis02.experimental is UNKNOWN:
[11:39:11] Requires an IP change in the hiera
[11:39:15] PROBLEM - ssh on ORES-worker01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:39:15] PROBLEM - check load on ORES-web01.Experimental is UNKNOWN:
[11:39:25] PROBLEM - puppet on ORES-redis02.experimental is UNKNOWN:
[11:39:26] paladox: I can take care of the hiera thing
[11:39:30] PROBLEM - check disk on ORES-worker01.experimental is UNKNOWN:
[11:39:35] if I have access
[11:39:40] PROBLEM - check disk on ORES-web02.Experimental is UNKNOWN:
[11:39:45] PROBLEM - check load on ORES-worker01.experimental is UNKNOWN:
[11:39:50] PROBLEM - check load on ORES-redis02.experimental is UNKNOWN:
[11:39:56] Amir1: https://wikitech.wikimedia.org/wiki/Hiera:Ores
[11:40:00] PROBLEM - check load on ORES-worker02.experimental is UNKNOWN:
[11:40:02] * paladox gets the ip
[11:41:05] PROBLEM - check users on ORES-redis02.experimental is UNKNOWN:
[11:41:25] PROBLEM - check users on ORES-worker01.experimental is UNKNOWN:
[11:41:30] PROBLEM - check users on ORES-web02.Experimental is UNKNOWN:
[11:41:30] PROBLEM - check load on ORES-web02.Experimental is UNKNOWN:
[11:41:30] PROBLEM - check users on ORES-web01.Experimental is UNKNOWN:
[11:41:33] Amir1: change “profile::base::nrpe_allowed_hosts": 127.0.0.1,10.68.23.211” to profile::base::nrpe_allowed_hosts": 127.0.0.1,172.16.1.180
[11:41:45] PROBLEM - check disk on ORES-worker02.experimental is UNKNOWN:
[11:41:50] PROBLEM - puppet on ORES-web01.Experimental is UNKNOWN:
[11:41:55] PROBLEM - puppet on ORES-web02.Experimental is UNKNOWN:
[11:41:56] PROBLEM - puppet on ORES-worker02.experimental is UNKNOWN:
[11:41:56] PROBLEM - puppet on ORES-worker01.experimental is UNKNOWN:
[11:42:00] PROBLEM - check disk on ORES-web01.Experimental is UNKNOWN:
[11:42:09] done
[11:42:14] let's wait and see
[11:42:16] Amir1: misspelt, I mean “12:41 paladox: Amir1: change “profile::base::nrpe_allowed_hosts: 127.0.0.1,172.16.1.180”
[11:42:33] Ah ok thanks :)
[11:45:11] Amir1: it may work or may not because this host is in neutron now so in the old network it may see it as a floating ip
[11:45:20] * paladox hopes it will work with this ip :)
[11:46:03] I see
[12:37:39] PROBLEM - ssh on ORES-redis02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:38:39] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:38:54] PROBLEM - ssh on ORES-web02.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:38:59] PROBLEM - ssh on ORES-worker02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:39:09] PROBLEM - check http on Experimental ORES Website is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:39:09] PROBLEM - check users on ORES-worker02.experimental is UNKNOWN:
[12:39:14] PROBLEM - check disk on ORES-redis02.experimental is UNKNOWN:
[12:39:19] PROBLEM - ssh on ORES-worker01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:39:20] PROBLEM - check load on ORES-web01.Experimental is UNKNOWN:
[12:39:29] PROBLEM - puppet on ORES-redis02.experimental is UNKNOWN:
[12:39:34] PROBLEM - check disk on ORES-worker01.experimental is UNKNOWN:
[12:39:44] PROBLEM - check 
disk on ORES-web02.Experimental is UNKNOWN:
[12:39:49] PROBLEM - check load on ORES-worker01.experimental is UNKNOWN:
[12:39:54] PROBLEM - check load on ORES-redis02.experimental is UNKNOWN:
[12:40:04] PROBLEM - check load on ORES-worker02.experimental is UNKNOWN:
[12:41:09] PROBLEM - check users on ORES-redis02.experimental is UNKNOWN:
[12:41:29] PROBLEM - check users on ORES-worker01.experimental is UNKNOWN:
[12:41:34] PROBLEM - check users on ORES-web01.Experimental is UNKNOWN:
[12:41:35] PROBLEM - check load on ORES-web02.Experimental is UNKNOWN:
[12:41:35] PROBLEM - check users on ORES-web02.Experimental is UNKNOWN:
[12:41:49] PROBLEM - check disk on ORES-worker02.experimental is UNKNOWN:
[12:41:54] PROBLEM - puppet on ORES-web01.Experimental is UNKNOWN:
[12:41:59] PROBLEM - puppet on ORES-worker01.experimental is UNKNOWN:
[12:41:59] PROBLEM - puppet on ORES-web02.Experimental is UNKNOWN:
[12:41:59] PROBLEM - puppet on ORES-worker02.experimental is UNKNOWN:
[12:42:04] PROBLEM - check disk on ORES-web01.Experimental is UNKNOWN:
[13:25:38] Scoring-platform-team, Growth-Team, ORES, StructuredDiscussions: Implement "orientation" for Flow's Structured Discussion revisions - https://phabricator.wikimedia.org/T177245 (kostajh) This needs to happen if/when we get around to working on {T177245}. Until then, moving this to the future column.
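For reference, the hiera change proposed at 11:41 swaps the second entry of `profile::base::nrpe_allowed_hosts` from 10.68.23.211 to 172.16.1.180. A minimal sketch of what such an allow-list amounts to, assuming the value is a comma-separated list of plain IP addresses and that a client is accepted only when its address appears in it. The function name `is_allowed` is hypothetical, this is not NRPE's own matching code, and whether 172.16.1.180 is really the address the monitored hosts see is exactly the open question in the conversation.

```python
import ipaddress


def is_allowed(allowed_hosts: str, client_ip: str) -> bool:
    """Return True if client_ip appears in a comma-separated list of IP addresses."""
    client = ipaddress.ip_address(client_ip)
    entries = [entry.strip() for entry in allowed_hosts.split(",") if entry.strip()]
    return any(client == ipaddress.ip_address(entry) for entry in entries)


# Values quoted in the conversation.
old_value = "127.0.0.1,10.68.23.211"
new_value = "127.0.0.1,172.16.1.180"
```

Under these assumptions a request arriving from 172.16.1.180 is rejected with `old_value` and accepted with `new_value`; if the request arrives from some other address (for example a floating IP), neither value admits it.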
[13:37:43] PROBLEM - ssh on ORES-redis02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:38:43] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:38:58] PROBLEM - ssh on ORES-web02.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:39:03] PROBLEM - ssh on ORES-worker02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:39:13] PROBLEM - check http on Experimental ORES Website is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:39:13] PROBLEM - check users on ORES-worker02.experimental is UNKNOWN: [13:39:18] PROBLEM - check disk on ORES-redis02.experimental is UNKNOWN: [13:39:23] PROBLEM - check load on ORES-web01.Experimental is UNKNOWN: [13:39:23] PROBLEM - ssh on ORES-worker01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:39:33] PROBLEM - puppet on ORES-redis02.experimental is UNKNOWN: [13:39:38] PROBLEM - check disk on ORES-worker01.experimental is UNKNOWN: [13:39:48] PROBLEM - check disk on ORES-web02.Experimental is UNKNOWN: [13:39:53] PROBLEM - check load on ORES-worker01.experimental is UNKNOWN: [13:39:58] PROBLEM - check load on ORES-redis02.experimental is UNKNOWN: [13:40:08] PROBLEM - check load on ORES-worker02.experimental is UNKNOWN: [13:41:13] PROBLEM - check users on ORES-redis02.experimental is UNKNOWN: [13:41:33] PROBLEM - check users on ORES-worker01.experimental is UNKNOWN: [13:41:38] PROBLEM - check load on ORES-web02.Experimental is UNKNOWN: [13:41:38] PROBLEM - check users on ORES-web02.Experimental is UNKNOWN: [13:41:38] PROBLEM - check users on ORES-web01.Experimental is UNKNOWN: [13:41:53] PROBLEM - check disk on ORES-worker02.experimental is UNKNOWN: [13:41:58] PROBLEM - puppet on ORES-web01.Experimental is UNKNOWN: [13:42:03] PROBLEM - puppet on ORES-worker01.experimental is UNKNOWN: [13:42:03] PROBLEM - puppet on ORES-web02.Experimental is UNKNOWN: [13:42:03] PROBLEM - puppet on 
ORES-worker02.experimental is UNKNOWN: [13:42:08] PROBLEM - check disk on ORES-web01.Experimental is UNKNOWN: [14:37:46] PROBLEM - ssh on ORES-redis02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:38:46] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:39:02] PROBLEM - ssh on ORES-web02.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:39:06] PROBLEM - ssh on ORES-worker02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:39:16] PROBLEM - check http on Experimental ORES Website is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:39:16] PROBLEM - check users on ORES-worker02.experimental is UNKNOWN: [14:39:21] PROBLEM - check disk on ORES-redis02.experimental is UNKNOWN: [14:39:26] PROBLEM - check load on ORES-web01.Experimental is UNKNOWN: [14:39:27] PROBLEM - ssh on ORES-worker01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:39:36] PROBLEM - puppet on ORES-redis02.experimental is UNKNOWN: [14:39:41] PROBLEM - check disk on ORES-worker01.experimental is UNKNOWN: [14:39:51] PROBLEM - check disk on ORES-web02.Experimental is UNKNOWN: [14:39:56] PROBLEM - check load on ORES-worker01.experimental is UNKNOWN: [14:40:01] PROBLEM - check load on ORES-redis02.experimental is UNKNOWN: [14:40:11] PROBLEM - check load on ORES-worker02.experimental is UNKNOWN: [14:41:16] PROBLEM - check users on ORES-redis02.experimental is UNKNOWN: [14:41:37] PROBLEM - check users on ORES-worker01.experimental is UNKNOWN: [14:41:42] PROBLEM - check load on ORES-web02.Experimental is UNKNOWN: [14:41:42] PROBLEM - check users on ORES-web02.Experimental is UNKNOWN: [14:41:42] PROBLEM - check users on ORES-web01.Experimental is UNKNOWN: [14:41:56] PROBLEM - check disk on ORES-worker02.experimental is UNKNOWN: [14:42:01] PROBLEM 
- puppet on ORES-web01.Experimental is UNKNOWN: [14:42:06] PROBLEM - puppet on ORES-worker02.experimental is UNKNOWN: [14:42:06] PROBLEM - puppet on ORES-web02.Experimental is UNKNOWN: [14:42:06] PROBLEM - puppet on ORES-worker01.experimental is UNKNOWN: [14:42:11] PROBLEM - check disk on ORES-web01.Experimental is UNKNOWN: [14:43:20] 10Scoring-platform-team, 10Education-Program-Dashboard: Extend ORES graph support to all langauges that have WP10 ORES model - https://phabricator.wikimedia.org/T204724 (10Theklan) [15:37:47] PROBLEM - ssh on ORES-redis02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:38:47] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:39:02] PROBLEM - ssh on ORES-web02.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:39:07] PROBLEM - ssh on ORES-worker02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:39:17] PROBLEM - check users on ORES-worker02.experimental is UNKNOWN: [15:39:18] PROBLEM - check http on Experimental ORES Website is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:39:22] PROBLEM - check disk on ORES-redis02.experimental is UNKNOWN: [15:39:24] Amir1: has u deployed the poolcounter code? Just asking for reporting purposes. 
[15:39:27] PROBLEM - ssh on ORES-worker01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:39:27] PROBLEM - check load on ORES-web01.Experimental is UNKNOWN: [15:39:34] awight: it's deployed [15:39:38] PROBLEM - puppet on ORES-redis02.experimental is UNKNOWN: [15:39:42] PROBLEM - check disk on ORES-worker01.experimental is UNKNOWN: [15:39:52] PROBLEM - check disk on ORES-web02.Experimental is UNKNOWN: [15:39:57] PROBLEM - check load on ORES-worker01.experimental is UNKNOWN: [15:40:02] PROBLEM - check load on ORES-redis02.experimental is UNKNOWN: [15:40:12] PROBLEM - check load on ORES-worker02.experimental is UNKNOWN: [15:40:21] Amir1: Strong work! [15:40:39] Want to kick the related tasks over? [15:40:58] Thanks, I still need to make sure it works properly and does what it's supposed to do, maybe some polish will be needed [15:41:09] but most stuff is done [15:41:17] PROBLEM - check users on ORES-redis02.experimental is UNKNOWN: [15:41:22] awight: https://phabricator.wikimedia.org/T160692 [15:41:37] PROBLEM - check users on ORES-worker01.experimental is UNKNOWN: [15:41:43] PROBLEM - check users on ORES-web02.Experimental is UNKNOWN: [15:41:43] PROBLEM - check users on ORES-web01.Experimental is UNKNOWN: [15:41:43] PROBLEM - check load on ORES-web02.Experimental is UNKNOWN: [15:41:57] PROBLEM - check disk on ORES-worker02.experimental is UNKNOWN: [15:42:02] PROBLEM - puppet on ORES-web01.Experimental is UNKNOWN: [15:42:07] PROBLEM - puppet on ORES-worker01.experimental is UNKNOWN: [15:42:07] PROBLEM - puppet on ORES-worker02.experimental is UNKNOWN: [15:42:07] PROBLEM - puppet on ORES-web02.Experimental is UNKNOWN: [15:42:12] PROBLEM - check disk on ORES-web01.Experimental is UNKNOWN: [15:43:01] perhaps I should kick the icinga config [15:43:20] ...hate that we compromised and don't show the full hostname [16:32:22] awight: is this better? 
https://www.mediawiki.org/w/index.php?title=JADE%2FUse_cases&type=revision&diff=2885597&oldid=2884946 [16:32:50] harej: stuck in a meeting until 10 but looking forward to it! [16:40:00] RECOVERY - check http on Experimental ORES Website is OK: OK - Certificate '*.wmflabs.org' will expire on Fri 16 Nov 2018 03:41:05 PM GMT +0000. [17:07:50] PROBLEM - ssh on ORES-redis02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:08:50] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:09:05] PROBLEM - ssh on ORES-web02.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:09:10] PROBLEM - ssh on ORES-worker02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:09:20] PROBLEM - check users on ORES-worker02.experimental is UNKNOWN: [17:09:25] PROBLEM - check disk on ORES-redis02.experimental is UNKNOWN: [17:09:30] PROBLEM - check load on ORES-web01.Experimental is UNKNOWN: [17:09:30] PROBLEM - ssh on ORES-worker01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:09:40] PROBLEM - puppet on ORES-redis02.experimental is UNKNOWN: [17:09:45] PROBLEM - check disk on ORES-worker01.experimental is UNKNOWN: [17:09:55] PROBLEM - check disk on ORES-web02.Experimental is UNKNOWN: [17:10:00] PROBLEM - check load on ORES-worker01.experimental is UNKNOWN: [17:10:05] PROBLEM - check load on ORES-redis02.experimental is UNKNOWN: [17:10:15] PROBLEM - check load on ORES-worker02.experimental is UNKNOWN: [17:11:20] PROBLEM - check users on ORES-redis02.experimental is UNKNOWN: [17:11:40] PROBLEM - check users on ORES-worker01.experimental is UNKNOWN: [17:11:45] PROBLEM - check load on ORES-web02.Experimental is UNKNOWN: [17:11:45] PROBLEM - check users on ORES-web02.Experimental is UNKNOWN: [17:11:45] PROBLEM - check users on ORES-web01.Experimental is UNKNOWN: [17:12:00] PROBLEM - check disk on ORES-worker02.experimental is UNKNOWN: [17:12:05] PROBLEM - puppet 
on ORES-web01.Experimental is UNKNOWN: [17:12:10] PROBLEM - puppet on ORES-web02.Experimental is UNKNOWN: [17:12:10] PROBLEM - puppet on ORES-worker01.experimental is UNKNOWN: [17:12:10] PROBLEM - puppet on ORES-worker02.experimental is UNKNOWN: [17:12:15] PROBLEM - check disk on ORES-web01.Experimental is UNKNOWN: [17:19:03] ookay [17:20:50] harej: Looking now [17:21:26] harej: lgtm! [17:26:28] harej: https://etherpad.wikimedia.org/p/JADE_schema_challenges [17:35:05] 10Scoring-platform-team (Current), 10DBA, 10JADE, 10Operations, 10User-Joe: Write our anticipated "phase two" schemas and submit for review - https://phabricator.wikimedia.org/T202596 (10awight) See also {T204250}. [17:37:55] awight: I'm not sure why the first thing is an issue (I might just be dense) [17:38:41] We know what entity they're judging [17:39:09] harej: This was the thing I was trying to bring up last Friday, where changing judgment.notes is equivalent to changing the !vote proposition. [17:40:06] So if the notes say "Damaging because the edit includes a spam link", then gets endorsed and later someone changes notes to "damaging because I hate the author's hair color", then the endorser is in an awkward position. [17:40:36] In !votes, you just don't change the propositions, but we're actively driving editors to change the judgment.notes [17:41:14] I may be overthinking this, halfak thinks so at least. [17:41:37] What does that have to do with knowing what revision of the judgment was viewed by the user? [17:44:37] For each of these challenges, it would be good to have an associated user story. I'll work on that or prod you if I'm not seeing the point [17:45:22] yes! I'd love to write some user stories for these. [17:45:49] The judgment revision tells us what judgment.notes was at the time of endorsement, sorry I'm missing the question. [17:48:31] Ohhhh [17:48:46] Couldn't it be inferred? [17:49:15] i am reading rev 555, which is a judgment page. i edit it, creating rev 556. 
wouldn't it be implied that the user was looking at the revision before the current one? [17:49:34] It can be approximately inferred, the only danger would be race conditions. [17:49:45] Inference would be slightly expensive also, but nothing prohibitive AIUI. [17:50:11] Ah--no, inference is actually very expensive [17:50:27] We would have to reconstruct when the endorsement is made by parsing page history. [17:50:54] As a bare data structure, the judgment content wouldn't include any information needed to reconstruct without the "proposition_rev" field. [18:17:02] I think I'm done listing challenges for now. [18:18:04] (03Abandoned) 10Awight: [WIP] Hook to update judgment link table [extensions/JADE] - 10https://gerrit.wikimedia.org/r/460616 (owner: 10Awight) [18:37:54] PROBLEM - ssh on ORES-redis02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:38:54] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:39:09] PROBLEM - ssh on ORES-web02.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:39:14] PROBLEM - ssh on ORES-worker02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:39:23] PROBLEM - check users on ORES-worker02.experimental is UNKNOWN: [18:39:29] PROBLEM - check disk on ORES-redis02.experimental is UNKNOWN: [18:39:34] PROBLEM - check load on ORES-web01.Experimental is UNKNOWN: [18:39:34] PROBLEM - ssh on ORES-worker01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:39:44] PROBLEM - puppet on ORES-redis02.experimental is UNKNOWN: [18:39:49] PROBLEM - check disk on ORES-worker01.experimental is UNKNOWN: [18:39:59] PROBLEM - check disk on ORES-web02.Experimental is UNKNOWN: [18:40:04] PROBLEM - check load on ORES-worker01.experimental is UNKNOWN: [18:40:09] PROBLEM - check load on ORES-redis02.experimental is UNKNOWN: [18:40:19] PROBLEM - check load on ORES-worker02.experimental is UNKNOWN: [18:41:20] harej: one more 
issue appended at line 42 [18:41:24] PROBLEM - check users on ORES-redis02.experimental is UNKNOWN: [18:41:44] PROBLEM - check users on ORES-worker01.experimental is UNKNOWN: [18:41:49] PROBLEM - check users on ORES-web01.Experimental is UNKNOWN: [18:41:49] PROBLEM - check users on ORES-web02.Experimental is UNKNOWN: [18:41:49] PROBLEM - check load on ORES-web02.Experimental is UNKNOWN: [18:42:04] PROBLEM - check disk on ORES-worker02.experimental is UNKNOWN: [18:42:09] PROBLEM - puppet on ORES-web01.Experimental is UNKNOWN: [18:42:14] PROBLEM - puppet on ORES-worker01.experimental is UNKNOWN: [18:42:14] PROBLEM - puppet on ORES-web02.Experimental is UNKNOWN: [18:42:14] PROBLEM - puppet on ORES-worker02.experimental is UNKNOWN: [18:42:18] PROBLEM - check disk on ORES-web01.Experimental is UNKNOWN: [18:43:07] DOWNTIMESTART - Host ORES-worker01.experimental is UP: PING OK - Packet loss = 0%, RTA = 1.65 ms awight neutron network, firewall? [18:43:17] DOWNTIMESTART - Host ORES-worker02.experimental is UP: PING OK - Packet loss = 0%, RTA = 0.99 ms awight neutron network, firewall? [18:43:27] DOWNTIMESTART - Host ORES-web02.Experimental is UP: PING OK - Packet loss = 0%, RTA = 0.85 ms awight neutron network, firewall? [18:43:47] DOWNTIMESTART - Host ORES-web01.Experimental is UP: PING OK - Packet loss = 0%, RTA = 1.54 ms awight neutron network, firewall? [19:14:37] 10Scoring-platform-team, 10ORES, 10Regression, 10Wikimedia-production-error: Failed executing job: ORESFetchScoreJob - https://phabricator.wikimedia.org/T204753 (10Krinkle) p:05Triage>03High [19:24:59] 10Scoring-platform-team (Current), 10ORES, 10Regression, 10Wikimedia-production-error: Failed executing job: ORESFetchScoreJob - https://phabricator.wikimedia.org/T204753 (10awight) [19:25:00] Amir1: https://phabricator.wikimedia.org/T204753 [19:25:16] I can help debug, if you're busy [19:27:57] awight: feel free, I'm done for the day (It's wikidata day anyway :D) [19:28:09] will do! 
[19:31:02] 10Scoring-platform-team (Current), 10ORES, 10Regression, 10Wikimedia-production-error: Failed executing job: ORESFetchScoreJob - https://phabricator.wikimedia.org/T204753 (10awight) Some of the errors point to a service failure, ``` Exception executing job: ORESFetchScoreJob Discipline_(academia) models=["... [19:35:55] 10Scoring-platform-team (Current), 10ORES, 10WMF-JobQueue, 10Regression, and 2 others: Failed executing job: ORESFetchScoreJob - https://phabricator.wikimedia.org/T204753 (10Pchelolo) [19:45:36] 10Scoring-platform-team (Current), 10ORES, 10WMF-JobQueue, 10Regression, and 2 others: Failed executing job: ORESFetchScoreJob - https://phabricator.wikimedia.org/T204753 (10awight) This is probably related to the datacenter switchover. Score processing load on our CODFW cluster doubled on Sept 11, and we... [20:08:58] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:09:13] PROBLEM - ssh on ORES-web02.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:09:18] PROBLEM - ssh on ORES-worker02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:09:28] PROBLEM - check users on ORES-worker02.experimental is UNKNOWN: [20:09:33] PROBLEM - check disk on ORES-redis02.experimental is UNKNOWN: [20:09:38] PROBLEM - ssh on ORES-worker01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:09:38] PROBLEM - check load on ORES-web01.Experimental is UNKNOWN: [20:09:48] PROBLEM - puppet on ORES-redis02.experimental is UNKNOWN: [20:09:53] PROBLEM - check disk on ORES-worker01.experimental is UNKNOWN: [20:10:03] PROBLEM - check disk on ORES-web02.Experimental is UNKNOWN: [20:10:08] PROBLEM - check load on ORES-worker01.experimental is UNKNOWN: [20:10:13] PROBLEM - check load on ORES-redis02.experimental is UNKNOWN: [20:10:23] PROBLEM - check load on ORES-worker02.experimental is UNKNOWN: [20:11:26] harej: I'm interested what u think about the last point, 
and the "Judgment:Page/" suggestion. [20:11:28] PROBLEM - check users on ORES-redis02.experimental is UNKNOWN: [20:11:48] PROBLEM - check users on ORES-worker01.experimental is UNKNOWN: [20:11:53] PROBLEM - check users on ORES-web02.Experimental is UNKNOWN: [20:11:53] PROBLEM - check users on ORES-web01.Experimental is UNKNOWN: [20:11:53] PROBLEM - check load on ORES-web02.Experimental is UNKNOWN: [20:12:08] PROBLEM - check disk on ORES-worker02.experimental is UNKNOWN: [20:12:13] PROBLEM - puppet on ORES-web01.Experimental is UNKNOWN: [20:12:18] PROBLEM - puppet on ORES-worker02.experimental is UNKNOWN: [20:12:18] PROBLEM - puppet on ORES-web02.Experimental is UNKNOWN: [20:12:18] PROBLEM - puppet on ORES-worker01.experimental is UNKNOWN: [20:12:23] PROBLEM - check disk on ORES-web01.Experimental is UNKNOWN: [20:12:45] hmm... expected? ^^ [20:18:05] Hauskatze: known bug, at least :). It's something related to network changes... [20:18:45] once ores is in neutron it may fix itself. [20:19:02] but somehow reaching port 22 and 5666 is failing [20:21:02] 10Scoring-platform-team (Current), 10ORES, 10WMF-JobQueue, 10Regression, and 2 others: Failed executing job: ORESFetchScoreJob - https://phabricator.wikimedia.org/T204753 (10awight) Here's something odd grabbed from the ORES service at the same second as one of the job failure messages: ``` ores1001: [pid:... [20:23:48] ACKNOWLEDGEMENT - ssh on ORES-worker02.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds paladox ack [20:23:56] ACKNOWLEDGEMENT - ssh on ORES-worker01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds paladox ack [20:23:59] 10Scoring-platform-team (Current), 10ORES, 10WMF-JobQueue, 10Regression, and 2 others: Failed executing job: ORESFetchScoreJob - https://phabricator.wikimedia.org/T204753 (10Pchelolo) > According to the ORES code, this should only happen if the wiki context name is unexpected or if the model names are miss... 
[20:24:03] ACKNOWLEDGEMENT - ssh on ORES-web01.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds paladox ack [20:24:33] ACKNOWLEDGEMENT - puppet on ORES-web02.Experimental is UNKNOWN: paladox ack [20:24:39] ACKNOWLEDGEMENT - check disk on ORES-web01.Experimental is UNKNOWN: paladox ack [20:24:57] ACKNOWLEDGEMENT - check load on ORES-web02.Experimental is UNKNOWN: paladox ack [20:24:57] ACKNOWLEDGEMENT - check disk on ORES-worker02.experimental is UNKNOWN: paladox ack [20:24:58] ACKNOWLEDGEMENT - check users on ORES-web01.Experimental is UNKNOWN: paladox ack [20:24:58] ACKNOWLEDGEMENT - check users on ORES-web02.Experimental is UNKNOWN: paladox ack [20:24:59] ACKNOWLEDGEMENT - check users on ORES-worker01.experimental is UNKNOWN: paladox ack [20:25:51] ACKNOWLEDGEMENT - check disk on ORES-web02.Experimental is UNKNOWN: paladox ack [20:25:51] ACKNOWLEDGEMENT - check disk on ORES-redis02.experimental is UNKNOWN: paladox ack [20:25:51] ACKNOWLEDGEMENT - check load on ORES-web01.Experimental is UNKNOWN: paladox ack [20:25:52] ACKNOWLEDGEMENT - check load on ORES-redis02.experimental is UNKNOWN: paladox ack [20:25:54] ACKNOWLEDGEMENT - check users on ORES-redis02.experimental is UNKNOWN: paladox ack [20:25:55] ACKNOWLEDGEMENT - puppet on ORES-worker02.experimental is UNKNOWN: paladox ack [20:25:55] ACKNOWLEDGEMENT - puppet on ORES-redis02.experimental is UNKNOWN: paladox ack [20:25:56] ACKNOWLEDGEMENT - puppet on ORES-web01.Experimental is UNKNOWN: paladox ack [20:25:57] ACKNOWLEDGEMENT - puppet on ORES-worker01.experimental is UNKNOWN: paladox ack [20:26:04] ACKNOWLEDGEMENT - check disk on ORES-worker01.experimental is UNKNOWN: paladox ac [20:26:04] ACKNOWLEDGEMENT - check load on ORES-worker02.experimental is UNKNOWN: paladox ac [20:26:04] ACKNOWLEDGEMENT - check load on ORES-worker01.experimental is UNKNOWN: paladox ac [20:26:05] ACKNOWLEDGEMENT - check users on ORES-worker02.experimental is UNKNOWN: paladox ac [20:26:15] man that is spammy lol 
[20:28:03] ACKNOWLEDGEMENT - ssh on ORES-web02.Experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds paladox ack [20:28:28] now that won't send a warning / error until the service changes [20:43:07] awight: i was running errands, i will take a look at the etherpad [21:05:30] awight: I left some comments. We would have to do automated page moves. [22:12:44] harej: yeah we would have to hook into TitleMoveComplete, I think. [22:17:48] 10Scoring-platform-team (Current), 10ORES, 10WMF-JobQueue, 10Regression, and 2 others: Failed executing job: ORESFetchScoreJob - https://phabricator.wikimedia.org/T204753 (10awight) >>! In T204753#4595550, @Pchelolo wrote: >> According to the ORES code, this should only happen if the wiki context name is u... [22:23:37] lol the tech dept document about foundationwiki repeatedly says "wikimedia.com" [22:25:40] symbolically funny [22:27:21] I feel like it reinforces your preconceived notions :) [22:28:57] Fraudian slip [22:29:40] harej: Good catch wrt. redirects, that's not going to be fun either way. [22:29:48] Can you explain double redirects? [22:30:03] Do we usually follow the trail backwards and point old redirects to the latest page name? [22:30:37] Nope. [22:30:46] Redirects are only followed to one level [22:30:50] Rdr --> page [22:31:09] in case of Rdr1 --> Rdr2 --> page, Rdr1 redirects to Rdr2, and then you have to manually follow through to page. [22:31:57] reading now, https://en.wikipedia.org/wiki/Wikipedia:Double_redirects [22:32:42] So if a page is renamed Page1, to Page2, to Page3, then we have to manually fix Page1 to redirect to Page3? [22:34:17] Looks like bots can handle this, if we're okay with a few days of lag time [22:34:20] Yes, though most big wikis have a bot that does it [22:34:25] (and if we have a wikitext main slot) [22:34:37] good point--the bots won't be available on every wiki [22:36:18] I'm not thrilled about emulating redirects in JSON, that seems like a losing battle. 
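Editor's note: the double-redirect cleanup discussed above (Page1 --> Page2 --> Page3 should become Page1 --> Page3) can be sketched roughly as follows. This is only an illustration under stated assumptions: redirects are assumed to be available as a plain mapping from redirect title to target title, and the function name `collapse_redirects` and the sample titles are hypothetical, not part of MediaWiki or of any existing bot. A real bot would read and rewrite redirects through the wiki itself.

```python
def collapse_redirects(redirects):
    """Return a new mapping where every redirect points at its final target.

    `redirects` maps a redirect title to the title it points at
    (a hypothetical in-memory stand-in for the wiki's redirect data).
    Redirects caught in a loop have no final target and are left out.
    """
    resolved = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        # Follow the chain until we reach a title that is not itself a redirect.
        while target in redirects:
            if target in seen:
                target = None  # redirect loop, nothing sensible to point at
                break
            seen.add(target)
            target = redirects[target]
        if target is not None:
            resolved[source] = target
    return resolved


# Page1 was renamed to Page2, and Page2 was later renamed to Page3.
moves = {"Page1": "Page2", "Page2": "Page3"}
print(collapse_redirects(moves))
```

Run after each page move (for example from a move hook), this would leave every old title pointing directly at the current one, so readers never land on an intermediate redirect.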
[22:36:42] So far, it's looking like either Page/ or the wikitext main slot are the sane alternatives. [22:37:51] harej: Thanks for taking on this "schema challenge", it's great to have someone to dialogue with! [22:42:04] on your schema changes document, I wrote user stories for the things I had user stories for. Other things I am not sure are really product-facing problems [22:44:53] That's perfect. I think line 40 should have a story, though. That's the justification for our talk pages, so we already have user-facing intentions. [22:45:20] line 9 also seems to be user-facing [22:55:35] harej: What do you think about line 41? I don't know enough about wiki processes... [22:55:54] ah cool (flow analogue) [22:57:23] that would be a good candidate for the wikitext slot [22:57:31] so we ship without it at first, then add it later? [22:58:28] (03CR) 10Catrope: Introduce ext.ores.api (031 comment) [extensions/ORES] - 10https://gerrit.wikimedia.org/r/459549 (https://phabricator.wikimedia.org/T201691) (owner: 10Ladsgroup) [22:58:35] i'm open to a few alternatives here, that's certainly one of them. [22:58:56] The migration path doesn't look very pleasant, but probably not bad enough to block us. [22:59:23] The problem there is that any tools integrating with the JSON content will hit that breaking change like a brick wall. [22:59:51] We could also implement as a top-level notes field, if halfak decides that makes sense. [22:59:52] What if the main slot is JSON and wikitext is the adjunct slot? [23:00:07] that's possible, but seems like an abuse of MCR :) [23:00:17] I like the creative approach! [23:00:33] We'd be justified in that the wikitext bit is an iteration on our original design, not something built in from the start [23:00:56] It wouldn't support redirects & stuff... 
[23:01:09] also conceptually suspect [23:01:28] Not if you view the JSON as the primary good and the wikitext as supplementary content [23:01:53] that's true [23:02:08] Which it would be in this case [23:02:26] I'm obviously still attached to the idea that the wikitext is the primary content for humans, and JSON the secondary, machine content... [23:05:08] Internet therapy: https://imgur.com/gallery/PerHKxa [23:12:28] awight: I would break that out into a separate issue [23:12:46] which piece? [23:13:35] also FYI, I'm back to thinking I should drop page judgments from the first iteration. This title vs ID business is going to cause too much conflict with the broader tech community, it seems. [23:14:47] Also, what models are exactly applied to it? [23:14:51] dkinzler wants to see the judgment in a MCR slot on the page itself, Krinkle and others want to see page title, I want to see wikitext, and [23:15:15] yeah that's the last and probably most important problem, only "drafttopic" which doesn't have a very nice judgment model yet. [23:15:28] and drafttopic should be delayed anyway [23:15:31] not on first release [23:15:33] I had to leave the topics as a freeform set of freeform strings [23:15:35] +1 [23:16:08] done. I was trying to keep only because it helped me not make assumptions about rev_id, but it's just not worth it. [23:17:27] diff vs revision judgments will keep me somewhat honest [23:17:39] so, are we going to bring this document up with halfak when he returns? [23:18:37] /o\ [23:18:46] I really don't know how to proceed with that. [23:19:18] I'm plugging away at the RFC-related changes first. [23:19:42] IMO we should copy user stories over to the Use cases doc. [23:19:48] especially the challenging ones. [23:21:01] That seems like a much better-grounded place to make adjustments, rather than the schema itself which only makes sense in relation to the use cases. [23:41:52] I was hoping to hear your opinion on that topic... [23:42:51] Which topic? 
[23:43:04] Whether to copy user stories over to the Use cases doc? [23:47:23] yah and whether to burn the evidence ;-) [23:50:28] (03PS2) 10Awight: Rename namespace to NS_JUDGMENT [extensions/JADE] - 10https://gerrit.wikimedia.org/r/460995 [23:50:30] (03PS5) 10Awight: Change schema to a list of heterogenous judgments [extensions/JADE] - 10https://gerrit.wikimedia.org/r/456424 [23:50:32] (03PS7) 10Awight: [WIP] Secondary indexes for JADE pages [extensions/JADE] - 10https://gerrit.wikimedia.org/r/456078 (https://phabricator.wikimedia.org/T203037) [23:50:34] (03PS1) 10Awight: update schema to include endorsements; drop page judgments for now [extensions/JADE] - 10https://gerrit.wikimedia.org/r/461255 [23:50:46] Let's keep it on the separate etherpad for now but link to it somewhere so we don't forget it. I need to figure out how I want to organize our documentation in general, now that we're producing more of it. [23:51:01] sounds good [23:51:45] Are there other JADE-related etherpads we haven't persisted to wiki? [23:52:34] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Secondary indexes for JADE pages [extensions/JADE] - 10https://gerrit.wikimedia.org/r/456078 (https://phabricator.wikimedia.org/T203037) (owner: 10Awight) [23:54:46] oof are there ever [23:54:50] (03CR) 10jerkins-bot: [V: 04-1] Change schema to a list of heterogenous judgments [extensions/JADE] - 10https://gerrit.wikimedia.org/r/456424 (owner: 10Awight) [23:56:20] (03CR) 10jerkins-bot: [V: 04-1] update schema to include endorsements; drop page judgments for now [extensions/JADE] - 10https://gerrit.wikimedia.org/r/461255 (owner: 10Awight) [23:58:58] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Secondary indexes for JADE pages [extensions/JADE] - 10https://gerrit.wikimedia.org/r/456078 (https://phabricator.wikimedia.org/T203037) (owner: 10Awight)