[00:12:43] hm, ok
[00:13:18] the pod still shows up in the output… no idea how long it'll stay there
[00:13:25] (it's not a problem, just unfamiliar, I guess ^^)
[00:22:32] forever, unless you have something set to clean it up
[00:22:39] Toolforge jobs should get cleaned up automagically
[00:26:38] it's a webservice pod
[00:27:15] and apparently it's specific to this pod – another one went from Terminating to vanishing completely when I did another restart https://paste.toolforge.org/view/8fbcb8e8
[00:44:18] yeah, those should be cleaned up automatically
[00:44:40] speaking of, https://k8s-status.toolforge.org/namespaces/tool-signatures/ appears to have ended up with a number of ReplicaSets
[00:45:09] and https://k8s-status.toolforge.org/namespaces/tool-anticompositetools/ has a few as well...
[00:45:18] AFAIK it usually keeps up to ten ReplicaSets
[00:45:43] huh, https://k8s-status.toolforge.org/namespaces/tool-wd-image-positions/ ended up with 11
[00:46:00] but it's been that way for years
[03:28:59] hello, everyone! It might be an update of some sort, but my tool is broken and I do not know why it is happening. Apparently I cannot install mwoauth on it? And no request to it works. Here is the error output that appears: https://www.codebin.cc/code/cm6k7ehql0001lb034uvkxzex:adTBpp3EbH3V6NvgB3sD7hYPPHKGsKMeZy15xyvzTdL
[11:04:43] !log tools systemctl restart prometheus@tools on tools-prometheus-7 T385262
[11:04:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:04:47] T385262: toolforge: alertmanager reports maintain-kubeusers as down, but it isn't - https://phabricator.wikimedia.org/T385262
[11:16:55] !log metricsinfra systemctl restart prometheus@cloud on metricsinfra-prometheus-3 T385262
[11:16:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Metricsinfra/SAL
[11:16:58] T385262: toolforge: alertmanager reports maintain-kubeusers as down, but it isn't - https://phabricator.wikimedia.org/T385262
[11:30:25] !log metricsinfra systemctl restart prometheus@cloud on metricsinfra-prometheus-2 T385262
[11:30:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Metricsinfra/SAL
[11:30:29] T385262: alertmanager reports maintain-kubeusers and tools-redis-7 as down, but they are up - https://phabricator.wikimedia.org/T385262
[11:38:39] !log metricsinfra rebooting VM metricsinfra-prometheus-3 T385262
[11:38:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Metricsinfra/SAL
[11:38:42] T385262: alertmanager reports maintain-kubeusers and tools-redis-7 as down, but they are up - https://phabricator.wikimedia.org/T385262
[11:39:06] `Fatal error: Uncaught ContentRetrievalException: Content retrieval failed: Could not resolve host: pt.wikipedia.org in /data/project/alberobot/public_html/WikiAphpi/main.php:163`
[11:39:47] is DNS offline, or is it just me?
[11:45:20] albertoleoncio: I tried from a random tool, and I can resolve pt.wikipedia.org
[11:49:07] yeah... it resolves when I call the script from the prompt, but not when I call it via the web. Weird.
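Since the name resolves from the interactive prompt but not from the webservice, one way to narrow it down is to run the same resolver call in both environments and compare. A minimal sketch, assuming Python 3 is available in both places; the hostname and port are only illustrative, and on Toolforge the webservice environment can be reached with "webservice shell":

    #!/usr/bin/env python3
    # Compare DNS resolution between environments: run this once from the
    # interactive prompt/bastion and once inside the webservice pod, then
    # diff the output.
    import socket
    import sys

    HOST = "pt.wikipedia.org"  # the host that failed to resolve in the error above

    def main() -> int:
        # Show which resolver this environment is configured to use.
        try:
            with open("/etc/resolv.conf") as f:
                print("--- /etc/resolv.conf ---")
                print(f.read().strip())
        except OSError as exc:
            print(f"could not read /etc/resolv.conf: {exc}")

        # getaddrinfo uses roughly the same resolver path as other libc-based clients.
        try:
            infos = socket.getaddrinfo(HOST, 443, proto=socket.IPPROTO_TCP)
        except socket.gaierror as exc:
            print(f"resolution of {HOST} FAILED: {exc}")
            return 1
        for family, _type, _proto, _canon, sockaddr in infos:
            print(f"{HOST} -> {sockaddr[0]} ({socket.AddressFamily(family).name})")
        return 0

    if __name__ == "__main__":
        sys.exit(main())

If resolution fails only inside the pod, the pod's resolv.conf or the cluster DNS is the more likely culprit than the upstream resolver.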
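On the earlier point about namespaces accumulating ReplicaSets: a Kubernetes Deployment keeps old ReplicaSets up to its revisionHistoryLimit, which defaults to 10 when unset, so roughly ten leftovers per Deployment is expected. A rough sketch using the kubernetes Python client to list them and show the configured limit; the namespace name is a placeholder and a usable kubeconfig in the default location is assumed:

    #!/usr/bin/env python3
    # List ReplicaSets in a tool namespace and show each Deployment's
    # revisionHistoryLimit (Kubernetes treats an unset value as 10).
    from kubernetes import client, config

    NAMESPACE = "tool-signatures"  # placeholder: any of the namespaces mentioned above

    def main() -> None:
        config.load_kube_config()  # assumes a usable ~/.kube/config
        apps = client.AppsV1Api()

        for rs in apps.list_namespaced_replica_set(NAMESPACE).items:
            desired = rs.spec.replicas or 0
            print(f"ReplicaSet {rs.metadata.name}: desired replicas={desired}")

        for dep in apps.list_namespaced_deployment(NAMESPACE).items:
            limit = dep.spec.revision_history_limit
            shown = "unset (defaults to 10)" if limit is None else limit
            print(f"Deployment {dep.metadata.name}: revisionHistoryLimit={shown}")

    if __name__ == "__main__":
        main()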
[13:23:28] !log lucaswerkmeister@tools-bastion-13 tools.ranker deployed 19c821857d (l10n updates: fi, ko, lb, skr-arab, sr-ec; T384061)
[13:23:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ranker/SAL
[13:25:58] !log lucaswerkmeister@tools-bastion-13 tools.ranker deployed d65fa5888b (fix language fallback)
[13:25:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ranker/SAL
[14:25:54] @albertoleoncio if the problem persists, please open a phab ticket
[16:08:38] !log copypatrol dev01 hard reboot for T383583
[16:08:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Copypatrol/SAL
[16:08:42] T383583: VM nova records attached to incorrect cloudcephmon IPs - https://phabricator.wikimedia.org/T383583
[16:13:19] !log copypatrol copypatrol-backend-dev-01 hard reboot for T383583
[16:13:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Copypatrol/SAL
[16:17:07] !log copypatrol copypatrol-backend-prod-01 hard reboot for T383583
[16:17:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Copypatrol/SAL
[16:17:10] T383583: VM nova records attached to incorrect cloudcephmon IPs - https://phabricator.wikimedia.org/T383583
[16:17:26] I wonder if it would make sense to have a separate task to attach all the reboot logspam to ;)
[16:17:29] !log copypatrol prod01 hard reboot for T383583
[16:17:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Copypatrol/SAL
[16:17:35] but who knows how many people will be diligent enough to log all their reboots too
[16:18:15] (to be clear, I think logging the reboots is a good idea, I just see a scaling problem looming ahead ^^)
[16:20:39] the etherpad link in the email (https://etherpad.wikimedia.org/p/rmrebootsforcephmons) also appears to be blank
[16:21:34] andrewbogott, do you have a preference for logging/coordination for this?
[16:31:10] I will run a report next week about what still needs reboots. So the pad is mostly for coordination within projects.
[16:35:50] I've pasted Andrew's email into the etherpad now
[18:18:07] do we no longer need 2FA for Horizon? I haven't been asked for my TOTP
[18:22:19] not currently - it was removed sometime during the IDM implementation / Wikitech SULification
[18:41:37] authentication moved to IDP, so there won't be 2FA again until the IDP+IDM Developer account management stack implements 2FA
[18:42:21] https://wikitech.wikimedia.org/wiki/IDM & IDP == https://wikitech.wikimedia.org/wiki/CAS-SSO