[07:01:06] greetings
[08:31:34] morning, yesterday I uploaded an older version of alloy to the registry (I mistakenly pasted the version of the chart instead of the software itself, :facepalm:). Is there an easy way to delete it as it's not needed by any chance?
[09:50:11] hello!
[09:51:46] volans: which registry?
[09:52:07] docker-registry.toolforge.org
[09:53:25] https://wikitech.wikimedia.org/wiki/Docker-registry#Deleting_images
[09:53:31] never tried it myself though
[09:53:51] isn't that for the prod registry?
[09:53:58] ah sorry you're right
[09:54:36] https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Docker-registry#Delete_old_images
[09:55:44] thx
[09:56:58] possibly not worth for a ~100MB image
[10:09:44] yeah agreed
[10:14:42] we didn't get paged for yesterday cloudweb outage did we?
[10:15:32] * dhinus checks
[10:16:00] we did not
[10:16:07] sigh
[10:16:09] :(
[10:16:14] it is task time
[10:16:28] :D
[10:16:48] in my mailbox I find a "page" cloudweb alert from october, but it was a nagios one
[10:17:00] this time I only received an alertmanager one, with severity=warning
[10:17:47] it's correct because it's about the single node being down
[10:18:07] I think we miss an alert that checks the availability of the services running on cloudweb*
[10:19:18] indeed, that's now T411470
[10:19:18] T411470: Page on cloudweb/horizon down - https://phabricator.wikimedia.org/T411470
[10:19:25] also review the pybal config
[10:20:49] indeed, my understanding is that pybal did the right thing in this case in the sense that both realservers were failing healthchecks
[12:13:55] dhinus: https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/92
[12:15:33] taavi: thanks! +1d
[13:28:19] from a shell in a tool, where I can find the labstore-secondary-tools-home directory? is it mounted somewhere?
[13:28:23] to test the nfs tracing
[13:28:50] does /proc/mounts list it?
[13:29:36] no, but it might be hidden elsewhere :D
[13:30:16] I only see the 2 dumps, the scratch and the tools-project one
[13:30:26] lol indeed
[13:31:50] would be at /home if it was mounted
[13:32:15] ack, do you know anything in toolsbeta that mounts it?
[13:32:28] that's where I looked first and was empty indeed
[13:35:57] if it's not mounted in tools then I don't think it is mounted in toolsbeta either
[13:36:44] I checked on one tool only, or is it mounted either for all or for none?
[13:47:02] the mount config would be the same on all tools
[13:50:47] ok thx
[16:00:41] tools-db is alerting, looking
[16:00:58] perfect timing given I just closed two toolsdb tasks as "resolved" :D
[16:02:23] andrewbogott: did you reboot tools-db-7 by any chance?
[16:02:51] I thought mariadb would auto-restart after a reboot but it didn't
[16:03:21] restarted now
[16:03:25] I did, can you revive it?
[16:03:51] yes already revived
[16:04:19] in the past, I'm pretty sure mariadb was starting automatically after a reboot
[16:04:39] that still meant you had to enable read-write on the primary, but the replica would resume without manual intervention
[16:04:51] this time even if it was the replica being restarted, it needed a manual "systemctl start mariadb"
[16:04:52] iirc it's explicitly configured to not automatically start?
[16:05:15] in prod mariadb is not restarted on reboot
[16:05:21] not sure if you use a specific setting to do that
[16:05:34] taavi: in production dbs, yes. in toolsdb I'm pretty sure it's configured to start as read-only, but to start anyway
[16:05:49] I will check if i can track down when/if this config changed
[16:21:32] unrelated: is there a way to email all the members of a cloudvps project? (context: https://phabricator.wikimedia.org/T409668#11424478)
[16:21:40] no
[16:21:59] (well, there is a script in puppet to do that once, but that's not very helpful for that situation)
[16:22:05] ack thanks!
[16:30:27] dduvall: there are a few gitlab-runner things on https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/2RV6C3TKSILUX6BGZZY4MFLLIJ6IEVDE/ -- is it ok if I reboot them now (one at a time) or do you need to depool someplace first?
[18:20:41] * dhinus off