[07:58:57] XioNoX: because cathal is on PTO, I propose we cancel today's network sync meeting, unless you say otherwise
[08:12:55] arturo: that works for me. I still have https://phabricator.wikimedia.org/T375259 on my todo
[08:13:23] XioNoX: great, thanks for keeping that on the radar
[11:26:33] * dcaro lunch
[11:37:34] dhinus: I will send an email to cloud-admin@ regarding the status of tofu-infra
[11:39:21] ok! if you create an etherpad I can do a quick review
[11:39:28] ok
[11:41:07] dhinus: https://etherpad.wikimedia.org/p/tofu
[11:43:16] looks good! I made a couple of small edits
[11:43:35] great, thanks
[12:46:11] dhinus: moritzm quick review https://gerrit.wikimedia.org/r/c/operations/puppet/+/1075552 (to unblock the reimage of the new cloudcephosd nodes, not 100% sure that it will fix it but worth trying)
[12:51:14] XioNoX: I53c68434b8d2bc935a34fabc7b77ab8c859cc535 broke puppet runs on all the prometheus VMs
[12:51:29] looking, though it does not seem as easy to hack around as the other one
[12:52:19] I think that at this point it might be better to allow disabling grpc and disable it in cloud
[12:52:59] ttp://pki.discovery.wmnet/bundles/network_devices.pem is not reachable from cloud
[12:53:03] *http://pki.discovery.wmnet/bundles/network_devices.pem
[12:53:38] ah... of course
[12:54:00] open to other ideas though :)
[12:55:15] dcaro: yeah not sure what else can be done, do you have a handy if/else for cloud somewhere I can copy?
[12:55:44] XioNoX: I think we can do it with a parameter in the class, and set it in the cloud.yaml
[12:56:55] I can do the patch if you are busy
[12:59:16] dcaro: if you don't mind, sure, you will probably be faster than me
[12:59:25] 👍 on it
[13:16:19] dcaro: just +1d
[13:16:25] thanks!
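The fix discussed above (a class parameter that defaults to enabled and is switched off in cloud.yaml) can be sketched roughly as a layered lookup. This is only an illustration in Python, not the actual puppet patch: the key name `profile::example::enable_grpc`, the section names, and the idea that only the grpc section needs the PKI bundle are assumptions made for the example.

```python
# Hypothetical sketch of a hiera-style layered lookup; the key name and the
# section names are illustrative and not taken from the real puppet repo.

GRPC_KEY = "profile::example::enable_grpc"  # assumed parameter name

# Layers are consulted from most specific to least specific.
CLOUD_YAML = {GRPC_KEY: False}   # stand-in for the override in cloud.yaml
COMMON_YAML = {GRPC_KEY: True}   # stand-in for the class default


def lookup(key, layers):
    """Return the value from the first layer that defines the key."""
    for layer in layers:
        if key in layer:
            return layer[key]
    raise KeyError(key)


def config_sections(enable_grpc):
    """List the config sections that would be rendered for a host."""
    sections = ["base"]
    if enable_grpc:
        # Assumption: only this section needs the bundle cloud cannot reach.
        sections.append("grpc")
    return sections


production = config_sections(lookup(GRPC_KEY, [COMMON_YAML]))
cloud = config_sections(lookup(GRPC_KEY, [CLOUD_YAML, COMMON_YAML]))
print(production)
print(cloud)
```

The point of the parameter over an if/else on the realm is that the class itself stays realm-agnostic and the cloud-specific decision lives in one data file.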
[13:17:54] <_joe_> people at one edit-a-thon are reporting outreachdashboard.wmflabs.org as being down, so volunteers can't sign up for the workshop
[13:18:48] <_joe_> I know it's not managed by us, and I don't even have read access to it, but if someone has some spare cycles maybe we can help? not sure
[13:19:18] <_joe_> looks like all it takes is restarting the webserver https://phabricator.wikimedia.org/T374565#10138736
[13:26:04] a classic "have you tried turning it off and on again?" :)
[13:26:38] XioNoX: patch ready https://gerrit.wikimedia.org/r/c/operations/puppet/+/1075559
[13:27:09] dhinus: are you on outreachdashboard.wmflabs.org ?
[13:27:15] yes
[13:27:20] thanks :)
[13:28:12] IIUC there's no immediate action as it's working again, but I'm trying to find out where it's running anyway
[13:37:29] found it, it was not easy as the name is different and there was no direct link from anywhere :) https://toolsadmin.wikimedia.org/tools/id/wikiedudashboard
[13:43:53] still not finding where the custom DNS is defined
[13:44:11] proxy no?
[13:47:20] yep, but I forgot about the wmflabsdotorg project, it's in there
[13:48:06] where's the mapping proxy->tool?
[13:48:28] proxy-*.project-proxy reads it from redis
[13:50:08] https://www.irccloud.com/pastebin/DPbboI6c/
[13:50:39] not sure if it's anywhere else too, I doubt it
[13:51:10] but how do you add stuff there?
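The hostname lookup mentioned above (the proxy reading its mapping from redis) can be pictured roughly like this. It is a sketch under assumptions: a plain dict stands in for redis, and the `frontend:<hostname>` key format, the example hostname, and the backend URL are all invented for illustration rather than read from the real proxy.

```python
# Hypothetical sketch: the key format and backend URL are invented for
# illustration, and a plain dict stands in for the redis instance.

FAKE_REDIS = {
    "frontend:example.wmflabs.org": "http://10.0.0.5:80",
}


def resolve_backend(host, store):
    """Map an incoming Host header to a backend URL, or None if unregistered."""
    hostname = host.split(":", 1)[0].lower()  # drop any port, normalise case
    return store.get(f"frontend:{hostname}")


print(resolve_backend("Example.wmflabs.org:443", FAKE_REDIS))
print(resolve_backend("unknown.wmflabs.org", FAKE_REDIS))
```

A store like this only answers "which backend serves this hostname"; finding the owning project or VM name needs a separate lookup on the backend address.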
[13:51:34] aaahhh, there's a service that horizon uses for it
[13:51:36] custom one
[13:52:49] https://usercontent.irccloud-cdn.com/file/dfrnQc3P/image.png
[13:52:52] I think it's that one
[13:53:02] (from horizon -> system information)
[13:54:22] for users it's the 'compute->dns->web proxies' custom tab (developed by us)
[13:57:10] I'm not finding any Horizon project that contains the entry for "outreachdashboard", but I only looked in a few
[14:00:35] hmm, we can get whose IP 172.16.6.223 is
[14:01:22] https://www.irccloud.com/pastebin/eACUbeeI/
[14:03:23] woot
[14:05:06] so outreachdashboard.wmflabs.org is a web proxy registered in the globaleducation project to point to the peony-web VM
[14:05:08] so the toolforge tool I found is not where the app is running, it's actually in that peony-web vm
[14:05:33] maybe that's just a proxy for the other tool?
[14:05:46] nah I see Ruby processes in the vm
[14:06:03] usually tools only have *.toolforge.org domains
[14:06:16] (we don't allow anything else at the proxy level)
[14:06:28] yep that's why I was confused, but the name looked very similar, and I assumed there was some proxy
[14:07:48] there's https://meta.wikimedia.org/wiki/Programs_%26_Events_Dashboard but I don't see any mention about the deployment, where it's running, etc.
[14:13:11] it's a very old project it seems, there are tasks from 2015
[14:14:04] dhinus: wdyt? https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/58
[14:14:10] dhinus: yep. I'm tempted to add a task in their backlog asking for more docs, but I'm not sure it would be effective
[14:14:19] (I merged a similar one for secgroups)
[14:14:40] https://phabricator.wikimedia.org/tag/education-program-dashboard/
[14:18:11] I always forget that openstack-browser gives you the answer to (almost) all openstack questions: https://openstack-browser.toolforge.org/proxy/
[14:18:26] you can quickly see all the proxies on that page, and the VMs they point to
[14:23:24] now, what is wikiedudashboard in toolforge? it contains php code, so definitely not the same thing :)
[14:24:21] found it, it's this thing, supporting the other one https://github.com/WikiEducationFoundation/WikiEduDashboardTools
[14:27:39] I was hopeful when I saw them saying they had some perf issues... but then noticed it was from March 2021 - https://phabricator.wikimedia.org/T273067#6925678
[14:27:46] I added this info to https://phabricator.wikimedia.org/project/manage/1052/, I'm opening a task as well because this should really be better documented
[14:49:51] dcaro: thanks for fixing Prometheus!
[15:37:34] XioNoX: np :)
[15:47:50] re: outreachdashboard, I created T375642
[15:47:51] T375642: Write documentation about how the app is deployed in Cloud VPS and Toolforge - https://phabricator.wikimedia.org/T375642
[16:46:27] dcaro: was this the file you were thinking about for alertmanager team routing? https://gerrit.wikimedia.org/g/operations/puppet/+/production/hieradata/common/profile/prometheus/icinga_exporter.yaml
[16:46:39] it should affect icinga-generated alerts only
[16:46:48] oh yes
[16:46:56] so I'm still confused as to why the cloudidm alert is being tagged with "team:wmcs"
[16:47:27] maybe it's being checked from the 'cloud' prometheus instance? that would add the team=wmcs tag to anything it does
[16:50:30] possibly
[16:56:35] nope, thanos shows prometheus="ops"
[16:57:01] the mystery remains... I'll try to find out tomorrow :)
[16:58:35] oh, found out the place where we set the format of the tasks xd
[16:58:36] https://gerrit.wikimedia.org/g/operations/puppet/+/0aa8bcec9a074829182dbbd30a2c53b740784e35/modules/alertmanager/templates/alertmanager.yml.erb#616
[17:07:08] ohhh, this is interesting
[17:07:13] https://www.irccloud.com/pastebin/O47o8LW2/
[17:08:17] yep, that's the source of it https://gerrit.wikimedia.org/g/operations/puppet/+/0aa8bcec9a074829182dbbd30a2c53b740784e35/modules/profile/manifests/contacts.pp#39
[17:09:25] ohhh so there's even one more source of "who owns a server" :D
[17:12:08] https://gerrit.wikimedia.org/g/operations/puppet/+/production/hieradata/role/common/idmcloud.yaml#1
[17:14:42] it's a way of emphasizing ownership xd
[17:15:02] I checked a cloudcontrol role, and it has like 5 different ones :P
[17:15:06] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/hieradata/role/eqiad/wmcs/openstack/eqiad1/control.yaml
[17:15:49] profile:admin:groups, contactgroups, cluster, role_contacts
[17:15:58] hahahaah
[17:16:01] I will open a task to discuss if we could not have just one that includes all the others
[17:16:09] I think contactgroups will go away with icinga
[17:16:10] I'm sure there were reasons for duplicating some of those
[17:16:23] but I'm pretty sure we could unify them at least for WMCS, other teams might have different needs
[17:16:36] agree +1
[17:16:41] there's also base:cloud_production that's defined somewhere else
[17:24:36] neat
[17:24:37] * dcaro off
[17:24:41] cya tomorrow
[17:46:47] I created T375673
[17:46:48] T375673: Define single hiera key to identify WMCS-managed bare metal hosts - https://phabricator.wikimedia.org/T375673
[17:49:50] * dhinus offline
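To make the "several sources of ownership" point above concrete, here is a rough Python sketch that reports which of the ownership-like keys on a role mention a given team. The key names are copied from the chat as written; the role data, the values, and the substring matching rule are all made up for illustration and do not reflect the real hiera files.

```python
# Hypothetical sketch: key names follow the chat above, all values are invented.

OWNERSHIP_KEYS = (
    "profile:admin:groups",
    "contactgroups",
    "cluster",
    "role_contacts",
)


def mentions(value, team):
    """True if a string or list value mentions the team, case-insensitively."""
    values = value if isinstance(value, list) else [value]
    return any(team.lower() in str(v).lower() for v in values)


def ownership_report(role_data, team):
    """For each ownership-like key set on the role, say whether it names the team."""
    return {
        key: mentions(role_data[key], team)
        for key in OWNERSHIP_KEYS
        if key in role_data
    }


# Invented example roles: one sets every key, one sets only role_contacts.
control_role = {
    "profile:admin:groups": ["wmcs-roots"],
    "contactgroups": "wmcs-team",
    "cluster": "wmcs",
    "role_contacts": ["WMCS"],
}
idm_role = {"role_contacts": ["WMCS"]}

print(ownership_report(control_role, "wmcs"))
print(ownership_report(idm_role, "wmcs"))
```

A single agreed key would reduce this report to one lookup per role, which is the direction the task above proposes for WMCS-managed hosts.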