[07:11:56] greetings
[08:35:22] morning
[10:15:01] there's a toolforge request i'd like a second opinion on: https://toolsadmin.wikimedia.org/tools/membership/status/2217
[10:15:28] basically, to me, that seems like someone wanting to pull our data for commercial purposes, not someone trying to make a tool to help improve the wikis (what we are for)
[10:18:29] why was their answer in french...? xD
[10:22:07] I agree. without knowledge of the french-company-specific stuff they're describing there, this seems to me like they are trying to benefit from the data for their business, with a "compromise" of giving back some data they have that we might be missing, which doesn't seem like a fair trade
[10:22:16] it seems that the only reason they are asking for toolforge is to circumvent the rate limiting
[10:22:33] the rest is sugar coating
[10:22:51] exactly. it makes more sense to point them to wikimedia enterprise?
[10:23:04] so I would actually redirect them to either noc@ or the other email address that then leads towards enterprise
[10:23:38] bot-traffic@?
[10:24:14] yes but I was unsure how much they wanted to have that public ;)
[10:24:50] it's listed at https://wikitech.wikimedia.org/wiki/Bot_traffic so I don't think it's supposed to be particularly secret :P
[10:25:18] fair enough :D
[10:36:26] also, looking for a +1 (or objection) to T425892
[10:36:28] T425892: Request creation of wiki-polis-backend VPS project - https://phabricator.wikimedia.org/T425892
[10:48:38] that software seems to support only postgres and dynamodb
[10:49:06] technically speaking, if they had a trove postgres could they have the tool on toolforge connect there?
[10:51:10] yes
[10:51:44] i just rebooted toolforge k8s gateway workers and I think that caused a brief outage, the cookbook is too fast to move on to the next node after rebooting the first
[10:51:48] could that be a middle ground solution? keep everything on toolforge except the db?
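The outage at 10:51:44 came from the reboot cookbook moving on to the next gateway node before the first one was back. A common remedy is to gate each step of a rolling reboot on a health check plus a short settle period. The following is a minimal sketch under stated assumptions: `reboot` and `is_healthy` are hypothetical callables, the timeouts are illustrative, and this is not the code of the `wmcs.toolforge.k8s.reboot` cookbook or of any fix for T426948.

```python
import time
from typing import Callable, Iterable, List


def rolling_reboot(
    nodes: Iterable[str],
    reboot: Callable[[str], None],       # hypothetical: triggers a reboot of one node
    is_healthy: Callable[[str], bool],   # hypothetical: e.g. "node is serving traffic again"
    timeout_s: float = 600.0,
    poll_s: float = 10.0,
    settle_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Reboot nodes one at a time, only moving on once the previous node is healthy."""
    done = []
    for node in nodes:
        reboot(node)
        deadline = clock() + timeout_s
        # Poll until the node reports healthy, or give up instead of moving on blindly.
        while not is_healthy(node):
            if clock() >= deadline:
                raise TimeoutError(f"{node} did not become healthy within {timeout_s}s")
            sleep(poll_s)
        # Extra grace period so traffic can shift back before the next node goes down.
        sleep(settle_s)
        done.append(node)
    return done
```

Injecting `sleep` and `clock` keeps the sketch testable without real waiting; the key design choice is that a node that never recovers stops the whole run rather than taking a second node down with it.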
[10:51:51] (all resolved now)
[10:51:59] ack
[10:52:21] could be
[10:54:05] (filed T426948 to make the cookbook smarter)
[10:54:06] T426948: wmcs.toolforge.k8s.reboot needs to be slower with Gateway nodes - https://phabricator.wikimedia.org/T426948
[11:06:07] not particularly proud of https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/85
[11:06:20] though as a bandaid for T426827 I think it is acceptable
[11:06:21] T426827: webservice-cli package deb gitlab CI job went from 9 minutes to 27 minutes - https://phabricator.wikimedia.org/T426827
[11:06:45] 30s is the deb building job time now
[11:07:12] so a 20x/60x factor
[11:07:34] :facepalm:
[11:08:23] lunch
[13:25:23] godog, I was about to reboot cloudcontrols but that'll go better post-zookeeper, is that change coming soon?
[13:30:37] dhinus: do you have bandwidth to reboot clouddb hosts for T426563?
[13:30:51] or I guess you could re-teach me how to do it
[13:32:16] andrewbogott: I'm on it, I rebooted one this morning, and currently having a shot at fixing T420203
[13:32:17] T420203: Extend sre.mysql.upgrade to work with multiinstance hosts - https://phabricator.wikimedia.org/T420203
[13:32:48] great! ty
[13:43:52] topranks: ooc, what's the status on moving bgp/bfd alerts from icinga to alertmanager? working on reboot automation for our bgp-enabled hosts and I can easily silence the new prometheus alerts, but doing the same for icinga gets a lot trickier
[13:47:33] andrewbogott: yes I'll push it tomorrow or monday at the latest
[13:49:24] Great, let's wait to reboot until after
[13:49:54] SGTM
[14:31:23] taavi: yeah we have most of the plumbing for that done
[14:31:31] Is recording cookbook tests general for all cookbooks or is that just a wmcs-cookbooks thing? I am following the readme (I think?) but no recording is generated.
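The "20x/60x factor" at 11:07:12 is a rounded ratio between the earlier CI job durations from T426827 (9 and 27 minutes) and the roughly 30 s deb build time now. A quick check of that arithmetic; the `ratio` helper is purely illustrative:

```python
def ratio(job_minutes: float, build_seconds: float) -> float:
    """How many times longer a job of `job_minutes` is than `build_seconds`."""
    return job_minutes * 60 / build_seconds


before = ratio(9, 30)   # 540 s / 30 s
after = ratio(27, 30)   # 1620 s / 30 s
print(f"{before:.0f}x / {after:.0f}x")  # prints "18x / 54x"
```

So the exact figures are 18x and 54x, which round to the 20x/60x quoted in the chat.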
[14:31:36] COOKBOOK_RECORDING_FILE=/home/andrew/rebootmons.yaml COOKBOOK_RECORDING_ENABLED=true test-cookbook -c 1288982 wmcs.ceph.roll_reboot_mons --cluster-name codfw1
[14:31:41] we have it set up so teams can configure it for their own hosts:
[14:31:42] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/alerts/+/refs/heads/master/team-sre/bgp.yaml
[14:32:05] and then for the core network:
[14:32:06] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/alerts/+/refs/heads/master/team-netops/bgp.yaml
[14:32:07] andrewbogott: custom wmcs stuff
[14:32:21] ok, so potentially no one but david has used it
[14:33:40] re: icinga I think we removed the BGP checks there
[14:33:44] but you're right bfd is still on it
[14:33:52] we have that for alertmanager too: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/alerts/+/refs/heads/master/team-netops/bfd.yaml
[14:34:06] so we should probably remove the Icinga check, I'll look into it
[16:07:47] andrewbogott: there should not be any reasons today to avoid dashes?
[16:09:03] correct, that's why I used the technical term 'bad luck' -- it's not immediately obvious when looking at a project with dashes if it's old-and-cursed or new-and-uncursed, seems easier to just not use them.
[16:09:09] but... it's not a strong argument
[16:09:25] i would prefer to not be afraid of future hypothetical problems
[16:09:31] lol ok
[16:09:41] dashes were fine for ages, until we found exactly one place where they were not, and fixed it
[16:17:51] the object gateway is the thing that is wrong in the universe
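Once the BGP/BFD checks live in Alertmanager, silencing them around a reboot can go through Alertmanager's v2 HTTP API (`POST /api/v2/silences`). The following minimal sketch only builds the JSON request body. The label name and value in the usage (`instance`, `example-host`) are hypothetical placeholders, not the labels used by the linked alert rules, and actually sending the request (or using `amtool silence add` instead) is left out.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def silence_payload(label: str, value: str, hours: float, author: str,
                    comment: str, now: Optional[datetime] = None) -> str:
    """Build a JSON body for Alertmanager's POST /api/v2/silences endpoint."""
    start = now or datetime.now(timezone.utc)
    end = start + timedelta(hours=hours)
    body = {
        # One exact-match (non-regex, equality) matcher on a single label.
        "matchers": [{"name": label, "value": value, "isRegex": False, "isEqual": True}],
        "startsAt": start.isoformat(),
        "endsAt": end.isoformat(),
        "createdBy": author,
        "comment": comment,
    }
    return json.dumps(body)


# Hypothetical usage: silence alerts for one host for the duration of a reboot.
print(silence_payload("instance", "example-host", 2, "andrew", "rolling reboot"))
```

Keeping the payload builder separate from the HTTP call makes it easy to reuse from reboot automation, where the silence would be created before the reboot and expire on its own afterwards.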