[07:33:17] <_joe_> kormat, marostegui I was thinking, regarding the proxies
[07:33:22] <_joe_> the db proxies
[07:33:28] <_joe_> we're now using cnames
[07:33:45] <_joe_> wouldn't it make more sense to have an IP we can "move"?
[07:33:56] <_joe_> be that an LVS IP or whatever else
[07:34:14] _joe_: As far as I was told, floating IPs weren't something encouraged
[07:34:55] no, the problem wasn't that netops said they couldn't provide us with a floating ip between racks due to our network architecture
[07:35:08] *was
[07:35:58] <_joe_> I was proposing LVS instead of a floating IP
[07:36:30] <_joe_> floating IPs is a very bad idea IMHO, I have no idea why it's so prominent everywhere
[07:36:49] the problem is lvs doesn't fix the spof issue
[07:37:03] it just moves it up
[07:37:06] _joe_: but having LVS won't actually solve anything
[07:37:12] yeah, what jaime just said
[07:37:13] <_joe_> jynus: what?
[07:37:20] <_joe_> I strongly disagree
[07:37:35] <_joe_> it makes your operations faster, and the reaction of systems much faster in case of failure
[07:37:37] you have to balance the lvs server
[07:38:02] <_joe_> jynus: we already do that
[07:38:24] <_joe_> how do you think we survive whenever one LVS server goes down or we restart pybal?
[07:39:04] _joe_: I believe that now if a primary fails, the failover is done in one minute - we can of course make it failover faster, but we give some time in case it was a temporary flap
[07:40:00] <_joe_> marostegui: it's done in a minute without changing dns?
[07:40:07] _joe_: yep
[07:40:15] <_joe_> if the proxy fails?
[07:40:23] <_joe_> I meant balancing the proxies
[07:40:32] <_joe_> not the databases
[07:40:32] Ah, no, that needs manual changing of course
[07:40:40] <_joe_> yeah that's what I was suggesting
[07:41:01] <_joe_> if we had the dbproxies behind a load-balancer, that would guarantee HA
[07:41:06] _joe_: But anyways, misc really needs some love in general. We've been talking about it for a few months, as we need to check how to consolidate things within misc to use the resources better and all that
[07:41:21] but we will need 2 lvss
[07:41:24] 2 proxies
[07:41:27] and 2 dbs
[07:41:41] <_joe_> jynus: the lvs servers already exist
[07:41:48] I would like to cut the 2 lvss and 2 proxies into just 2
[07:44:10] the other issue that virtual ip (or any other alternative) that lvs doesn't do is fencing
[07:45:08] <_joe_> fencing of dbproxy?
[07:45:14] <_joe_> 🤷 nevermind
[07:45:46] yes, so they don't end up one pointing to one host and the other to other
[07:46:18] note the dbproxies are, for misc hosts, only failover mechanisms
[07:46:26] not load balancers
[07:47:04] and many db connections end up "stickied" on the wrong host due to persistent connections
[07:48:18] in other words, proxying of master needs different solutions than proxying of replicas, which can work with almost anything
[07:53:16] <_joe_> thanks for the explanation, I wasn't aware
[07:54:06] there are still some options there
[07:54:39] for example, fencing may not be needed if there was (and I am speaking loosely here) an etcd-driven configuration management for both proxies
[07:54:58] so they could be seen as stateless and guaranteed to have the latest config
[07:55:12] there is room for options
[08:07:04] I tried to remove old docker images from debmonitor as described at https://wikitech.wikimedia.org/wiki/Debmonitor#Manually_remove_an_image_from_DebMonitor but that fails with a client certificate validation error. Is that just outdated documentation or maybe a problem?
[08:08:25] <_joe_> jayme: can you paste the error somewhere?
[08:09:51] _joe_: It's just a 403 "Client certificate validation failed: ''"
[08:10:19] can you paste the full command used?
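Aside on the 07:54 idea of an etcd-driven configuration for both proxies: the point is that neither proxy keeps local state about which host is the primary, so both always render the same backend from one shared source. The shell sketch below illustrates that under loud assumptions: the etcd key `/dbproxy/misc/primary`, the HAProxy-style backend stanza, and the function names `fetch_primary` and `render_backend` are all hypothetical and do not describe how the dbproxies are actually configured. Only `etcdctl get --print-value-only` (etcd v3 API) is a real interface.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: each dbproxy derives its single active backend from
# one shared etcd key instead of local state, so two proxies re-rendering
# from the same key cannot point at different primaries.
# Key path, backend name and config format are assumptions.

# Read the current primary ("host:port") from etcd using the v3 API.
fetch_primary() {
  ETCDCTL_API=3 etcdctl get "/dbproxy/misc/primary" --print-value-only
}

# Render an HAProxy-style TCP backend containing exactly one active server.
# Refuses to render anything if the value does not look like host:port.
render_backend() {
  local primary="$1"
  if [[ ! "$primary" =~ ^[A-Za-z0-9.-]+:[0-9]+$ ]]; then
    echo "invalid primary: '$primary'" >&2
    return 1
  fi
  printf 'backend mariadb\n    mode tcp\n    server primary %s check\n' "$primary"
}
```

For example, `render_backend "$(fetch_primary)"` would print the stanza for whichever primary the key currently holds; writing it out and reloading the proxy are left out. This only keeps the two proxies' configuration consistent. It does not by itself terminate the persistent connections mentioned at 07:47, so on its own it is not a complete fencing mechanism.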
[08:10:57] "and any other incriminating evidence" ;)
[08:11:00] sudo curl -X DELETE 'https://debmonitor.wikimedia.org/images/docker-registry.wikimedia.org/envoy-tls-local-proxy:1.12.2-1' --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key
[08:12:16] maybe I need to leave out the "registry"? ... haven't tried "deleting aroung", though
[08:12:22] *around
[08:15:08] you need to run it against debmonitor.discovery.wmnet, it just worked fine for me
[08:15:12] Ah, silly me. The host needs to be incriminating
[08:15:16] ups :)
[08:15:19] debmonitor.discovery.wmnet...ofc
[08:15:22] I'll fix the broken wikitech docs
[08:15:36] ah, no they are actually correct :-)
[08:15:47] Na, they are fine moritzm... I copied the URL from the browser
[08:15:59] Sorry for the noise...
[08:17:32] ok :-)
[08:25:05] <_joe_> jayme: maybe add to the docs a specific warning
[08:25:16] <_joe_> I'm pretty sure I made the same mistake in the past :P
[08:25:19] "WARNING: do not be jayme"
[08:26:01] Yeah. I'll add "On 403, send a page to kormat via VictorOps" :D
[08:26:05] haha
[08:27:01] Or maybe I can patch debmonitor to do so directly...less toil :)
[08:44:30] <_joe_> jayme: I approve this attitude
[09:18:22] jbond42: i'll close my CR in favour of yours
[09:18:38] kormat: oh sorry i didn't see yours
[09:19:14] kormat: they're exactly the same, just go with yours
[09:19:32] yours has some more background, and has pcc tests. i can't compete with that. ;)
[09:19:41] lol ok :)
[09:20:34] merged
[09:20:48] great, thanks :)
[11:52:40] Does anyone on SRE have create project permissions on Gerrit or should I be going to releng for that?
[11:55:44] a few have, yes (members of cn=gerritadmin LDAP group)
[11:59:47] I don't have gerritadmin but IME turnaround time for project creation is < 1d
[12:14:33] hmm could we add the Hosts: field to the valid list of fields on the commit message footer?
[12:15:39] vgutierrez: https://phabricator.wikimedia.org/T166066#5087807
[12:16:04] oh, ok :)
[12:16:07] thanks kormat, was just trying to find the link :)
[12:16:15] np :)
[12:16:19] jbond42: ❤️
[12:16:30] (for the pcc-full-diff feature you added)
[12:16:36] don't know who to ping to make a release though
[12:16:53] a release of the container?
[12:17:02] maybe hashar could help with that
[12:17:34] vgutierrez: not sure if just the container or if another release of commit-message-validator is also needed
[12:17:51] kormat: feel free to update the task if the format needs tweaking
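For reference on the DebMonitor exchange at 08:07 to 08:15: the 403 came from sending the DELETE to the public `debmonitor.wikimedia.org` hostname (copied from the browser) rather than the internal `debmonitor.discovery.wmnet` endpoint, which worked at 08:15:08. The minimal shell sketch below wraps the working invocation. The function names `debmonitor_image_url` and `debmonitor_delete_image` are hypothetical; the `/images/` path and the certificate and key paths are taken verbatim from the command pasted at 08:11:00. `--fail` is a standard curl flag, added here as an assumption about what is useful so that an HTTP error such as the 403 gives a non-zero exit status.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the DebMonitor image DELETE call from the log.
# Use the internal discovery hostname; the public debmonitor.wikimedia.org
# URL rejected the client certificate with a 403.

# Build the DebMonitor URL for a given image reference.
debmonitor_image_url() {
  printf 'https://debmonitor.discovery.wmnet/images/%s\n' "$1"
}

# Delete an image entry, authenticating with the host's DebMonitor client
# certificate. --fail makes curl exit non-zero on HTTP errors like a 403.
debmonitor_delete_image() {
  sudo curl --fail -X DELETE "$(debmonitor_image_url "$1")" \
    --cert /etc/debmonitor/ssl/cert.pem \
    --key /etc/debmonitor/ssl/server.key
}
```

Usage, with the image from the log: `debmonitor_delete_image 'docker-registry.wikimedia.org/envoy-tls-local-proxy:1.12.2-1'`.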