[09:45:49] legoktm, mutante, chaomodus: FYI, the current setups of both gerrit and lists are exceptions in Netbox because they are relics from the past that were set up differently, and we *should really avoid* replicating them going forward. Instead we should use a saner and more maintainable approach based on the actual needs of those services. You can see the current exceptions in Netbox at
[09:45:54] https://netbox.wikimedia.org/search/?q=Keep+manual+DNS . In particular, for ...
[09:45:57] ... lists-next, the chosen IP is part of the public1-a-eqiad VLAN, which is bound to row A in eqiad, and as long as it is managed automatically it will also have a TTL of 1H.
[09:47:03] usual question is, why can't it go behind LVS?
[09:47:25] Is there any plan to do HA for lists-next? Could we use the "LVS service IPs" prefix https://netbox.wikimedia.org/ipam/prefixes/43/ip-addresses/ in this case? (or maybe create a dedicated one for service IPs outside LVS if we want to more clearly split the two)
[10:22:43] <_joe_> XioNoX, volans I guess because it's a temporary test for now? Also completely run by volunteers on volunteer time
[10:41:02] <_joe_> (my comment, to clarify, only applies to the HA/LVS comments above)
[10:45:02] yeah I know and I'm not asking for a full solution right now to be clear. But knowing what the long term plan is helps to decide what's best in terms of IP allocation and such.
[15:01:44] volans: it feels like we need the various suggested solutions documented, because it is very ambiguous right now.
[15:02:36] volans: i feel like saying 'don't do it for reasons' kind of doesn't help solve the problems proposed.
[15:26:18] so, mailman3 actually can be HA AIUI, while mailman2 couldn't
[15:28:12] if lists should be behind LVS then we can look into it and depending on how much work it ends up being it might make sense to wait until after the migration
[15:28:29] it's really hard to tell what parts of the current lists setup are done for a specific reason vs those that are just legacy cruft
[15:30:20] also the tentative plan is to install mailman3 on the current lists1001 host which serves mailman2, but that's really not set in stone yet: https://phabricator.wikimedia.org/T256539
[15:38:47] legoktm: Yah we're discussing how to document this, I think we'll have to move the service address we added yesterday to another subnet because it needs to not be in the host range
[15:42:08] OK, shouldn't be an issue
[15:43:36] it would be nice to challenge the existing setup to understand which part is legacy cruft
[15:51:02] Yes
[15:51:04] that too :)
[16:17:15] The API gateway SLO is up https://wikitech.wikimedia.org/wiki/SLO/API_Gateway
[16:17:55] * _joe_ fires up the army of AWS / Azure / GCP instances to make it fail
[16:18:10] <_joe_> jokes aside, great work :)
[16:18:29] no pictures?
[16:18:31] * volans sad
[16:19:23] hnowlan: congrats :)
[16:20:10] honestly I *was* thinking that a public SLO for a public service does kinda invite people to make it fail its targets >_>
[16:20:52] hnowlan: joking ofc, that's great.
[16:23:40] <3
[16:25:59] hnowlan: quick question for the error rate SLO the choice of 504 only is because other 5xx would mean that other parts of the infra are failing?
[16:26:29] s/SLO/SLO,/
[16:33:06] volans: yeah, the idea was that the appservers are liable to serve other 5xx. it's not an airtight assumption though - there's a task to tighten up the definition of error to be (errors served by gw - errors served by mwapi)
[16:33:29] ack, thanks!
[16:33:37] maybe worth mentioning it there?
[16:37:34] yeah good point
[17:10:12] hnowlan: putting a service on the internet in the first place is also an invitation for others to make it fail ;)
[17:16:15] but this one, this one will get it right
[17:16:56] <_joe_> ahah
[22:45:27] I filed https://phabricator.wikimedia.org/T278495 "Figure out plan for mailman IP situation"