[11:18:32] arnaudb marostegui I'm looking at https://phabricator.wikimedia.org/T381276 is there a list (or how do I get one?) of the hosts affected? and what was the host(s) affected in the incident referred in the task ?
[11:23:42] godog: thanks, there's some discussion in our data-persistence channel. Will look for the host involved that specific day and paste it on the task.
[11:23:54] marostegui: ok thx
[11:39:24] godog: https://phabricator.wikimedia.org/T381276#10379062 some recap there
[11:42:41] cheers
[11:48:23] marostegui: it looks like things are working as intended ?
[11:48:56] godog: not sure, cause we did have pages for secondary dc hosts
[11:48:59] As I posted there
[11:51:18] ok checking too
[11:51:46] In any case, I think we do need to page for replication broken on the "secondary" dc anyway
[11:51:53] _joe_ Amir1 would you agree ^?
[11:52:41] yeah I agree
[11:54:00] <_joe_> I think it needs some nuance, but yes
[11:54:12] <_joe_> the "mwprimary" replicas for sure
[11:54:24] <_joe_> not sure about the bidirectional replication hosts though
[11:56:07] what do you mean with bidirectional hosts?
[11:56:08] x2?
[11:59:29] going to lunch, bbiab
[13:27:07] marostegui arnaudb Amir1 re: T381276 do you need o11y assistance at this time ?
[13:27:08] T381276: replication breakage is not not paging anymore - https://phabricator.wikimedia.org/T381276
[13:32:05] godog: Related to T381276, there is also the patchset https://gerrit.wikimedia.org/r/c/operations/alerts/+/1100042. I’ve just CC’d you.
[13:33:13] godog: arnaudb I'm going to review this one
[13:33:23] thank you tappof
[13:36:38] godog: marostegui I think you're the best person to answer this as you seemed to have a better understanding of the situation than I do!
tappof: lmk if I can help, despite the icinga issue I think it would be nice to be certain to have a paging alert even if it's a workaround
[13:43:18] ack
[13:53:27] arnaudb: ok
[13:54:05] godog: So yes, we do need to alert on secondary dc databases
[13:57:56] ok!
[14:30:15] do we have any ongoing issue with gitlab CI runners on DO?
[14:35:55] <_joe_> vgutierrez: always :)
[14:36:00] <_joe_> I just re-try
[14:36:11] yeah... but runs are getting stuck: https://gitlab.wikimedia.org/repos/sre/liberica/-/jobs/407748
[14:36:15] and it's working as expected on ::1
[14:36:19] <_joe_> vgutierrez: always
[14:36:25] <_joe_> :)
[14:36:39] I haven't noticed this before today TBH :)
[14:48:40] <_joe_> that means you don't use gitlab enough :D
[14:48:52] <_joe_> jokes aside, it happened to me multiple times over the years
[15:27:32] <_joe_> oncallers: I'm about to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/1100457, which will make replag in eqiad page again
[15:28:02] ack ty
[15:30:19] ack
[16:44:24] as with yesterday, we're about to start doing some messing with videoscaling jobs in codfw