[05:38:56] Traffic, Operations, Patch-For-Review, Performance-Team (Radar): Refactor public-facing DYNA scheme for primary project hostnames in our DNS - https://phabricator.wikimedia.org/T208263 (Cwek) Can stop your hand? login.wikimedia.org is a CNAME of www.wikipedia.org and the wikipedia.org domain is p...
[05:41:02] Traffic, Operations, Core Platform Team Backlog (Next), Services (next): Have Varnish set the `X-Request-Id` header for incoming external requests - https://phabricator.wikimedia.org/T221976 (ema)
[05:41:06] Traffic, Operations, Core Platform Team Backlog (Watching / External), Patch-For-Review, Services (watching): Package libvmod-uuid for Debian - https://phabricator.wikimedia.org/T221977 (ema) Open→Resolved >>! In T221977#5168771, @mobrovac wrote: > @ema since the pkg has been uploaded...
[07:23:05] <_joe_> ema, vgutierrez I have uploaded the change of url for the check on the mw clusters from pybal
[07:23:16] <_joe_> but I'd restart pybal later, I have to run an errand
[07:23:31] <_joe_> if you need to restart the low-traffic ones in the next hour or so
[07:23:42] <_joe_> please be careful and start from codfw :)
[07:23:57] <_joe_> if this is not ok, I can wait and do the restarts now
[07:40:59] _joe_: no restarts planned this morning AFAIK
[07:42:30] _joe_: so yeah feel free to go run your errand :)
[10:09:29] <_joe_> I'm going to restart the pybals now
[10:09:41] ack thanks for the heads up
[10:11:23] Traffic, Operations, Patch-For-Review, Performance-Team (Radar): Refactor public-facing DYNA scheme for primary project hostnames in our DNS - https://phabricator.wikimedia.org/T208263 (lilydjwg) Hi there, thanks for your work but it demonstrates unexpected issues in mainland China. The *.wikipe...
[12:45:48] dear traffic team
[12:45:57] i would really like your help on this CR https://gerrit.wikimedia.org/r/c/operations/puppet/+/509049
[12:46:07] just a quick look
[13:15:46] fsero: two comments!
[13:17:41] thanks ema ! going to fix them right away
[13:18:51] fsero: please also run pcc against a couple of text nodes, cp1079.eqiad.wmnet and cp2013.codfw.wmnet for example
[13:19:17] sure ema it was in my mind but i knew there was silly things to fix first
[13:19:21] count on it
[13:30:47] ema: updated!
[13:35:53] fsero: there's another problem I think due to the fact that we're switching from eqiad to codfw in one commit
[13:35:58] fsero: see https://wikitech.wikimedia.org/wiki/Global_traffic_routing#Cache-to-application_routing
[13:36:06] ok
[13:37:20] as you can see from the pcc diff on https://puppet-compiler.wmflabs.org/compiler1001/16439/cp1079.eqiad.wmnet/ and https://puppet-compiler.wmflabs.org/compiler1001/16439/cp2013.codfw.wmnet/, we're basically saying:
[13:37:53] // on eqiad caches
[13:37:55] if (req.http.Host == "docker-registry.wikimedia.org") { go to the codfw caches }
[13:38:05] // on codfw caches
[13:38:09] if (req.http.Host == "docker-registry.wikimedia.org") { go to the eqiad caches }
[13:38:19] ema: this is a read only registry so is ok to also enable the eqiad side for a while
[13:38:28] so if i enable the eqiad backend we should be good to go
[13:38:30] right?
[13:38:39] no loops would be present
[13:38:56] and then in a subsequent patch disable eqiad
[13:39:07] correct, if we switch from eqiad only to eqiad/codfw in a intermediate step there will be no risk of loops
[13:39:53] then lemme update the CR
[13:39:57] thanks
[13:40:01] thanks to you sir
[13:41:59] out of curiosity, why can't we be using eqiad and codfw active/active?
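Editor's note on the 13:37 to 13:39 exchange: the loop risk can be illustrated with a small Python sketch. This is a simplified model and not the actual puppet or VCL configuration: the `next_hop` dictionaries and the `find_routing_loop` helper are hypothetical, each cache site maps to the cache site it forwards to, and `None` stands for "send the request to the local application backend".

```python
def find_routing_loop(next_hop, start):
    """Follow next_hop from `start`; return the cycle as a list, or None.

    next_hop is a hypothetical map: cache site -> next cache site,
    with None meaning "go to the local application backend".
    """
    path = []
    seen = {}  # site -> index in path where it was first visited
    site = start
    while site is not None:
        if site in seen:
            # We came back to a site already on the path: that is a loop.
            return path[seen[site]:] + [site]
        seen[site] = len(path)
        path.append(site)
        site = next_hop.get(site)
    return None


# Single-commit switch as described at 13:37: each side points at the other.
single_commit = {"eqiad": "codfw", "codfw": "eqiad"}

# Intermediate step: both backends enabled, each site serves locally.
intermediate = {"eqiad": None, "codfw": None}

# Follow-up patch: eqiad disabled, eqiad caches forward to codfw.
final = {"eqiad": "codfw", "codfw": None}

print(find_routing_loop(single_commit, "eqiad"))
print(find_routing_loop(intermediate, "eqiad"))
print(find_routing_loop(final, "eqiad"))
```

Under this model only the single-commit map contains a cycle, which matches the reasoning that going through an eqiad/codfw intermediate step avoids loops.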
[13:45:12] docker-registry replication between DCs is done via swift replication
[13:45:19] and swift replication is kinda slow for now
[13:45:48] this is for our public registry which may lead to not presenting an image in one request and present it the next
[13:45:55] because the image is on codfw and not in eqiad
[13:46:17] the goal is to become active active improving swift replication but this is good enough ™️
[13:46:19] for now
[13:46:31] understood
[13:47:19] is it still true even now that the initial replication has finished that container sync is still slow ?
[13:48:07] godog: its not so bad because our delta is quite small, but it has a high latency, replication job is fired every 300s and the duration of the replication job is 60s
[13:48:17] so if the delta is small and fits that window is perfect
[13:48:23] usually is not the case for docker layer images
[13:48:28] ouch, yeah that's slow alright
[13:48:31] so it needs several passes
[13:48:45] we would benefit a log of enabling replication log as well
[13:48:56] *a lot
[13:49:25] ema: is updated with a new PCC i have no idea how to apply this btw
[13:49:51] that also reminds me I don't think we have stats from container sync in swift itself :|
[13:50:07] create a phab task before slips your mind!
[13:51:14] yeah seriously, doing that now
[13:55:09] fsero: the procedure is: either wait for puppet to run on the eqiad/codfw text caches or run it manually
[13:55:31] not in a rush at all so i can wait for next puppet run
[13:55:37] k
[13:56:43] fsero: what's gonna be darmstadtium future now?
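Editor's note on the 13:48 replication timing: a rough back-of-the-envelope sketch in Python of why a large delta "needs several passes". The 300s interval and 60s job duration come from the log; everything else is an assumption for illustration, in particular that each job copies at a steady hypothetical rate `bytes_per_second` for its full 60s and that jobs start exactly every 300s. It is not a description of how swift container sync is actually scheduled.

```python
import math


def passes_needed(delta_bytes, bytes_per_second, job_duration_s=60):
    """Number of sync jobs needed to copy the delta (assumed steady rate)."""
    per_pass = bytes_per_second * job_duration_s
    return max(1, math.ceil(delta_bytes / per_pass))


def worst_case_sync_seconds(delta_bytes, bytes_per_second,
                            interval_s=300, job_duration_s=60):
    """Upper bound on the time until the delta is fully replicated.

    Worst case: the upload lands just after a job started, so it waits a
    full interval for the first pass, then one interval per extra pass,
    plus the duration of the final job.
    """
    n = passes_needed(delta_bytes, bytes_per_second, job_duration_s)
    return interval_s + (n - 1) * interval_s + job_duration_s


# Hypothetical rate of 10 bytes/s, chosen only to keep the numbers small.
print(passes_needed(600, 10), worst_case_sync_seconds(600, 10))
print(passes_needed(601, 10), worst_case_sync_seconds(601, 10))
```

With these made-up numbers, a delta that just fits in one 60s window is replicated within 360s at worst, while a delta one byte larger needs a second pass and up to 660s.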
[13:56:58] dev/null
[13:57:09] after some prudential time
[13:57:14] like a couple days
[13:58:41] fsero: mmh, I'm getting connection refused to port 81 on both eqiad and codfw https://phabricator.wikimedia.org/P8500
[13:59:17] mmm good catch
[14:02:44] yeah lvs was configured to use 443, i guess i can use the registry1XX1 and registry2XX1 instead
[14:04:02] Traffic, Operations, Patch-For-Review, Performance-Team (Radar): Refactor public-facing DYNA scheme for primary project hostnames in our DNS - https://phabricator.wikimedia.org/T208263 (BBlack) @Cwek @lilydjwg - Thanks for the reports! I apologize, this time around the fallout should've been pre...
[14:08:15] ema: updated paste, modifying CR to point directly to the vm instead of the lvs endpoint
[14:20:03] fsero: +1
[14:36:06] ema: thanks a lot for the help
[14:37:16] fsero: de nada!
[15:05:15] Traffic, Analytics, Analytics-Kanban, Operations, Patch-For-Review: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (fgiunchedi) a:mforns→colewhite Reporting here a chat between me and @Ottomata re: metric naming a...
[15:07:39] Traffic, Operations, Zero, Patch-For-Review: Zero VCL removal - https://phabricator.wikimedia.org/T213769 (jbond) Is this ticket complete? can it be closed, if not what further actions are required?
[15:24:49] Traffic, Operations, Zero, Patch-For-Review: Zero VCL removal - https://phabricator.wikimedia.org/T213769 (Reedy) >>! In T213769#5170303, @jbond wrote: > Is this ticket complete? can it be closed, if not what further actions are required? I guess the removal needs merging? >>! In T213769#4879...
[15:25:57] Traffic, Operations, Zero, Patch-For-Review: Zero VCL removal - https://phabricator.wikimedia.org/T213769 (jbond) > I guess the removal needs merging?
Oh yes missed that :D
[15:34:04] Traffic, Analytics, Analytics-Kanban, Operations, Patch-For-Review: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (Ottomata) I think there are a few more branches: - producer, consumer - global, per broker, per topic, p...
[15:37:26] Traffic, Operations, Zero, Patch-For-Review: Zero VCL removal - https://phabricator.wikimedia.org/T213769 (Jdforrester-WMF) I was told by Traffic that the removal is blocked on something in SRE land, but I don't know enough to even give pointers, sorry.
[15:47:13] Traffic, Operations, Zero, Patch-For-Review: Zero VCL removal - https://phabricator.wikimedia.org/T213769 (BBlack) Yeah, it's mostly just blocked on us making some time to deal with it, and time has been in extremely short supply lately, so we tend not to prioritize anything that doesn't have imm...
[15:48:38] Traffic, Operations, Zero, Patch-For-Review: Zero VCL removal - https://phabricator.wikimedia.org/T213769 (Jdforrester-WMF) Yeah, this is definitely not urgent, just tech debt clean-up and RelEng (though it will simplify life for Analytics when they don't have to worry about this stuff any more,...
[15:53:50] Traffic, Operations, Zero, Patch-For-Review: Zero VCL removal - https://phabricator.wikimedia.org/T213769 (jbond) Ok great thanks for the update
[16:08:09] Traffic, Analytics, Analytics-Kanban, Operations, Patch-For-Review: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (Ottomata) BTW for reference here's what EvenGate is currently exporting: https://gist.github.com/ottomat...
[17:13:42] Traffic, Operations, Patch-For-Review, Performance-Team (Radar): Refactor public-facing DYNA scheme for primary project hostnames in our DNS - https://phabricator.wikimedia.org/T208263 (BBlack) Our analytics seems to indicate the changes above had the intended effect in restoring normal levels of...
[18:44:13] netops, Operations, Patch-For-Review: set up a looking glass for WMF ASes - https://phabricator.wikimedia.org/T106056 (ayounsi) Open→Declined The amount of work required to properly deploy a (muti-dc) looking glass is, so far, not worth the benefits of having and maintaining one. - Peering wi...
[18:52:53] netops, Operations, Patch-For-Review: set up a looking glass for WMF ASes - https://phabricator.wikimedia.org/T106056 (Nemo_bis) Thanks for considering this and for sharing the analysis.
[20:38:31] Traffic, netops, Operations, Patch-For-Review: Free up 185.15.59.0/24 - https://phabricator.wikimedia.org/T211254 (ayounsi) The conversation went a bit outside the scope of the task description. Re-focusing on it and with the new info of T222392, I renumbered mr1-esams links (trivial change) so 1...
[22:02:13] Traffic, Analytics, Analytics-Kanban, Operations, Patch-For-Review: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (colewhite) I agree with dropping the prefix in favor of "rdkafka". I see the type branch (producer and c...