[07:45:13] _joe_: I'll take a look at that pybal CR later today
[07:45:33] <_joe_> vgutierrez: thanks, I'd have asked you
[09:07:45] Traffic, Operations, Goal, Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in ulsfo - https://phabricator.wikimedia.org/T219967 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on cumin2001.codfw.wmnet for hosts: ` ['cp4023.ulsfo.wmnet'] ` The log can be...
[09:50:11] Traffic, Operations, Goal, Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in ulsfo - https://phabricator.wikimedia.org/T219967 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp4023.ulsfo.wmnet'] ` and were **ALL** successful.
[10:33:20] netops, Operations, Operations-Software-Development, netbox, User-crusnov: Netbox report to validate network equipment data - https://phabricator.wikimedia.org/T221507 (Volans)
[10:33:46] man I would love to be working on pybal instead right now ;p
[10:33:58] netops, Operations, netbox: Netbox switches consistency report - https://phabricator.wikimedia.org/T212878 (Volans)
[10:34:27] netops, Operations, netbox: Netbox should use CN rather than UID for LDAP login username - https://phabricator.wikimedia.org/T210566 (Volans)
[11:57:11] ema: something's not quite right in upload@ulsfo -land
[11:57:13] https://grafana.wikimedia.org/d/000000500/varnish-caching?orgId=1&from=now-12h&to=now&var-cluster=cache_upload&var-site=ulsfo&var-status=1&var-status=2&var-status=3&var-status=4&var-status=5&refresh=15m
[11:57:45] the shape of the hitrate effects seemed reasonable for the node being depooled -> reimaged -> repooled earlier
[11:58:04] but the ulsfo availability alert and dropoff in cache hitrate ~30 mins ago... what is that?
[11:58:16] perhaps the 404 CL issue again
[11:58:59] the dropoff is rather steep, and the 404 issue is patched twice-over isn't it?
[11:59:33] well there's the extensions VCL workaround, but we know that other URLs were causing the issue too
[11:59:39] for the actual fix, I haven't restarted swift yet
[11:59:46] ah
[12:00:48] let's apply that first
[12:13:27] bblack: I've restarted swift-proxy in codfw, it seems that was it judging from the availability graphs
[12:19:18] awesome
[12:27:47] Traffic, Operations, Performance-Team, Thumbor, Patch-For-Review: SwiftMedia URL rewrite returns some 404s with wrong Content-Length - https://phabricator.wikimedia.org/T222071 (ema) Open→Resolved This is now fixed, CL matches the actual body length: ` $ curl -v http://swift-rw.disco...
[13:18:11] Traffic, Operations, Patch-For-Review, Performance-Team (Radar): Refactor public-facing DYNA scheme for primary project hostnames in our DNS - https://phabricator.wikimedia.org/T208263 (BBlack) The current iteration of the proposed broadly-applied production version is in PS3 of the patch @ https...
[14:19:42] Traffic, Analytics, Analytics-Kanban, Operations, Patch-For-Review: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (Ottomata) Hm, sorry for this probably too late idea...but would it be worth building a C based prometheus...
[14:31:00] Traffic, Analytics, Analytics-Kanban, Operations, Patch-For-Review: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (elukey) @Ottomata ahhhh you mean in varnishkafka itself! I thought that it would have needed a change in...
[14:37:54] Traffic, Analytics, Analytics-Kanban, Operations, Patch-For-Review: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (Ottomata) I think varnishkafka is already using this callback to write the stats out to the json file. I...
[14:40:41] Traffic, Analytics, Analytics-Kanban, Operations, Patch-For-Review: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (elukey) But something would need to be created (a simple exporter) to read the json with the Prometheus m...
[14:46:34] Traffic, Analytics, Analytics-Kanban, Operations, Patch-For-Review: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (Ottomata) Aye yeah I guess there'd have to be some pull service, ya. Maybe converting whatever varnishka...
[14:51:37] Traffic, Analytics, Analytics-Kanban, Operations, Patch-For-Review: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (ema) >>! In T196066#5153029, @Ottomata wrote: > Hm, sorry for this probably too late idea...but would it...
[14:53:01] actually I may have to skip our weekly, but I'll keep an eye on etherpad
[15:02:49] Traffic, Analytics, Analytics-Kanban, Operations, Patch-For-Review: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (Ottomata) Ya good point
[15:03:56] Traffic, Analytics, Analytics-Kanban, Operations, Patch-For-Review: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (Ottomata) Hm but also, whatever we replace varnishkafka with will likely be librdkafka based. Perhaps a l...
[22:09:08] netops, Operations, ops-eqiad: Replace eqiad mgmt switches with EX4200s - https://phabricator.wikimedia.org/T213128 (ayounsi) Open→Declined Going with option 2. Will open tasks in the next FY when it's time to order them.
[22:41:32] Traffic, netops, Operations, Patch-For-Review: codfw row C switch upgrade - https://phabricator.wikimedia.org/T170380 (ayounsi)
[22:41:34] Traffic, netops, Operations: Investigate lvs IP pages during codfw row C switch upgrade - https://phabricator.wikimedia.org/T171032 (ayounsi) Open→Declined This is almost 2 years old now, I don't think we have any other logs to investigate it or if it happened again. Please reopen if you thin...
[23:02:57] netops, Cloud-Services, Operations, cloud-services-team (Kanban): Allocate public v4 IPs for Neutron setup in eqiad - https://phabricator.wikimedia.org/T193496 (Krenair) For the record, with the migration away from and shutdown of the nova-network 'main' region, the 208.80.155.128/25 range is no...