[00:51:22] netops, Operations: netmon1002 networking setup - https://phabricator.wikimedia.org/T159757#3226739 (Dzahn) This is unblocked now since netmon1002 has been installed, and IPv6 has been configured (T159756#3226683) netmon1002.wikimedia.org has address 208.80.154.5 netmon1002.wikimedia.org has IPv6 addres...
[00:57:12] netops, DC-Ops, Operations, fundraising-tech-ops, ops-eqiad: decom barium.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T162952#3226748 (Dzahn)
[00:59:35] netops, DC-Ops, Operations, fundraising-tech-ops, ops-eqiad: decom barium.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T162952#3226767 (Dzahn) adding #netops for the "switch port" check boxes. needs access to srx550s.
[07:27:39] ottomata,bblack - when Varnishkafka fails in that way it would be really good to collect the "VSL" shm tag that collects timeout messages and other things related to the shm log (the two cases that we hit in the past were not enough time between Begin/End tags and shm circular log overwritten between Begin/End tag - overflow)
[07:31:42] maybe having counter(s) exposed in /var/cache/varnishkafka/etc.. ?
[08:22:03] netops, Operations: netmon1002 networking setup - https://phabricator.wikimedia.org/T159757#3227188 (ayounsi) Open>Resolved a:ayounsi I don't see the IP or hostname of netmon1001 hardcoded in network devices. DNS records for librenms.wikimedia.org will have to point to the new IP as it's the...
[08:45:04] netops, Operations: netmon1002 networking setup - https://phabricator.wikimedia.org/T159757#3227240 (akosiaris)
[09:02:05] Traffic, Operations, Page-Previews, Performance-Team, and 3 others: Performance review #2 of Hovercards (Popups extension) - https://phabricator.wikimedia.org/T70861#3227302 (Gilles) We don't make that distinction. The time it takes for something to come up // is// performance. It's a blocker for...
[10:58:19] Traffic, Analytics, Operations: Add VSL error counters to Varnishkafka stats - https://phabricator.wikimedia.org/T164259#3227497 (elukey)
[10:58:30] created --^
[11:38:53] Traffic, Analytics, Operations: Add VSL error counters to Varnishkafka stats - https://phabricator.wikimedia.org/T164259#3227497 (JAllemandou) +1 for that! Thanks @elukey for raising this.
[11:50:44] netops, Operations: Enabling IGMP snooping on QFX switches breaks IPv6 (HTCP purges flood across codfw) - https://phabricator.wikimedia.org/T133387#3227560 (ayounsi) On a side note, not relying on IPv6 RA, and using static routes/IPs (see T102099) on at least the nodes that use IGMP snooping would work a...
[11:54:17] netops, Operations: Enabling IGMP snooping on QFX switches breaks IPv6 (HTCP purges flood across codfw) - https://phabricator.wikimedia.org/T133387#3227569 (faidon) >>! In T133387#3227560, @ayounsi wrote: > On a side note, not relying on IPv6 RA, and using static routes/IPs (see T102099) on at least the...
[12:55:21] so I see that cp2002 and cp2024 seem to have had a particularly turbulent long weekend
[12:56:05] no more issues after the restarts though, and reverting the routing probably is also gonna help
[13:07:54] I'm gonna start the upgrades to varnish 4.1.6 with maps
[13:22:10] Traffic, Analytics, Operations: Add VSL error counters to Varnishkafka stats - https://phabricator.wikimedia.org/T164259#3227754 (Ottomata) Why not both!? :)
[13:53:26] bblack: given the switchback procedure for traffic, is the approach taken in the switch to codfw still valid?
[13:53:29] https://gerrit.wikimedia.org/r/#/c/351313/1/hieradata/role/common/cache/text.yaml
[13:54:10] (merge 1 commit, during the warmup, apply it reliably in eqiad first, then in codfw)
[13:55:40] volans: yes
[13:55:50] great, thanks
[14:33:01] Wikimedia-Apache-configuration, Labs, Labs-Infrastructure, Operations, and 2 others: wikitech-static sync broken - https://phabricator.wikimedia.org/T101803#3227922 (Andrew) Open>Resolved I upgraded wikitech to 1.28.2 a few days ago and there was some composer/syntax highlighting snafu th...
[14:42:16] Traffic, Analytics, Operations: Add VSL error counters to Varnishkafka stats - https://phabricator.wikimedia.org/T164259#3227966 (elukey) >>! In T164259#3227754, @Ottomata wrote: > Why not both!? :) Yes! I was concerned that the new field would have been a bit too much, but if we are ok with the new...
[14:50:09] <_joe_> are we sending traffic to eqiad at the moment?
[14:53:11] _joe_: esams caches are routed through eqiad again, yes. But eqiad is still depooled in DNS
[14:53:26] <_joe_> why is that?
[14:54:05] brandon mentioned that we planned to revert the route-through-codfw part yesterday I think?
[14:56:09] yeah: Traffic: Pre-switchback in two phases: Mon May 1 and Tues May 2 (to avoid cold-cache issues Weds)
[15:01:08] yeah today users switch back
[15:01:20] (to the eqiad cache edge, which has no bearing on services / MW directly)
[15:09:55] <_joe_> ok, cool
[15:10:38] <_joe_> btw, I'm merging https://gerrit.wikimedia.org/r/351320 in a couple of minutes
[15:11:52] <_joe_> which means reads from etcd will be nearest-master-dc
[15:14:26] netops, DBA, Operations, ops-codfw: db20[7-9][0-9] switch ports configuration - https://phabricator.wikimedia.org/T162944#3228091 (Papaul) @Robh any update on this?
[15:25:08] _joe_: thanks for the heads up
[15:29:00] netops, DBA, Operations, ops-codfw: db20[7-9][0-9] switch ports configuration - https://phabricator.wikimedia.org/T162944#3228127 (RobH) Nope, I forgot about it! I'll knock them out now.
[15:36:17] carrying on with varnish upgrades (misc now)
[15:43:22] netops, DBA, Operations, ops-codfw: db20[7-9][0-9] switch ports configuration - https://phabricator.wikimedia.org/T162944#3228138 (RobH) row c done
[15:44:42] _joe_: I haven't been following the etcd-cluster stuff, but I assume it's basically been working "right" throughout, in that etcd clients in any DC are getting correct data...
[15:45:16] <_joe_> bblack: yes, only case in which that would break is if etcd replication breaks
[15:45:19] <_joe_> that is paging now
[15:45:54] <_joe_> for tonight, I guess you'll have to phone me in case of an emergency. I'll try to write docs in the next couple of days
[16:04:45] netops, DBA, Operations, ops-codfw: db20[7-9][0-9] switch ports configuration - https://phabricator.wikimedia.org/T162944#3228212 (RobH) Open>Resolved row d done
[16:48:20] ema, bblack: I'm about to test the switchover tasks, any conflict with things you might be doing in the next 30 min?
[16:48:41] puppet will be disabled and re-enabled + run in eqiad/codfw hosts, nothing more
[16:51:40] <_joe_> we're proceeding, we need to do a full test
[16:51:43] <_joe_> so be advised
[16:51:44] <_joe_> :)
[16:53:00] ok
[23:07:48] Traffic, Operations, ops-ulsfo: replace ulsfo aging servers - https://phabricator.wikimedia.org/T164327#3229950 (RobH)
[23:39:47] <_joe_> bblack: because of lack of coordination on my part, a change was merged tonight that made confctl fail on the varnishes, as it wasn't updated.
[23:40:13] <_joe_> so varnish-backend-restart failed on a handful of servers today, you will find the emails in the spam