[07:02:18] Acme-chief, Traffic, Infrastructure-Foundations, Puppet-Infrastructure, and 2 others: Revert back to fleet-wide acmechief config once all ACME consumers are on Puppet 7 - https://phabricator.wikimedia.org/T365799#10120199 (MoritzMuehlenhoff) a:SLyngshede-WMF→MoritzMuehlenhoff
[07:13:42] moritzm: sorry about that -2 on https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049837, every acme-chief client is on puppet 7 already?
[07:15:07] yes. unless someone re-introduced a new Puppet 5 acmeclient while I was off :-)
[07:15:20] I'll post a cumin run on the gerrit patch in a bit
[07:15:30] cool, so we need to move the active host to acmechief1002
[07:15:43] cause acmechief1001 is currently the host that performs issuance against Let's Encrypt API
[07:15:57] if you get rid of that one, we won't get new certificates
[07:16:00] we did already two months ago?
[07:16:06] nope
[07:16:19] unless there is something else than acmechief_host in common /hiera?
[07:16:27] role/common/acme_chief.yaml:profile::acme_chief::active: acmechief1001.eqiad.wmnet
[07:16:40] acmechief_host is used by acme-chief clients
[07:16:54] ah, ok. I missed that part
[07:16:58] I'll make a patch later
[07:17:44] you'll have to update profile::acme_chief::active && profile::acme_chief::passive
[07:18:10] I can take care of that migration
[07:18:24] sure, that would be nice!
[07:43:37] moritzm: CR is ready, I'll do it as soon as we got puppet back
[07:52:19] ack, thx
[08:25:01] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Q1:codfw:frack network upgrade tracking task - https://phabricator.wikimedia.org/T371434#10120336 (cmooney) >>! In T371434#10119784, @Papaul wrote: > The diagram below will outline the cabling of the new Fundraising n...
[08:26:20] netops, DBA, DC-Ops, Infrastructure-Foundations, and 2 others: Migrate servers in codfw rack C1 from asw-c1-codfw to lsw1-c1-codfw - https://phabricator.wikimedia.org/T373095#10120339 (cmooney) Open→Resolved a:cmooney
[08:28:25] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120363 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by jayme@cumin1002 Renumbering for host wikikube-w...
[08:28:36] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120364 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host wikikube-worker2088.codfw....
[09:00:30] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120464 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by jayme@cumin1002 from mw2434 to wik...
[09:01:32] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120472 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by jayme@cumin1002 from mw2435 to wik...
[09:02:20] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120477 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by jayme@cumin1002 Renumberi...
[09:02:33] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120478 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by jayme@cumin1002 Renumbering f...
[09:03:08] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120480 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by jayme@cumin1002 Renumberi...
[09:03:18] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120481 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by jayme@cumin1002 Renumbering f...
[09:03:48] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120482 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by jayme@cumin1002 Renumberi...
[09:04:09] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120483 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host wiki...
[09:04:55] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120484 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by jayme@cumin1002 Renumberi...
[09:05:11] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120485 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host wiki...
[09:12:54] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120495 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host wikikube...
[09:19:57] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120516 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by jayme@cumin1002 Renumbering f...
[09:22:16] fabfur: hey how are things?
[09:22:35] I think Sukhbir might have mentioned to you last week about some network maintenance we have on today?
[09:22:56] Which will briefly disrupt cp2035 and cp2036?
[09:23:08] he said you might be able to help depool them in advance of it
[09:23:27] task is https://phabricator.wikimedia.org/T373096
[09:23:34] hosts will only be offline for about 10 seconds each
[09:29:59] yes, I'll depool, silence and shut down these hosts around 1500UTC
[09:31:47] it's ok for you?
[09:31:49] is shutdown really needed?
[09:32:21] they're just moving network connections
[09:45:44] I can just depool and silence but I recall suk.he suggesting to shut them down
[09:54:38] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120668 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host wikikube-worker2089.codfw.wmne...
[09:59:46] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120679 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by jayme@cumin1002 Renumbering for host wikikube-worke...
[10:03:43] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120684 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by hnowlan@cumin1002 Renumbering for host wikikube...
[10:03:56] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120685 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host wikikube-worker2084.codf...
[10:15:03] fabfur: I personally wouldn't shut them down, but up to you, I'm OK either way
[10:55:37] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 2 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10120867 (MatthewVernon) Further Data Persistence nodes (Ceph / Swift) in `C2`: |`C2` | moss-be2003 | needs mainten...
[10:57:54] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host wikikube-worker2084.codfw.wm...
[11:01:05] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120886 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by hnowlan@cumin1002 Renumbering for host wikikube-wor...
[11:21:22] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120933 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host wikikube-worker2090.codfw.wmne...
[11:22:12] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120940 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by jayme@cumin1002 Renumbering for host wikikube-worke...
[11:25:22] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120948 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by cgoubert@cumin1002 Renumbering for host wikikub...
[11:26:49] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10120950 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2029.cod...
[12:54:46] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10121332 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2029.codfw.w...
[12:58:42] netops, Infrastructure-Foundations, serviceops, SRE: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10121341 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by cgoubert@cumin1002 Renumbering for host wikikube-wo...
[13:05:41] Traffic: Package and deploy ATS 9.2.5 - https://phabricator.wikimedia.org/T339134#10121377 (Vgutierrez) Open→Resolved a:ssingh
[13:19:27] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 2 others: Migrate servers in codfw racks C4 & C5 from asw to lsw - https://phabricator.wikimedia.org/T373097#10121449 (MatthewVernon) There are 4 swift servers in `C4` - ms-be2058 ms-be2064 ms-be2072 ms-be2077 ; they'll need...
[13:24:35] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Migrate servers in codfw racks C6 & C7 from asw to lsw - https://phabricator.wikimedia.org/T373101#10121464 (MatthewVernon) There are some impact Swift servers: - ms-be2054 and ms-be2078 and thanos-be2003 - these just need a quick c...
[13:33:52] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 2 others: Migrate servers in codfw racks D1 & D2 from asw to lsw - https://phabricator.wikimedia.org/T373102#10121496 (MatthewVernon) These racks have the following Swift/Ceph nodes: - ms-fe2012 moss-fe2002 thanos-fe2003 (ne...
[13:36:00] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 2 others: Migrate servers in codfw racks D5 & D6 from asw to lsw - https://phabricator.wikimedia.org/T373104#10121520 (MatthewVernon) No affected swift/Ceph nodes in these racks.
[13:36:02] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Migrate servers in codfw racks D3 & D4 from asw to lsw - https://phabricator.wikimedia.org/T373103#10121503 (MatthewVernon) No Swift/Ceph nodes affected in this one.
[13:40:05] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks D7 & D8 from asw to lsw - https://phabricator.wikimedia.org/T373105#10121536 (MatthewVernon) There are these impacted Swift/Ceph nodes: - thanos-be2004 ms-be2056 ms-be2059 ms-be2073 m...
[13:40:41] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks D1 & D2 from asw to lsw - https://phabricator.wikimedia.org/T373102#10121551 (MatthewVernon)
[13:40:57] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10121554 (MatthewVernon)
[13:41:14] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C4 & C5 from asw to lsw - https://phabricator.wikimedia.org/T373097#10121555 (MatthewVernon)
[13:41:36] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 2 others: Migrate servers in codfw racks C6 & C7 from asw to lsw - https://phabricator.wikimedia.org/T373101#10121561 (MatthewVernon)
[14:35:12] vgutierrez: can I get you or someone else from traffic (I see su.khe is out) to comment on https://phabricator.wikimedia.org/T344171#10081275 re: recdns in prod?
[14:36:50] netops, Traffic, Infrastructure-Foundations, probenet, SRE: improve GeoDNS-to-edge mapping - https://phabricator.wikimedia.org/T316160#10121802 (CDanis)
[14:37:19] cdanis: su.khe is back on Monday, that works for you? I'm definitely out of my comfort zone here :)
[14:37:41] netops, Traffic, Infrastructure-Foundations, probenet, SRE: improve GeoDNS-to-edge mapping - https://phabricator.wikimedia.org/T316160#10121808 (CDanis)
[14:37:52] vgutierrez: ok sounds good! thanks
[15:00:41] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10121908 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by kamila@cumin1002 from mw2420 to wi...
[15:01:27] topranks: I start with the depool/silence/shutdown of cp2035 and cp2036
[15:01:49] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10121910 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by kamila@cumin1002 from mw2421 to wi...
[15:03:33] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10121913 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by kamila@cumin1002 Renumber...
[15:04:04] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10121914 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host wik...
[15:05:07] I think vgutierrez is right, similar interventions in the past didn't require shutdown, so I'll just depool and silence
[15:06:28] yeah it actually is tricky for us if the hosts are offline as we can't test they are still reachable once the cable is moved
[15:06:40] I'll downtime the whole rack before we start anyway so no need to do that specifically
[15:06:47] thanks!
[15:06:48] perfect, I'll downtime those for 3h
[15:06:51] just to be sure
[15:07:28] cool
[15:07:36] ack
[15:07:46] [don't forget to remove those afterwards]
[15:07:56] yep
[15:09:12] vgutierrez: o/ I'd need to restart acmechief on acmechief1002 to pick up py3.11 upgrades, is it safe to do?
[15:09:58] elukey: I guess you need to restart both the API and acme-chief itself
[15:10:51] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10121964 (Fabfur) Hosts cp203[5-6] downtimed and depooled
[15:11:13] vgutierrez: I think so yes
[15:11:31] for the API you'd need to disable puppet on clients: sudo cumin 'R:acme_chief::cert' "disable-puppet 'acmechief maintenance - ${USER}'"
[15:12:20] elukey: regarding acme-chief.service you can restart it now
[15:13:44] API service name is uwsgi-acme-chief.service
[15:13:51] exactly yes
[15:14:28] done, I'll do API tomorrow morning :)
[15:14:52] thanks!
[15:15:23] cool, thx
[15:17:25] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10121992 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node was started by kamila@cumin1002 Renumber...
[15:17:45] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10121994 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host wik...
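[editor's note] The acme-chief API restart discussed at 15:11-15:14 can be written down as an ordered plan. The sketch below only builds and prints the commands and never runs them. The cumin selector, the `disable-puppet` invocation and the unit name `uwsgi-acme-chief.service` are taken from the log. The `enable-puppet` step, the `Step` type, the `api_restart_plan` helper, the "cumin host" label and the example user are assumptions added for illustration, not something confirmed in the conversation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    where: str    # host the command would be run on (label only)
    command: str  # shell command, shown but never executed here


def api_restart_plan(acme_host: str, user: str) -> list[Step]:
    """Return the ordered steps for restarting the acme-chief API (hypothetical helper)."""
    reason = f"acmechief maintenance - {user}"
    selector = "R:acme_chief::cert"  # cumin query quoted in the log
    return [
        # 1. Disable puppet on acme-chief clients first, as suggested in the log.
        Step("cumin host", f"sudo cumin '{selector}' \"disable-puppet '{reason}'\""),
        # 2. Restart the API unit named in the log.
        Step(acme_host, "sudo systemctl restart uwsgi-acme-chief.service"),
        # 3. ASSUMPTION: re-enable puppet afterwards with the same reason string.
        Step("cumin host", f"sudo cumin '{selector}' \"enable-puppet '{reason}'\""),
    ]


if __name__ == "__main__":
    plan = api_restart_plan("acmechief1002", "example-user")
    for number, step in enumerate(plan, start=1):
        print(f"{number}. [{step.where}] {step.command}")
```

Per the log, `acme-chief.service` itself could be restarted right away, so it is left out of the plan; only the API restart is bracketed by the puppet steps.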
[15:48:47] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10122089 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=8726666c-096a-491c-b6d3-edc93e2996f1) set...
[15:51:00] Traffic: haproxykafka: feature: add ability to add/remove/modify fields - https://phabricator.wikimedia.org/T372339#10122092 (Fabfur) Open→Resolved
[15:51:40] Traffic: haproxykafka: feature: Configuration file - https://phabricator.wikimedia.org/T372342#10122095 (Fabfur) Open→Resolved
[15:58:05] Traffic: haproxykafka features - https://phabricator.wikimedia.org/T374128 (Fabfur) NEW
[16:31:56] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10122267 (MatthewVernon) @cmooney all good to go from a Swift/Ceph perspective, thanks for your patience
[16:37:28] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10122281 (cmooney) >>! In T373096#10122267, @MatthewVernon wrote: > @cmooney all good to go from a Swift/Ceph perspe...
[16:39:16] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10122286 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=cde90074-86b4-49ac-9878-436a5d041f2b) set...
[16:49:41] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10122317 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host wikikub...
[16:49:44] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10122318 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by kamila@cumin1002 Renumbering...
[16:51:21] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10122337 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host wikikub...
[16:53:27] Traffic, Content-Transform-Team-WIP, Data-Persistence, iOS-app-feature-Performance, and 7 others: PCS caching and pregeneration when restbase is decommissioned - https://phabricator.wikimedia.org/T319365#10122347 (daniel)
[16:55:24] netops, Infrastructure-Foundations, serviceops, SRE, Patch-For-Review: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878#10122354 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.renumber-node started by kamila@cumin1002 Renumbering...
[16:58:16] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10122371 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=07e91a47-4c42-404a-bc7d-ad277bbf3e2b) set...
[17:01:18] topranks: can I repool the hosts?
[17:01:36] not quite done yet - halfway through rack c3
[17:01:41] I'll ping you in a few if that's ok
[17:01:59] fabfur: my bad, the cp hosts are in rack C2
[17:02:04] which we finished a few mins back and looks good
[17:02:08] so yes feel free to re-pool
[17:02:11] perfect!
[17:02:13] thanks a lot!
[17:02:23] np ty!
[17:07:00] FIRING: PurgedHighEventLag: High event process lag with purged on cp2038:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=codfw%20prometheus/ops&var-instance=cp2038 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[17:08:51] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10122430 (MatthewVernon) Swift / Ceph back to normal, thanks!
[17:09:21] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10122440 (ABran-WMF) kudos @Jhancock.wm! d/p nodes are repooling
[17:09:27] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10122429 (cmooney) All links moved and all hosts now responding to ping again. Average interruption in the region o...
[17:12:00] FIRING: [2x] PurgedHighEventLag: High event process lag with purged on cp2038:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[18:08:16] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks C2 & C3 from asw to lsw - https://phabricator.wikimedia.org/T373096#10122695 (cmooney) Open→Resolved a:cmooney
[18:16:49] Traffic, conftool, Tech-Docs-Team, Documentation: Create new content structure for requestctl tool documentation - https://phabricator.wikimedia.org/T372095#10122719 (TBurmeister) Status update: * Draft of user guide, tutorial, and command line reference is at https://wikitech.wikimedia.org/wiki/...
[19:07:49] Traffic, Prod-Kubernetes, serviceops, Kubernetes: Reverse DNS for k8s pods IPs - https://phabricator.wikimedia.org/T344171#10122931 (CDanis) As it turns out this was discussed at the [[ https://www.mediawiki.org/wiki/Kubernetes_SIG | k8s SIG ]] but the bug didn't get updated. [[ https://www.medi...
[20:18:22] netops, Traffic, DC-Ops, Infrastructure-Foundations, and 2 others: LInk errors from lvs1019 to ssw1-f1-eqiad - https://phabricator.wikimedia.org/T374155 (cmooney) NEW p:Triage→High
[20:37:22] netops, Traffic, DC-Ops, Infrastructure-Foundations, and 2 others: LInk errors from lvs1019 to ssw1-f1-eqiad - https://phabricator.wikimedia.org/T374155#10123168 (cmooney) Should mention there is a good case for shutting down PyBal on lvs1019 now, so that no traffic uses this bad link (instead ro...
[20:53:25] Traffic, MW-Interfaces-Team, serviceops: map the /api/ prefix to /w/rest.php - https://phabricator.wikimedia.org/T364400#10123234 (BPirkle) > I was wondering if we have enough consensus to proceed in the path of having /api/ be rewritten in MediaWiki to /w/rest.php We do not have a consensus at this...
[21:12:00] FIRING: [2x] PurgedHighEventLag: High event process lag with purged on cp2038:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[22:05:37] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Codfw row C/D switch installation & configuration - https://phabricator.wikimedia.org/T364095#10123484 (cmooney) Open→Resolved a:cmooney
[22:08:38] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Configure QoS marking and policy across network - https://phabricator.wikimedia.org/T339850#10123487 (cmooney) Open→Resolved
[22:38:07] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Q1:codfw:frack network upgrade tracking task - https://phabricator.wikimedia.org/T371434#10123557 (Papaul)
[22:40:41] brett: are you around by any chance?
[22:50:17] Traffic, Prod-Kubernetes, serviceops, Kubernetes: Reverse DNS for k8s pods IPs - https://phabricator.wikimedia.org/T344171#10123586 (cmooney) >>! In T344171#10081275, @CDanis wrote: >> It might be quite naive (and does not solve my concern from above) but could we have a subnet delegation or a fo...
[22:54:59] netops, Traffic, DC-Ops, Infrastructure-Foundations, and 2 others: LInk errors from lvs1019 to ssw1-f1-eqiad - https://phabricator.wikimedia.org/T374155#10123611 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=01519f9e-2903-4b5b-b71f-e25b1467cc00) set by cmooney@cumin1002 for 2:...
[22:55:42] netops, Traffic, DC-Ops, Infrastructure-Foundations, and 2 others: LInk errors from lvs1019 to ssw1-f1-eqiad - https://phabricator.wikimedia.org/T374155#10123609 (cmooney) @Jclark-ctr is on site so I will swing the live traffic to lvs1020 so we can investigate.
[23:35:03] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:rack/install/configuration new firewalls - https://phabricator.wikimedia.org/T374176 (Papaul) NEW
[23:37:28] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:rack/install/configuration new firewalls - https://phabricator.wikimedia.org/T374176#10123837 (Papaul)
[23:38:31] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:rack/install/configuration new firewalls - https://phabricator.wikimedia.org/T374176#10123839 (Papaul)
[23:43:18] netops, Traffic, DC-Ops, Infrastructure-Foundations, and 2 others: LInk errors from lvs1019 to ssw1-f1-eqiad - https://phabricator.wikimedia.org/T374155#10123855 (cmooney) Open→Resolved Ok so we replaced the optic on the lvs1019 side, and things seem to be good. Sent a test stream of...
[23:52:26] topranks: I am now. What's up?
[23:54:38] Traffic, Prod-Kubernetes, serviceops, Kubernetes: Reverse DNS for k8s pods IPs - https://phabricator.wikimedia.org/T344171#10123888 (cmooney) >>! In T344171#10122931, @CDanis wrote: > In summary Alex is looking at configuring Calico's IPAM to keep a set of coredns pods on static IP addresses, whi...