[08:41:30] netops, SRE, observability: Ingest Cron and Root Alerts Into Logstash - https://phabricator.wikimedia.org/T274377 (ayounsi) Those servers don't have direct external connectivity, so we will have to be creative, eg.; * setup some kind of IMAP relay either with external connectivity of through the pro...
[10:27:09] Traffic, Platform Engineering, cloud-services-team (Kanban): Get platform engineering team green light for Cloud NAT to wikis change - https://phabricator.wikimedia.org/T273738 (daniel)
[11:16:18] netops, SRE, SRE-tools, homer, and 2 others: Investigate Capirca - https://phabricator.wikimedia.org/T273865 (ayounsi) Limitations identified: Some ACLs currently have Jinja code in them, which is not possible through Capirca. The easiest cases have (or can) be mitigated by either: * removing th...
[11:32:44] Traffic, netops, Data-Persistence-Backup, SRE, SRE-swift-storage: Depool codfw swift cluster - https://phabricator.wikimedia.org/T267338 (jcrespo)
[11:33:21] Traffic, netops, Data-Persistence-Backup, SRE, SRE-swift-storage: Depool codfw swift cluster - https://phabricator.wikimedia.org/T267338 (jcrespo) I asked filippo to delay the maintenance 1 week due to unexpected workload on my side, which would prevent me to be ready by next week.
[12:00:50] Traffic, SRE, Wikimedia-General-or-Unknown: Disable caching on the main page for anonymous users - https://phabricator.wikimedia.org/T119366 (Kaganer) There is a difference between logged in and unlogged sessions. See [[ https://ba.wikipedia.org/wiki/Баш_бит | https://ba.wikipedia.org/wiki/Баш_бит ]]...
[12:27:54] Traffic, Platform Engineering, SRE, cloud-services-team (Kanban): Get platform engineering team green light for Cloud NAT to wikis change - https://phabricator.wikimedia.org/T273738 (aborrero) For the record, we already have a dedicated phab task for the traffic team: {T273737}
[13:28:16] netops, SRE: Test dhcp-option 82 - https://phabricator.wikimedia.org/T221388 (Volans) With the above patches merged, and with: `lang=bash root@install1003:/etc/dhcp# cat opt82-entries.ttyS1-115200 host sretest1002 { host-identifier option agent.circuit-id "asw2-d-eqiad:ge-6/0/5.0:private1-d-eqiad";...
[13:41:14] netops, SRE: Test dhcp-option 82 - https://phabricator.wikimedia.org/T221388 (BBlack) I'm probably not up to date on concrete plans built on top of this, but it seems like having the numeric vlan id might be useful metadata here in addition to the abstract name of the vlan (e.g. scenarios where we might...
[13:44:15] netops, SRE: Test dhcp-option 82 - https://phabricator.wikimedia.org/T221388 (ayounsi) From the doc: > Specify that the circuit ID suboption value contains the VLAN ID rather than the VLAN name (the default): > [edit vlans vlan-name forwarding-options dhcp-security option-82] > user@switch# set circu...
[13:45:56] bblack: when you get a chance, re: i40e https://gerrit.wikimedia.org/r/c/operations/puppet/+/661053 thank you !
[13:55:02] netops, SRE: Test dhcp-option 82 - https://phabricator.wikimedia.org/T221388 (Volans) >>! In T221388#6822703, @BBlack wrote: > I'm probably not up to date on concrete plans built on top of this, but it seems like having the numeric vlan id might be useful metadata here in addition to the abstract name of...
[14:30:56] Traffic, Platform Engineering, SRE, cloud-services-team (Kanban): Get platform engineering team green light for Cloud NAT to wikis change - https://phabricator.wikimedia.org/T273738 (Ladsgroup) @daniel I think the most important part of the greenlight is if ratelimit in mediaiwki is going affect...
[14:41:52] godog: there's also a pair of relevant patches from jbond42 I was hoping to try deploying today:
[14:41:55] https://gerrit.wikimedia.org/r/c/operations/puppet/+/662688
[14:42:09] https://gerrit.wikimedia.org/r/c/operations/puppet/+/662699
[14:42:28] which moves that block of functionality and thus the driver whitelist over to the interface-rps script
[14:43:07] in our case we're gonna have to be careful with how we roll that out, in case it blips LVS interfaces (so some puppet disables and seeing how the rest goes)
[14:43:38] in your case, your patch or a variant built on the above could also blip interface traffic on your i40e ms-be hosts, if they're live in production, that might take some care as well
[14:46:08] bblack: fyi i have an interview n 45 mins and just doing a bit of prep for that not. however i dont think yuo really need me to merge thoses changes
[14:46:16] s/not/now/
[14:48:08] bblack: ack, thanks for the context, re: blipping interfaces I'm not worried on ms-be since that's impactless afaict for end clients
[14:48:21] and/or staggered enough to be impactless in practice
[14:49:54] not sure what makes more sense at this point, perhaps waiting for the two patches above
[14:50:02] cc jbond42 ^
[14:52:23] godog: i can add the driver to https://gerrit.wikimedia.org/r/c/operations/puppet/+/662688/4/modules/interface/files/interface-rps.py#184 then it will get deployed when the above changes are merged
[14:52:33] Wikimedia-Apache-configuration, SRE, Wikimedia-Site-requests, Patch-For-Review: Temporarily redirect sgs.wikipedia.org to bat-smg.wikipedia.org until bat-smg->sgs move can be done - https://phabricator.wikimedia.org/T204830 (Base) Is there a blocker here?
[14:54:46] https://gerrit.wikimedia.org/r/c/operations/puppet/+/662688/4..5/modules/interface/files/interface-rps.py diff is a bit more then its should as i have black enabled now
[14:57:10] jbond42: nice! yeah I think adding i40e there should work as expected
[17:24:50] Traffic, ops-codfw: codfw: lvs2007 : iDRAC is unable to communicate with power management firmware error - https://phabricator.wikimedia.org/T274571 (Papaul)
[17:25:19] Traffic, ops-codfw: codfw: lvs2007 : iDRAC is unable to communicate with power management firmware error - https://phabricator.wikimedia.org/T274571 (Papaul) p:Triage→Medium
[18:21:37] Traffic, SRE, ops-codfw: codfw: lvs2007 : iDRAC is unable to communicate with power management firmware error - https://phabricator.wikimedia.org/T274571 (Papaul) Open→Resolved This issue was resolved for now by draining the power.
[21:16:32] Domains, Traffic, SRE: Apple Business Manager: verify ownership of wikimedia.org - https://phabricator.wikimedia.org/T274592 (Peachey88)
[21:16:53] Traffic, DNS, SRE: Apple Business Manager: verify ownership of wikimedia.org - https://phabricator.wikimedia.org/T274592 (Reedy)
[21:19:17] Traffic, DNS, SRE: Apple Business Manager: verify ownership of wikimedia.org - https://phabricator.wikimedia.org/T274592 (Dzahn)