[03:11:41] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[04:11:41] RESOLVED: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[04:43:25] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/debian synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[07:49:21] netops, Infrastructure-Foundations, Data-Engineering (Q2 FY25/26 October 1st - December 31th): Handle `network_flows_internal` data growth - https://phabricator.wikimedia.org/T412443#11458860 (JAllemandou) In the meantime I'm going to reenable the indexation job with more resources, allowing it not to...
[08:43:40] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/debian synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[09:50:54] netops, Infrastructure-Foundations, Data-Engineering (Q2 FY25/26 October 1st - December 31th): Handle `network_flows_internal` data growth - https://phabricator.wikimedia.org/T412443#11459188 (cmooney) >>! In T412443#11458860, @JAllemandou wrote: > Either sampling more, or keeping one week of data fo...
[09:58:25] RESOLVED: MirrorHighLag: Mirrors - /srv/mirrors/debian synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[10:23:45] netops, Infrastructure-Foundations, Data-Engineering (Q2 FY25/26 October 1st - December 31th): Handle `network_flows_internal` data growth - https://phabricator.wikimedia.org/T412443#11459310 (ayounsi) >>! In T412443#11459188, @cmooney wrote: >>>! In T412443#11458860, @JAllemandou wrote: >> Either sa...
[13:33:48] netops, Infrastructure-Foundations, Data-Engineering (Q2 FY25/26 October 1st - December 31th): Handle `network_flows_internal` data growth - https://phabricator.wikimedia.org/T412443#11460057 (JAllemandou) I've been spending time on this, here's my latest finding: By reducing retention to 30 days, we...
[13:48:28] netops, Infrastructure-Foundations, Data-Engineering (Q2 FY25/26 October 1st - December 31th): Handle `network_flows_internal` data growth - https://phabricator.wikimedia.org/T412443#11460139 (cmooney) >>! In T412443#11460057, @JAllemandou wrote: > I'll keep the conf as is (30 days retention), and wi...
[15:14:59] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Decom Asw Switches in Rows C & D - https://phabricator.wikimedia.org/T412525#11460664 (Jclark-ctr) p:Triage→Medium a:Jclark-ctr
[15:15:16] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Decom Asw Switches in Rows C & D - https://phabricator.wikimedia.org/T412525#11460667 (Jclark-ctr) @cmooney i have moved all Scs cables from nokia switches to Juniper switches in rows c/d
[15:15:27] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Decom Asw Switches in Rows C & D - https://phabricator.wikimedia.org/T412525#11460668 (Jclark-ctr) a:Jclark-ctr→cmooney
[15:22:55] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Decom Juniper EX/QFX switches in eqiad rows C/D - https://phabricator.wikimedia.org/T412271#11460722 (Jclark-ctr) Open→Resolved a:Jclark-ctr All 4 links have been removed physically and deleted from netbox. Scs c...
[15:23:18] Hey IF, we recently mirrored the upstream opensearch repos in T407123. But I noticed that the upstream opensearch 2.x repo has a bunch of old versions, whereas our mirror only has the latest 2.x. Is it possible to backfill the old versions to our mirror?
cc cwhite
[15:23:19] T407123: Mirror OpenSearch repos from upstream - https://phabricator.wikimedia.org/T407123
[15:36:55] netops, Infrastructure-Foundations, Observability-Logging: ~5k/logs/sec from netdev - https://phabricator.wikimedia.org/T412143#11460805 (ayounsi) p:Triage→Medium
[15:37:34] netops, Infrastructure-Foundations, Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review: Handle `network_flows_internal` data growth - https://phabricator.wikimedia.org/T412443#11460806 (ayounsi) p:Triage→Medium
[15:44:53] Mail, Infrastructure-Foundations, SRE: wmf-auto-restart: fails for exim4 - https://phabricator.wikimedia.org/T330660#11460854 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff Current Exim packages ship a systemd unit, which resolved this.
[15:47:11] Packaging, Infrastructure-Foundations: reprepro uploads should trigger rsync apt job - https://phabricator.wikimedia.org/T330843#11460863 (LSobanski) p:Medium→Low
[15:49:30] SRE-tools, Infrastructure-Foundations, serviceops, SRE, Datacenter-Switchover: Support locking cookbooks run except for switchover related cookbooks - https://phabricator.wikimedia.org/T330997#11460869 (LSobanski) @Clement_Goubert is this still needed?
[15:50:25] SRE-tools, Infrastructure-Foundations: Create an offline cookbook to take care of additional offline steps - https://phabricator.wikimedia.org/T335431#11460871 (LSobanski) p:Medium→Low
[16:21:47] netops, Infrastructure-Foundations, SRE: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1 - https://phabricator.wikimedia.org/T412733 (cmooney) NEW p:Triage→Medium
[16:22:00] netops, Infrastructure-Foundations, SRE: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1 - https://phabricator.wikimedia.org/T412733#11461053 (cmooney)
[16:23:49] netops, Infrastructure-Foundations, SRE: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1 - https://phabricator.wikimedia.org/T412733#11461064 (cmooney)
[16:23:55] netops, Infrastructure-Foundations, SRE: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1 - https://phabricator.wikimedia.org/T412733#11461066 (cmooney)
[16:24:02] netops, Infrastructure-Foundations, SRE: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1 - https://phabricator.wikimedia.org/T412733#11461068 (cmooney)
[16:24:18] netops, Infrastructure-Foundations, SRE: InboundInterfaceErrors alerts firing for Nokia switches on v25.10.1 - https://phabricator.wikimedia.org/T412733#11461070 (cmooney)
[16:39:43] netops, Infrastructure-Foundations, Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review: Handle `network_flows_internal` data growth - https://phabricator.wikimedia.org/T412443#11461141 (JAllemandou) a:JAllemandou
[17:22:26] SRE-tools, Infrastructure-Foundations, serviceops, SRE, Datacenter-Switchover: Support locking cookbooks run except for switchover related cookbooks - https://phabricator.wikimedia.org/T330997#11461316 (Clement_Goubert) Not strictly, but it would be nice to have for peace of mind. This may be...
[18:04:08] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: mr1-codfw: add second uplink to lsw1-a2-codfw - https://phabricator.wikimedia.org/T410717#11461432 (Jhancock.wm) @cmooney i checked what we had in stock. We don't have that SFP. and could use a new 5 meter fiber for this task.
[23:49:49] netops, Infrastructure-Foundations, Observability-Alerting, SRE: Improve port-utilisation alerting to take QoS into account - https://phabricator.wikimedia.org/T384052#11462541 (cmooney) This has come up again in terms of the pages we have been getting of late, and we may take some action to chan...