[02:10:10] FIRING: GanetiMemoryPressure: Ganeti: High memory usage (90.39%) on ganeti2019:9100 - https://wikitech.wikimedia.org/wiki/Ganeti#Memory_pressure - https://grafana.wikimedia.org/d/gd6vep5Iz/ganeti-memory-pressure?orgId=1&var-site=codfw - https://alerts.wikimedia.org/?q=alertname%3DGanetiMemoryPressure
[04:18:55] FIRING: MaxConntrack: Max conntrack at 85.15% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[04:23:55] RESOLVED: MaxConntrack: Max conntrack at 83.57% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[06:10:10] FIRING: GanetiMemoryPressure: Ganeti: High memory usage (90.47%) on ganeti2019:9100 - https://wikitech.wikimedia.org/wiki/Ganeti#Memory_pressure - https://grafana.wikimedia.org/d/gd6vep5Iz/ganeti-memory-pressure?orgId=1&var-site=codfw - https://alerts.wikimedia.org/?q=alertname%3DGanetiMemoryPressure
[08:15:22] the alert for ganeti2019 should vanish in a bit
[08:20:10] RESOLVED: GanetiMemoryPressure: Ganeti: High memory usage (90.44%) on ganeti2019:9100 - https://wikitech.wikimedia.org/wiki/Ganeti#Memory_pressure - https://grafana.wikimedia.org/d/gd6vep5Iz/ganeti-memory-pressure?orgId=1&var-site=codfw - https://alerts.wikimedia.org/?q=alertname%3DGanetiMemoryPressure
[08:20:27] as promised :D
[09:13:18] hey folks!
[09:13:44] Finally it seems that the Kartotherian k8s image runs as the bare metal version on maps*
[09:13:52] now I need to tune cpu requirements etc..
[09:14:23] but Yiannis confirmed that the diff between k8s and "prod" bare metal is almost nothing
[09:14:40] (we are upgrading node + mapnik base libs so a tiny change is expected)
[09:19:11] \o/
[09:19:40] we should also have the first bookworm maps master running later the day
[09:22:57] niceee
[09:36:12] nice work guys!!
[10:40:11] oooh
[12:55:47] ffs.... look at how Nokia have mapped the Trident 3 port-blocks on the switches we are trialling (second image):
[12:55:55] https://documentation.nokia.com/outlook_jira_test/books/interfaces/displaying_port_group_members_7220_ixr_d2_and_7220_ixr_d2l_only.html
[12:56:16] I'm sure some genius hardware person had a good reason but god damn
[14:39:05] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Check link from msw1-eqiad et-0/1/0 to msw2-eqiad et-0/1/0 - https://phabricator.wikimedia.org/T384708#10508615 (Papaul) Open→Resolved a:Papaul We are not seeing any errors for the last 24 hours resolving this task fo...
[16:10:03] netops, fundraising-tech-ops, Infrastructure-Foundations, SRE: Manage frack switches with Netbox - https://phabricator.wikimedia.org/T268802#10509243 (cmooney) Open→Resolved a:cmooney
[16:49:10] netops, Infrastructure-Foundations, SRE: Manage fundraising network elements from Netbox - https://phabricator.wikimedia.org/T377996#10509343 (cmooney)
[16:51:37] netops, Infrastructure-Foundations, SRE: Manage fundraising network elements from Netbox - https://phabricator.wikimedia.org/T377996#10509388 (cmooney) Open→Resolved This is now largely complete. We have decided to model the switch<->server links in Netbox (with dummy names 'PRIMARY_A' a...
[17:52:07] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Install and cable Nokia test devices and test servers in codfw - https://phabricator.wikimedia.org/T385217#10509771 (cmooney)
[17:52:55] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Install and cable Nokia test devices and test servers in codfw - https://phabricator.wikimedia.org/T385217#10509779 (cmooney)
[17:53:34] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Install and cable Nokia test devices and test servers in codfw - https://phabricator.wikimedia.org/T385217#10509783 (cmooney)
[17:56:28] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Install and cable Nokia test devices and test servers in codfw - https://phabricator.wikimedia.org/T385217#10509812 (cmooney)
[17:56:46] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Install and cable Nokia test devices and test servers in codfw - https://phabricator.wikimedia.org/T385217#10509815 (cmooney)
[17:58:51] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Install and cable Nokia test devices and test servers in codfw - https://phabricator.wikimedia.org/T385217#10509830 (cmooney)
[18:59:27] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Install and cable Nokia test devices and test servers in codfw - https://phabricator.wikimedia.org/T385217#10510080 (Jhancock.wm) I have the two Dell Poweredge R 440 servers set aside when we are ready to rack them. they have 10G car...
[19:00:24] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Install and cable Nokia test devices and test servers in codfw - https://phabricator.wikimedia.org/T385217#10510086 (Jhancock.wm) also forgot to mention we have one spare SFP-100G-LR4 we can test with
[19:02:55] FIRING: MaxConntrack: Max conntrack at 82.1% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[19:07:55] RESOLVED: MaxConntrack: Max conntrack at 82.1% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[20:14:55] FIRING: MaxConntrack: Max conntrack at 80.21% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[20:19:55] RESOLVED: MaxConntrack: Max conntrack at 80.21% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[20:57:55] FIRING: MaxConntrack: Max conntrack at 81.41% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[21:02:56] RESOLVED: MaxConntrack: Max conntrack at 81.3% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[22:45:06] netops, Infrastructure-Foundations, observability, Prod-Kubernetes, and 3 others: Prevent BGP alerts triggering when K8s host maintenance is being done - https://phabricator.wikimedia.org/T384731#10510616 (cmooney) @fgiunchedi perhaps you might know a way to do this. We now have stats like this...