[01:35:49] FIRING: PuppetZeroResources: Puppet has failed generate resources on cp1108:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:39:49] FIRING: PuppetFailure: Puppet has failed on cp1112:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[01:40:49] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on cp1108:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:45:01] FIRING: PuppetFailure: Puppet has failed on cp1101:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[01:45:22] FIRING: PuppetFailure: Puppet has failed on lvs1013:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[01:50:01] RESOLVED: PuppetFailure: Puppet has failed on lvs1013:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[01:51:08] FIRING: [2x] PuppetZeroResources: Puppet has failed generate resources on cp1106:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:52:01] FIRING: PuppetFailure: Puppet has failed on durum1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[02:00:05] FIRING: [3x] PuppetFailure: Puppet has failed on cp1102:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[02:00:17] FIRING: [2x] PuppetFailure: Puppet has failed on cp1101:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[02:00:26] FIRING: PuppetFailure: Puppet has failed on dns1004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[02:01:33] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on cp1106:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:04:49] FIRING: [3x] PuppetFailure: Puppet has failed on cp1102:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[02:09:49] FIRING: [3x] PuppetFailure: Puppet has failed on cp1102:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[02:10:05] RESOLVED: [2x] PuppetFailure: Puppet has failed on cp1101:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[02:17:06] RESOLVED: PuppetFailure: Puppet has failed on durum1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[02:19:48] RESOLVED: [3x] PuppetFailure: Puppet has failed on cp1102:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[02:20:01] RESOLVED: PuppetFailure: Puppet has failed on dns1004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[04:21:41] Traffic, Data-Platform, Data Products (Data Products Sprint 19): NEW BUG REPORT - Issues in calculation logic for unique devices tables - https://phabricator.wikimedia.org/T375527#10181958 (OSefu-WMF)
[04:21:42] Traffic, Movement-Insights: Investigating unique devices traffic data - https://phabricator.wikimedia.org/T375562#10181959 (OSefu-WMF)
[04:23:07] Traffic, Movement-Insights: Investigating unique devices traffic data - https://phabricator.wikimedia.org/T375562#10181965 (OSefu-WMF) p:Triage→High
[07:53:30] netops, Infrastructure-Foundations, serviceops: WikiKube clusters close to exhausting Calico IPPool allocations - https://phabricator.wikimedia.org/T375845 (akosiaris) NEW
[08:21:07] netops, cloud-services-team, Infrastructure-Foundations, SRE: openstack: initial IPv6 support in neutron - https://phabricator.wikimedia.org/T375847 (aborrero) NEW
[08:22:38] netops, cloud-services-team, Infrastructure-Foundations, SRE: openstack: initial IPv6 support in neutron - https://phabricator.wikimedia.org/T375847#10182170 (aborrero) Open→In progress p:Triage→Medium
[08:23:41] netops, cloud-services-team, Infrastructure-Foundations, SRE: openstack: initial IPv6 support in neutron - https://phabricator.wikimedia.org/T375847#10182174 (aborrero)
[08:28:55] netops, cloud-services-team, Infrastructure-Foundations, SRE: openstack: initial IPv6 support in neutron - https://phabricator.wikimedia.org/T375847#10182196 (aborrero)
[08:34:48] netops, Infrastructure-Foundations, SRE: netbox: create IPv6 entries for Cloud VPS - https://phabricator.wikimedia.org/T374712#10182201 (aborrero) >>! In T374712#10163608, @cmooney wrote: > @arturo /64s for VM usage I guess can be allocated from [[ https://netbox.wikimedia.org/ipam/prefixes/1079/pref...
[09:38:39] netops, Infrastructure-Foundations, Prod-Kubernetes, serviceops: WikiKube clusters close to exhausting Calico IPPool allocations - https://phabricator.wikimedia.org/T375845#10182322 (JMeybohm) > Note that per past experience, changing the ippool is a arduous and dangerous process. We probably don...
[09:38:47] netops, Infrastructure-Foundations, Prod-Kubernetes, serviceops: WikiKube clusters close to exhausting Calico IPPool allocations - https://phabricator.wikimedia.org/T375845#10182331 (JMeybohm)
[10:04:53] netops, cloud-services-team, Infrastructure-Foundations, SRE, Patch-For-Review: openstack: initial IPv6 support in neutron - https://phabricator.wikimedia.org/T375847#10182409 (aborrero) new instance creation will allocate an IPv6 by default for a VM: {F57561880}
[10:14:51] netops, cloud-services-team, Infrastructure-Foundations, SRE, Patch-For-Review: openstack: initial IPv6 support in neutron - https://phabricator.wikimedia.org/T375847#10182457 (aborrero) however, instance creation itself failed: ` 2024-09-27 10:07:20.088 451966 ERROR nova.compute.manager [N...
[10:54:24] netops, cloud-services-team, Infrastructure-Foundations, SRE: openstack: initial IPv6 support in neutron - https://phabricator.wikimedia.org/T375847#10182562 (aborrero) neutron virtual router has the right IPv6 address: {F57561976}
[11:05:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[11:10:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[11:10:33] netops, Infrastructure-Foundations, Prod-Kubernetes, serviceops: WikiKube clusters close to exhausting Calico IPPool allocations - https://phabricator.wikimedia.org/T375845#10182615 (akosiaris) >>! In T375845#10182322, @JMeybohm wrote: >> Note that per past experience, changing the ippool is a ar...
[11:11:11] netops, cloud-services-team, Infrastructure-Foundations, SRE: openstack: initial IPv6 support in neutron - https://phabricator.wikimedia.org/T375847#10182616 (aborrero) The VM did not get the IPv6 assigned in the interface via dhcpv6 :-( `lang=shell-session aborrero@ipv6:~$ ip -br a lo...
[11:12:02] netops, cloud-services-team, Infrastructure-Foundations, SRE: openstack: initial IPv6 support in neutron - https://phabricator.wikimedia.org/T375847#10182629 (aborrero) We got DNS integration half working: `lang=shell-session aborrero@ipv6:~$ host ipv6.cloudinfra-codfw1dev.codfw1dev.wikimedia.cl...
[11:22:01] netops, cloud-services-team, Infrastructure-Foundations, SRE: openstack: initial IPv6 support in neutron - https://phabricator.wikimedia.org/T375847#10182667 (aborrero) I see the dhcp6 packets from my test VM arriving into neutron: ` 11:20:29.156995 IP6 fe80::f816:3eff:fe3e:4b38.546 > ff02::1:2....
[12:03:25] Traffic, collaboration-services, SRE, Patch-For-Review, Release-Engineering-Team (Radar): implement anti-abuse features for GitLab (Move GitLab behind the CDN) - https://phabricator.wikimedia.org/T366882#10182766 (Jelto) I added a summary of the rate limiting and abuse tooling (including nfta...
[12:13:47] Traffic, Infrastructure Security, Wikipedia-Android-App-Backlog, Wikipedia-iOS-App-Backlog, Security: Integrate In-App Internet censorship circumvention by domain fronting - https://phabricator.wikimedia.org/T327286#10182809 (Naruse_shiroha) I've already implemented it for Android. https://gi...
[12:28:16] Traffic, Infrastructure Security, Wikipedia-Android-App-Backlog, Wikipedia-iOS-App-Backlog, Security: Integrate In-App Internet censorship circumvention by domain fronting - https://phabricator.wikimedia.org/T327286#10182942 (Dbrant)
[13:30:03] Traffic, DC-Ops, ops-codfw, SRE: cp2037 hardware issues: A fatal error was detected on a component at bus 174 device 0 function 0 - https://phabricator.wikimedia.org/T375766#10183208 (ssingh) Open→Resolved Host has been pooled and no issues since, thanks for the quick turnaround @Jha...
[14:00:07] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:rack/install/configuration new firewalls - https://phabricator.wikimedia.org/T374176#10183344 (Papaul) Firewalls are ready, the only thing left is to setup the SSL certificate. I will working with @Jgreen...
[15:08:44] Traffic, Data-Platform, Data Products (Data Products Sprint 20 🎯): NEW BUG REPORT - Issues in calculation logic for unique devices tables - https://phabricator.wikimedia.org/T375527#10183584 (VirginiaPoundstone)
[22:28:04] Traffic, Data-Platform, Data Products (Data Products Sprint 20 🎯): NEW BUG REPORT - Issues in calculation logic for unique devices tables - https://phabricator.wikimedia.org/T375527#10184977 (OSefu-WMF)