[08:13:04] !incidents
[08:13:04] 5693 (RESOLVED) [2x] ProbeDown sre (dse-k8s-ctrl1002:6443 probes/custom eqiad)
[08:13:04] 5692 (RESOLVED) [2x] ProbeDown sre (dse-k8s-ctrl1001:6443 probes/custom eqiad)
[09:24:09] Thanks elukey. stevemunene, would you mind having a look, as you were working on the control plane recently? Thanks!
[10:03:11] There is also high latency for several dse workers https://alerts.wikimedia.org/?q=%40state%3Dactive&q=%40cluster%3Dwikimedia.org&q=alertname%3DKubeletOperationalLatency
[10:03:15] started 3 days ago
[10:10:02] I've seen these flap recently. How do we usually investigate these? Is it usually the control plane being loaded? The hosts themselves?
[10:19:48] in this case it seems that run_podsandbox is taking ages and/or erroring out on various kubelets, I'd start checking the logs on the host to see if anything is happening
[10:27:10] I'm seeing 2 errors like these from yesterday
[10:27:10] Feb 23 11:34:21 dse-k8s-worker1002 kubelet[1624]: E0223 11:34:21.938179 1624 kuberuntime_gc.go:176] "Failed to stop sandbox before removing" err="rpc error: code = Unknown desc = failed to destroy network for sandbox \"339194cb9405e59ea2d22a5721605716ec50e22a9a62ae6a8c9f26d24c1c2e0c\": plugin type=\"calico\" failed (delete): error getting
[10:27:10] ClusterInformation: Get \"https://dse-k8s-ctrl.svc.eqiad.wmnet:6443/apis/crd.projectcalico.org/v1/clusterinformations/default\": dial tcp 10.2.2.73:6443: connect: connection refused" sandboxID="339194cb9405e59ea2d22a5721605716ec50e22a9a62ae6a8c9f26d24c1c2e0c"
[10:27:20] (on dse-k8s-worker1002)
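The run_podsandbox latency and the calico "connection refused" error above both point at the dse-k8s control plane endpoint (dse-k8s-ctrl.svc.eqiad.wmnet:6443), which matches the earlier ProbeDown incidents on dse-k8s-ctrl1001/1002. A minimal triage sketch along the lines of the advice above ("check the logs on the host"): it assumes it runs on an affected worker such as dse-k8s-worker1002, that the kubelet runs as the systemd unit "kubelet", and that journalctl is available. The hostname and port come from the log quoted above; everything else is illustrative, not an established runbook.

    #!/usr/bin/env python3
    """Rough triage sketch for the dse-k8s kubelet latency alerts (illustrative only)."""
    import socket
    import subprocess

    # Control-plane endpoint taken from the calico error quoted at 10:27.
    CTRL_ENDPOINT = ("dse-k8s-ctrl.svc.eqiad.wmnet", 6443)

    def recent_kubelet_errors(since: str = "3 days ago") -> str:
        # Pull error-level kubelet messages; sandbox GC failures like the one
        # quoted above should show up here.
        out = subprocess.run(
            ["journalctl", "-u", "kubelet", "--since", since, "-p", "err", "--no-pager"],
            capture_output=True, text=True, check=False,
        )
        return out.stdout

    def control_plane_reachable(timeout: float = 3.0) -> bool:
        # A plain TCP connect mirrors what the calico plugin failed to do
        # ("dial tcp 10.2.2.73:6443: connect: connection refused").
        try:
            with socket.create_connection(CTRL_ENDPOINT, timeout=timeout):
                return True
        except OSError:
            return False

    if __name__ == "__main__":
        print("control plane reachable:", control_plane_reachable())
        sandbox_lines = [l for l in recent_kubelet_errors().splitlines() if "sandbox" in l.lower()]
        print(f"{len(sandbox_lines)} sandbox-related kubelet errors in the last 3 days")
        for line in sandbox_lines[:5]:
            print(line)

If the TCP connect succeeds now but the kubelet errors cluster around the time of the ProbeDown incidents, the latency alerts were most likely fallout from the control-plane blip rather than a problem on the workers themselves.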
[11:25:54] as promised I published https://wikitech.wikimedia.org/wiki/Incidents/2025-02-17_maps (the outage happened last Monday)
[11:28:01] (I need to write another one for Wednesday, sigh)
[11:47:33] ty elukey <3
[12:40:55] https://wikitech.wikimedia.org/wiki/Incidents/2025-02-19_maps
[12:40:59] and the second one is published
[13:04:06] elukey: can I s/clearly visible/visible/ in the Timeline? It sounds a bit blame-elukey, and I think the notes below show that the issue wasn't clear, even if it was visible once you knew what you were looking for
[13:06:03] Emperor: done! That wasn't the intent, my point is that the dashboard was there and I should've checked it :)
[13:06:18] (also thanks for reviewing!)
[17:07:06] hey on-callers, as an FYI I pooled all the wikikube workers again for kartotherian.discovery.wmnet (the backend for maps.wikimedia.org)
[17:07:25] they are serving traffic with half the pybal weight (compared to the bare metals)
[17:07:32] All info/state/rollback in https://phabricator.wikimedia.org/T386926
[17:07:51] if anything goes south and I am not around, just depool as indicated in the task's description
[17:28:00] thanks, elukey!
[17:54:28] ok
[18:18:06] Hello! In anticipation of the upcoming DC switchover (T385155), we will be running a live test on Thursday February 27th, starting around 1700 UTC. Please let us know in #wikimedia-serviceops if this conflicts with any ongoing work or if you have any concerns - all going well, this will be a non-disruptive test
[18:18:07] T385155: 🧭 Northward Datacentre Switchover (March 2025) - https://phabricator.wikimedia.org/T385155
[23:09:14] First Wikipedia 3D colored models prototype
[23:09:14] You can see it here:
[23:09:15]     http://wikipedia3d.serreriabelga.es/index.php/Special:ListFiles
[23:09:15]     http://wikipedia3d.serreriabelga.es/index.php/File:DamagedHelmet.glb
[23:09:16]     http://wikipedia3d.serreriabelga.es/index.php/File:Wikipedia.glb
[23:09:16] A group of community members is working to improve 3D support in Wikipedia.
[23:09:17] We would love to talk with members of the SRE team.
[23:09:17] https://meta.wikimedia.org/wiki/Telegram -> https://t.me/+tMgoJxx8D7I5NTQ8