[00:00:59] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11033383 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host clouddb1022.eqiad.wmnet with OS bookworm executed with...
[01:31:55] <wmcs-alerts>	 FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[02:05:00] <jinxer-wm>	 FIRING: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[02:05:08] <wikibugs>	 cloud-services-team: NovafullstackSustainedFailures Novafullstack tests have been failing for more than 5hours in eqiad - https://phabricator.wikimedia.org/T400432 (phaultfinder) NEW
[02:41:55] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[03:05:00] <jinxer-wm>	 RESOLVED: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[03:14:20] <wikibugs>	 (CR) Abijeet Patro: [V:+2] Localisation updates from https://translatewiki.net. [labs/tools/weapon-of-mass-description] - https://gerrit.wikimedia.org/r/1172307 (owner: L10n-bot)
[03:14:36] <wikibugs>	 (CR) Abijeet Patro: [V:+2] Localisation updates from https://translatewiki.net. [labs/tools/massmailer] - https://gerrit.wikimedia.org/r/1172304 (owner: L10n-bot)
[03:15:05] <wikibugs>	 (CR) Abijeet Patro: [V:+2] Localisation updates from https://translatewiki.net. [labs/tools/map-of-monuments] - https://gerrit.wikimedia.org/r/1172305 (owner: L10n-bot)
[03:15:10] <wikibugs>	 (CR) Abijeet Patro: [V:+2] Localisation updates from https://translatewiki.net. [labs/tools/commons-mass-description] - https://gerrit.wikimedia.org/r/1172303 (owner: L10n-bot)
[07:08:42] <wikibugs>	 cloud-services-team, Data-Services: [wikireplicas] Views flaggedpage_pending and flaggedtemplates are broken - https://phabricator.wikimedia.org/T368939#11033687 (Pppery) Anything left to do here?
[07:10:51] <wikibugs>	 cloud-services-team, Data-Services: Denormalize user_groups to contain actor information - https://phabricator.wikimedia.org/T238497#11033695 (Pppery)
[09:47:21] <wikibugs>	 (update) vriaa: Draft: Basic banner implementation [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/1
[09:56:38] <wikibugs>	 cloud-services-team, Data-Services: [wikireplicas] Automatically check for missing tables - https://phabricator.wikimedia.org/T378470#11034074 (fnegri) p:Medium→Low I think I would still like to have a list of "partially public" tables that are missing in the replicas. But now that the public tab...
[10:14:21] <wikibugs>	 cloud-services-team, Data-Services: [wikireplicas] Views flaggedpage_pending and flaggedtemplates are broken - https://phabricator.wikimedia.org/T368939#11034170 (fnegri) Open→Resolved a:fnegri @Pppery sorry, this task slipped through the cracks. We no longer need to remove those tables f...
[14:03:33] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[14:12:37] <wikibugs>	 cloud-services-team (FY2025/26-Q1), Cloud-VPS, SRE-OnFire, Sustainability (Incident Followup): [ceph,codfw1dev] upgrade the hosts from pacific->quincy - https://phabricator.wikimedia.org/T400334#11034774 (Andrew) I broke the cluster again, but now it's working. The main thing I did was a version...
[14:13:27] <wikibugs>	 Tool-inteGraality: Retrieving labels via SPARQL tanks query performance - https://phabricator.wikimedia.org/T400480 (JeanFred) NEW
[14:14:52] <wikibugs>	 Tool-inteGraality: Retrieving labels via SPARQL tanks query performance - https://phabricator.wikimedia.org/T400480#11034801 (JeanFred) One potential idea: using subqueries: ` SELECT ?grouping ?higher_grouping ?grouping_link_value (COUNT(DISTINCT ?entity) as ?count)  WITH {   SELECT ?grouping (SAMPLE(?_highe...
[14:33:43] <wikibugs>	 Cloud-VPS (Project-requests): Request creation of SimpleProject VPS project - https://phabricator.wikimedia.org/T400482 (0000abcd1234) NEW
[14:36:30] <wikibugs>	 cloud-services-team, Cloud-VPS: Neutron metadata service failing for all VMs - https://phabricator.wikimedia.org/T395742#11034860 (Andrew) The fix for T395255 did not resolve the intermittent crashes here.
[14:47:16] <wikibugs>	 Cloud-VPS (Project-requests): Request creation of SimpleProject VPS project - https://phabricator.wikimedia.org/T400482#11034880 (Aklapper) Open→Declined a:0000abcd1234→None Hi, the purpose is too broad and the project name is vague. We generally do not grant Cloud VPS projects for single...
[15:03:33] <wmcs-alerts>	 FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-26 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[15:25:16] <wikibugs>	 PAWS: /home/paws is 100% - https://phabricator.wikimedia.org/T396051#11035053 (Andrew) I now have lists of large home directories (more than 1G usage total) that have no date stamps after 2021. Is there any reason to not just delete all of those?
[15:28:33] <wmcs-alerts>	 FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-26 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[15:34:29] <wmcs-alerts>	 RESOLVED: NfsAlmostFull: The NFS drive is over 85% capacity (currently 85.68%) at host paws-nfs-1 in project paws   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DNfsAlmostFull
[15:43:33] <wmcs-alerts>	 FIRING: [4x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-26 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[16:18:33] <wmcs-alerts>	 FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[17:50:29] <wikibugs>	 Tool-globalcontribution: Check if timeout error for bulk requests - https://phabricator.wikimedia.org/T382658#11035537 (Gnoeee) Open→Resolved
[18:38:33] <wmcs-alerts>	 FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[18:39:04] <wikibugs>	 cloud-services-team, Toolforge: tools-login intermittently has broken networking? - https://phabricator.wikimedia.org/T400502 (DamianZaremba) NEW
[18:40:38] <wikibugs>	 cloud-services-team, Toolforge: tools-login intermittently has broken networking? - https://phabricator.wikimedia.org/T400502#11035728 (DamianZaremba) And here is a traceroute when working ` traceroute to login.tools.wmflabs.org (185.15.56.57), 64 hops max, 40 byte packets  1  172.16.0.254 (172.16.0.254)...
[19:12:32] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11035837 (VRiley-WMF) While attempting to image this server (clouddb1022) and got this error. {F65673966}
[19:23:33] <wmcs-alerts>	 FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[19:50:46] <wikibugs>	 cloud-services-team, Cloud-VPS, VPS-Projects, Catalyst: metricsinfra: send alerts for the catalyst project to catalyst@w.o email - https://phabricator.wikimedia.org/T386416#11035924 (thcipriani) >>! In T386416#11027704, @taavi wrote: > This doesn't seem to have ever worked; the notification email...
[20:01:41] <wikibugs>	 (update) vriaa: Draft: Basic banner implementation [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/1
[20:03:47] <wikibugs>	 cloud-services-team, Data-Services: Denormalize user_groups to contain actor information - https://phabricator.wikimedia.org/T238497#11035959 (Bugreporter) Such change is meaningless as long as copy in cloud replica are just views instead of real copy (or materialized views) - so any queries on such "den...
[20:05:55] <wikibugs>	 (update) vriaa: Draft: Basic banner implementation [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/1
[20:08:29] <wikibugs>	 cloud-services-team, Toolforge: tools-login intermittently has broken networking? - https://phabricator.wikimedia.org/T400502#11035964 (DamianZaremba) Hung again in the middle of typing ` traceroute to login.tools.wmflabs.org (185.15.56.57), 64 hops max, 40 byte packets  1  172.16.0.254 (172.16.0.254)  6...
[20:55:16] <wikibugs>	 cloud-services-team, Cloud-VPS, Toolforge: Block web crawlers from accessing Cloud Services - https://phabricator.wikimedia.org/T226688#11036017 (MusikAnimal) I agree robots.txt is useless. I have had everything blocked for years and it doesn't stop anything: https://xtools.wmcloud.org/robots.txt  I...
[21:18:33] <wmcs-alerts>	 FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[21:25:29] <wmcs-alerts>	 FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance toolsbeta-harbor-2 in project toolsbeta   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[21:28:33] <wmcs-alerts>	 FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[22:13:33] <wmcs-alerts>	 FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess