[00:01:15] 10Tool-paulina, 10Outreachy (Round 31): Outreachy 31: Features to edit author and work data on Wikidata directly from Paulina - https://phabricator.wikimedia.org/T392429#11494308 (10Nurah_Wakili) Weekly Internship Report Week 4: December 22 – December 27 Task 1: Conducted research on making authenticated req... [00:03:04] FIRING: ObjectStorageObjectQuotaFull: Object storage quota by 'objects' is 80.43% full for project tools-logging - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/ObjectStorageObjectQuotaFull - https://grafana.wikimedia.org/d/7120b794-4638-49f5-bccd-9716efc60f24/wmcs-object-storage-quotas - https://alerts.wikimedia.org/?q=alertname%3DObjectStorageObjectQuotaFull [00:05:12] 10Tool-paulina, 10Outreachy (Round 31): Outreachy 31: Features to edit author and work data on Wikidata directly from Paulina - https://phabricator.wikimedia.org/T392429#11494321 (10Nurah_Wakili) Weekly Internship Report Week 4: December 30 – January 5 Task 1: No development tasks were completed this week du... [00:14:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [00:18:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [00:43:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [00:48:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [00:54:56] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217) [00:55:01] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217 [00:56:37] !log andrew@cloudcumin1001 toolsbeta END (ERROR) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=97) [00:57:12] 10Tool-Pageviews, 06Data-Engineering: Generate 2025 topviews yearly datasets - https://phabricator.wikimedia.org/T413393#11494493 (10HMonroy) @MusikAnimal looking at this task and noticed your last comment. Did you get around this? 
I do see data in PageViews from 2025 :) [00:57:44] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217) [01:03:28] FIRING: InstanceDown: Project toolsbeta instance toolsbeta-test-k8s-etcd-27 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [01:05:28] RESOLVED: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on toolsbeta-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [01:08:48] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) [01:13:37] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217) [01:13:41] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217 [01:13:45] !log andrew@cloudcumin1001 toolsbeta END (ERROR) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=97) [01:13:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [01:14:33] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217) [01:15:08] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [01:18:15] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217) [01:18:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [01:22:23] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [01:22:28] FIRING: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on toolsbeta-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [01:22:58] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217) [01:23:02] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217 [01:23:28] RESOLVED: InstanceDown: Project toolsbeta instance toolsbeta-test-k8s-etcd-27 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [01:26:28] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [01:28:14] FIRING: [6x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAProxy server toolsbeta-test-k8s-control-10.toolsbeta.eqiad1.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown [01:28:53] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217) [01:28:58] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217 [01:29:26] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [01:32:45] FIRING: Toolforge 
Kyverno no policy resources: Toolforge Kyverno has no policy resources - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/Toolforge_Kyverno_no_policy_resources - https://grafana.wmcloud.org/d/kyverno/kyverno?orgId=1&var-DS_PROMETHEUS_KYVERNO=prometheus-tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforge+Kyverno+no+policy+resources [01:32:46] FIRING: Toolforge Kyverno unknown state: Toolforge Kyverno has unknown state. Kyverno might be down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/Toolforge_Kyverno_unknown_state - https://grafana.wmcloud.org/d/kyverno/kyverno?orgId=1&var-DS_PROMETHEUS_KYVERNO=prometheus-tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforge+Kyverno+unknown+state [01:33:20] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217) [01:33:53] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [01:36:19] FIRING: TektonDown: Tekton is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/TektonDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTektonDown [01:36:25] FIRING: JobsApiDown: JobsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/JobsApiDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DJobsApiDown [01:36:58] FIRING: JobsEmailerDown: JobsEmailer is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/JobsEmailerDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DJobsEmailerDown [01:37:11] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217) [01:37:17] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217 [01:37:18] FIRING: EnvvarsApiDown: EnvvarsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/EnvvarsApiDown - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DEnvvarsApiDown [01:37:21] FIRING: MaintainKubeusersDown: maintain-kubeusers is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainKubeusersDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DMaintainKubeusersDown [01:37:23] FIRING: ToolforgeKubernetesNodeNotReady: (no data) Multiple Kubernetes nodes are not ready #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady [01:37:26] FIRING: BuildsApiDown: BuildsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/BuildsApiDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DBuildsApiDown [01:37:34] FIRING: ComponentsApiDown: ComponentsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ComponentsApiDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DComponentsApiDown [01:37:34] FIRING: EnvvarsAdmissionDown: EnvvarsAdmission is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/EnvvarsAdmissionDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DEnvvarsAdmissionDown [01:42:43] !log andrew@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [01:43:49] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217) [01:43:53] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217 [01:43:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [01:46:19] RESOLVED: TektonDown: Tekton is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/TektonDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTektonDown [01:46:25] RESOLVED: JobsApiDown: JobsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/JobsApiDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DJobsApiDown [01:47:21] RESOLVED: MaintainKubeusersDown: maintain-kubeusers is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainKubeusersDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DMaintainKubeusersDown [01:47:26] RESOLVED: BuildsApiDown: BuildsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/BuildsApiDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DBuildsApiDown [01:47:34] RESOLVED: ComponentsApiDown: ComponentsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ComponentsApiDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DComponentsApiDown [01:47:45] RESOLVED: Toolforge Kyverno no policy resources: Toolforge Kyverno has no policy resources - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/Toolforge_Kyverno_no_policy_resources - https://grafana.wmcloud.org/d/kyverno/kyverno?orgId=1&var-DS_PROMETHEUS_KYVERNO=prometheus-tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforge+Kyverno+no+policy+resources [01:47:46] RESOLVED: Toolforge Kyverno unknown state: Toolforge Kyverno has unknown state. 
Kyverno might be down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/Toolforge_Kyverno_unknown_state - https://grafana.wmcloud.org/d/kyverno/kyverno?orgId=1&var-DS_PROMETHEUS_KYVERNO=prometheus-tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforge+Kyverno+unknown+state [01:48:14] RESOLVED: [6x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAProxy server toolsbeta-test-k8s-control-10.toolsbeta.eqiad1.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDo [01:48:50] !log andrew@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) [01:48:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [01:51:58] RESOLVED: JobsEmailerDown: JobsEmailer is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/JobsEmailerDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DJobsEmailerDown [01:52:18] RESOLVED: EnvvarsApiDown: EnvvarsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/EnvvarsApiDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DEnvvarsApiDown [01:52:23] RESOLVED: ToolforgeKubernetesNodeNotReady: (no data) Multiple Kubernetes nodes are not ready #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady [01:52:34] RESOLVED: EnvvarsAdmissionDown: EnvvarsAdmission is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/EnvvarsAdmissionDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DEnvvarsAdmissionDown [01:52:54] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217) [01:52:58] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217 [01:59:49] !log andrew@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [02:00:14] FIRING: [3x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAProxy server tools-k8s-control-7.tools.eqiad1.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown [02:00:44] FIRING: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors [02:04:00] FIRING: [4x] OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [02:05:14] RESOLVED: [3x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAProxy server tools-k8s-control-7.tools.eqiad1.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown [02:05:44] RESOLVED: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors [02:07:38] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for all services [02:10:22] FIRING: [9x] HAProxyBackendUnavailable: HAProxy service designate-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [02:11:25] 10Tool-Pageviews, 06Data-Engineering: Generate 2025 
topviews yearly datasets - https://phabricator.wikimedia.org/T413393#11494553 (10MusikAnimal) >>! In T413393#11494493, @HMonroy wrote: > @MusikAnimal looking at this task and noticed your last comment. Did you get around this? I do see data in PageViews from... [02:11:56] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217) [02:12:02] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217 [02:13:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [02:15:22] RESOLVED: [9x] HAProxyBackendUnavailable: HAProxy service designate-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [02:16:11] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) [02:16:32] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217) [02:18:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [02:19:00] FIRING: [5x] OpenstackAPIResponse: Openstack API average response time is too high. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [02:19:58] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) [02:20:48] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217) [02:20:52] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217 [02:21:33] 10Tool-Pageviews, 06Data-Engineering: Generate 2025 topviews yearly datasets - https://phabricator.wikimedia.org/T413393#11494557 (10MusikAnimal) [02:22:21] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for all services [02:24:00] FIRING: [6x] OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [02:31:58] FIRING: JobsEmailerNoEmails: No emails sent in the last hour - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/JobsEmailerNoEmails - https://prometheus-alerts.wmcloud.org/?q=alertname%3DJobsEmailerNoEmails [02:35:20] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) [02:39:45] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217) [02:39:49] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217 [02:43:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [02:45:41] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) [02:47:07] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217) [02:47:11] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217 [02:53:18] !log andrew@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [02:57:07] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217) [02:58:10] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217 [03:08:41] FIRING: CloudVPSDesignateLeaks: Detected 6 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [03:14:32] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) [03:16:56] FIRING: SystemdUnitDown: The service unit opentofu-infra-diff.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [03:19:00] FIRING: [7x] OpenstackAPIResponse: Openstack API average response time is too high. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [03:20:16] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217) [03:20:22] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217 [03:21:56] FIRING: [2x] SystemdUnitDown: The service unit opentofu-infra-diff.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [03:28:44] !log andrew@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [03:31:56] FIRING: [2x] SystemdUnitDown: The service unit opentofu-infra-diff.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [03:34:27] 10Tool-Pageviews, 06Data-Engineering: Generate 2025 topviews yearly datasets - https://phabricator.wikimedia.org/T413393#11494622 (10MusikAnimal) [03:36:56] RESOLVED: [2x] SystemdUnitDown: The service unit opentofu-infra-diff.service is in failed status on host cloudcontrol1007. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [03:36:58] RESOLVED: JobsEmailerNoEmails: No emails sent in the last hour - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/JobsEmailerNoEmails - https://prometheus-alerts.wmcloud.org/?q=alertname%3DJobsEmailerNoEmails [03:44:22] RECOVERY - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 0.400 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker [03:47:28] RESOLVED: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on toolsbeta-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [03:47:33] RESOLVED: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on tools-puppetserver-01 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [04:13:11] RESOLVED: CloudVPSDesignateLeaks: Detected 6 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [04:18:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [04:22:44] 10Tool-Pageviews, 06Data-Engineering: Generate 2025 topviews yearly datasets - https://phabricator.wikimedia.org/T413393#11494644 (10MusikAnimal) 05In progress→03Resolved Annnnd the 2025 results are live! 🎉 https://pageviews.wmcloud.org/topviews/?date=2025 I was told to ping @LDickinsonWMF and Aiman J... [04:43:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [05:09:08] 10Tool-wsindex, 10Wikisource Reader App, 10Outreachy (Round 31): Outreachy 31: Improve the Wikisource Reader App - https://phabricator.wikimedia.org/T405593#11494666 (10Muguro) **Weekly Internship Report** //Week 4: December 29 - January 2// **Overview of Tasks Completed:** Task 1: Update [[ https://phab... [05:18:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [05:43:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [05:49:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [06:14:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [06:49:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [09:55:37] 10Tool-wsindex, 10Wikisource Reader App: Display extra information from Wikidata in Book Details Screen - https://phabricator.wikimedia.org/T406245#11494952 (10Saiphani02) If any value is unknown, we should not display that line. example, it should not say, "Publisher: Unknown Publisher" change label from "pla... 
[10:25:24] (03PS1) 10Majavah: toolforge: k8s: prepare_upgrade: Check that functional tests pass [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1223633 [10:25:24] (03PS1) 10Majavah: toolforge: k8s: prepare_upgrade: Automatically set downtime [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1223634 [11:04:34] 10Tool-wsindex, 10Wikisource Reader App: Add an option to find new books added to the app - https://phabricator.wikimedia.org/T410597#11495181 (10Bodhisattwa) [11:04:40] 10Tool-wsindex, 10Wikisource Reader App, 10Outreachy (Round 31): Outreachy 31: Improve the Wikisource Reader App - https://phabricator.wikimedia.org/T405593#11495182 (10Bodhisattwa) [11:05:13] 10Tool-wsindex, 10Wikisource Reader App: Add an option to find new books added to the app - https://phabricator.wikimedia.org/T410597#11495184 (10Bodhisattwa) [11:05:17] 10Tool-wsindex, 10Wikisource Reader App, 10Outreachy (Round 31): Outreachy 31: Improve the Wikisource Reader App - https://phabricator.wikimedia.org/T405593#11495185 (10Bodhisattwa) [11:05:29] 10Tool-wsindex, 10Wikisource Reader App, 10Outreachy (Round 31): Outreachy 31: Improve the Wikisource Reader App - https://phabricator.wikimedia.org/T405593#11495188 (10Bodhisattwa) [11:06:33] 10Tool-wsindex, 10Wikisource Reader App, 10Outreachy (Round 31): Outreachy 31: Improve the Wikisource Reader App - https://phabricator.wikimedia.org/T405593#11495190 (10Bodhisattwa) [11:12:35] 10Tool-wsindex, 10Wikisource Reader App: Some basic usage analytics from the API - https://phabricator.wikimedia.org/T413864 (10Saiphani02) 03NEW [11:13:16] 10Tool-wsindex, 10Wikisource Reader App: Some basic usage analytics from the API - https://phabricator.wikimedia.org/T413864#11495225 (10Bodhisattwa) [11:13:22] 10Tool-wsindex, 10Wikisource Reader App, 10Outreachy (Round 31): Outreachy 31: Improve the Wikisource Reader App - https://phabricator.wikimedia.org/T405593#11495226 (10Bodhisattwa) [11:17:17] 10Tool-wsindex, 10Wikisource Reader App: Some 
basic usage analytics from the API - https://phabricator.wikimedia.org/T413864#11495235 (10Bodhisattwa) [11:18:56] 10Tool-wsindex, 10Wikisource Reader App, 10Outreachy (Round 31): Outreachy 31: Improve the Wikisource Reader App - https://phabricator.wikimedia.org/T405593#11495259 (10Bodhisattwa) [11:24:37] 10Tool-wsindex, 10Wikisource Reader App: Books update option in local library - https://phabricator.wikimedia.org/T408285#11495311 (10Saiphani02) Let us add a refresh type icon for each book in the Library. When the user clicks on it, they should first see a warning that any existing notes, highlights, and pro... [11:25:07] 10Tool-wsindex, 10Wikisource Reader App: Books update option in local library - https://phabricator.wikimedia.org/T408285#11495312 (10Bodhisattwa) a:03Muguro [12:07:23] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster toolsbeta upgrade to 1.31.14 (T413796) [12:07:28] T413796: Upgrade toolsbeta cluster to Kubernetes 1.31 - https://phabricator.wikimedia.org/T413796 [12:22:14] !log taavi@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=99) for cluster toolsbeta upgrade to 1.31.14 (T413796) [12:22:20] T413796: Upgrade toolsbeta cluster to Kubernetes 1.31 - https://phabricator.wikimedia.org/T413796 [12:25:14] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster toolsbeta upgrade to 1.31.14 (T413796) [12:26:23] 06cloud-services-team, 10Toolforge: toolforge jobs logs: requests.exceptions.HTTPError: 400 Client Error: Bad Request for url - https://phabricator.wikimedia.org/T413874 (10taavi) 03NEW p:05Triage→03High [12:34:06] PROBLEM - Host wikitech-static.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [12:40:04] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster toolsbeta upgrade to 1.31.14 (T413796) [12:40:10] T413796: Upgrade
toolsbeta cluster to Kubernetes 1.31 - https://phabricator.wikimedia.org/T413796 [12:41:29] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-control-10 from 1.30.14 to 1.31.14 (T413796) [12:49:54] taavi@cloudcumin1001 upgrade (PID 1086757) is awaiting input [12:56:57] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-control-10 from 1.30.14 to 1.31.14 (T413796) [12:57:02] T413796: Upgrade toolsbeta cluster to Kubernetes 1.31 - https://phabricator.wikimedia.org/T413796 [12:57:05] (03PS1) 10Majavah: toolforge: k8s: Fix check for first node to be upgraded [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1223652 [12:57:06] (03PS1) 10Majavah: toolforge: k8s: Remind user about hostname being upgraded [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1223653 [12:57:17] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-control-11 from 1.30.14 to 1.31.14 (T413796) [12:58:01] !log taavi@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node toolsbeta-test-k8s-control-11 from 1.30.14 to 1.31.14 (T413796) [12:59:16] (03PS2) 10Majavah: toolforge: k8s: Fix check for first node to be upgraded [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1223652 [12:59:16] (03PS2) 10Majavah: toolforge: k8s: Remind user about hostname being upgraded [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1223653 [12:59:21] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-control-11 from 1.30.14 to 1.31.14 (T413796) [13:01:29] !log taavi@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node toolsbeta-test-k8s-control-11 from 1.30.14 to 1.31.14 (T413796) [13:02:52] (03CR) 10CI reject: [V:04-1] 
toolforge: k8s: Remind user about hostname being upgraded [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1223653 (owner: 10Majavah) [13:03:45] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-control-12 from 1.30.14 to 1.31.14 (T413796) [13:03:50] T413796: Upgrade toolsbeta cluster to Kubernetes 1.31 - https://phabricator.wikimedia.org/T413796 [13:04:13] (03PS3) 10Majavah: toolforge: k8s: Remind user about hostname being upgraded [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1223653 [13:05:56] (03PS1) 10Majavah: toolforge: k8s: Retry other errors when polling for version upgrade [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1223656 [13:06:06] 06cloud-services-team, 10Toolforge, 13Patch-For-Review: Maintain-dbusers is having sustained errors - https://phabricator.wikimedia.org/T413558#11495631 (10Andrew) 05Open→03Resolved [13:08:31] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-control-12 from 1.30.14 to 1.31.14 (T413796) [13:09:17] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.restart_static_pods for toolsbeta-test-k8s-control-11 (T413796) [13:09:22] T413796: Upgrade toolsbeta cluster to Kubernetes 1.31 - https://phabricator.wikimedia.org/T413796 [13:11:54] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.restart_static_pods (exit_code=0) for toolsbeta-test-k8s-control-11 (T413796) [13:12:04] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.restart_static_pods for toolsbeta-test-k8s-control-10 (T413796) [13:14:40] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.restart_static_pods (exit_code=0) for toolsbeta-test-k8s-control-10 (T413796) [13:14:45] T413796: Upgrade toolsbeta cluster to Kubernetes 1.31 - https://phabricator.wikimedia.org/T413796 [13:17:02] !log
taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for toolsbeta-test-k8s-worker-nfs-10, toolsbeta-test-k8s-worker-nfs-11, toolsbeta-test-k8s-worker-nfs-7, toolsbeta-test-k8s-worker-nfs-8, toolsbeta-test-k8s-worker-nfs-9, toolsbeta-test-k8s-worker-12, toolsbeta-test-k8s-worker-13 [13:19:12] (03PS1) 10Majavah: toolforge: k8s: worker: Fix cookbook path comments [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1223657 [13:19:12] (03PS1) 10Majavah: toolforge: k8s: worker: Remove special handling for SGE bastion [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1223658 [13:24:27] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for toolsbeta-test-k8s-worker-nfs-10, toolsbeta-test-k8s-worker-nfs-11, toolsbeta-test-k8s-worker-nfs-7, toolsbeta-test-k8s-worker-nfs-8, toolsbeta-test-k8s-worker-nfs-9, toolsbeta-test-k8s-worker-12, toolsbeta-test-k8s-worker-13 [13:25:30] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade_ingresses for toolsbeta-test-k8s-ingress-11, toolsbeta-test-k8s-ingress-12, toolsbeta-test-k8s-ingress-9 [13:28:05] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_ingresses (exit_code=0) for toolsbeta-test-k8s-ingress-11, toolsbeta-test-k8s-ingress-12, toolsbeta-test-k8s-ingress-9 [13:28:46] 10VPS-project-Codesearch, 10phan-taint-check-plugin: phan-taint-check-plugin / SecurityCheckPlugin is indexed by Codesearch twice - https://phabricator.wikimedia.org/T413879 (10A_smart_kitten) 03NEW [13:29:40] (03CR) 10A smart kitten: "I think this may have resulted in phan-taint-check-plugin being indexed twice by Codesearch (once under the `libs` group, once under the `" [labs/codesearch] - 10https://gerrit.wikimedia.org/r/965841 (owner: 10Gergő Tisza) [13:33:00] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade_bastions for
toolsbeta-bastion-7.toolsbeta.eqiad1.wikimedia.cloud, toolsbeta-bastion-6.toolsbeta.eqiad1.wikimedia.cloud [13:33:19] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_bastions (exit_code=0) for toolsbeta-bastion-7.toolsbeta.eqiad1.wikimedia.cloud, toolsbeta-bastion-6.toolsbeta.eqiad1.wikimedia.cloud [13:34:10] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.run_tests [13:47:45] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217) [13:47:50] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217 [13:49:20] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) [13:49:45] 06cloud-services-team, 10Toolforge: Upgrade toolsbeta cluster to Kubernetes 1.31 - https://phabricator.wikimedia.org/T413796#11495929 (10taavi) 05Openβ†’03Resolved [13:50:53] 06cloud-services-team, 10Toolforge: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.31 - https://phabricator.wikimedia.org/T372697#11495941 (10taavi) [13:52:58] FIRING: JobsEmailerNoEmails: No emails sent in the last hour - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/JobsEmailerNoEmails - https://prometheus-alerts.wmcloud.org/?q=alertname%3DJobsEmailerNoEmails [13:54:15] !log andrew@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [13:54:43] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_etcd_node (T375217) [13:54:48] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217 [13:59:22] PROBLEM - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 490 bytes in 0.014 second response time 
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker [14:00:42] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) [14:04:28] FIRING: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on tools-puppetserver-01 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [14:13:43] 06cloud-services-team, 10Wikimedia-Mailing-lists: aborrero@wikimedia.org still subscribed to ops@lists.wikimedia.org - https://phabricator.wikimedia.org/T413883 (10Andrew) 03NEW [14:15:23] 06cloud-services-team, 10Wikimedia-Mailing-lists: aborrero@wikimedia.org still subscribed to ops@lists.wikimedia.org - https://phabricator.wikimedia.org/T413883#11496106 (10Ladsgroup) Mailman has automatic unsub if the emails bounce too many times I'd assume since ops@ is not sending that many emails or the bo... 
[14:18:36] (03update) 10raymond-ndibe: images: resolve the image every time [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/251 (owner: 10dcaro) [14:36:11] !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate True, for hosts list: ['cloudvirtlocal1003'] [14:36:58] RESOLVED: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on tools-puppetserver-01 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [14:37:00] !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate True, for hosts list: ['cloudvirtlocal1003'] [14:37:58] RESOLVED: JobsEmailerNoEmails: No emails sent in the last hour - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/JobsEmailerNoEmails - https://prometheus-alerts.wmcloud.org/?q=alertname%3DJobsEmailerNoEmails [14:39:44] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217) [14:39:48] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217 [14:53:25] 06cloud-services-team, 06SRE, 10Wikimedia-Mailing-lists: aborrero@wikimedia.org still subscribed to ops@lists.wikimedia.org - https://phabricator.wikimedia.org/T413883#11496217 (10Ladsgroup) I'm not seeing the email address in ops list. Maybe someone removed it in the mean time. 
[14:56:29] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) [14:57:34] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217) [14:57:38] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217 [15:14:14] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) [15:17:35] 10wikitech.wikimedia.org, 06serviceops-radar, 06SRE, 13Patch-For-Review, 07SRE-Unowned: Redesign wikitech-static - https://phabricator.wikimedia.org/T376400#11496297 (10taavi) >>! In T376400#11345384, @Andrew wrote: >> Sure, for example the first image on http://ec2-54-81-201-239.compute-1.amazonaws.com/... [15:19:22] RECOVERY - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 0.366 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker [15:57:36] 06cloud-services-team, 10PAWS, 10MediaWiki-extensions-OAuth, 10Wikidata, 06MediaWiki-Platform-Team (Radar): Add PAWS edit tag for Wikidata sitelink auto-change - https://phabricator.wikimedia.org/T263347#11496416 (10Tgr) [16:00:38] 06cloud-services-team, 10Data-Services, 10MediaWiki-extensions-OAuth, 06Security-Team, and 2 others: (partially) expose oauth_registered_consumer table - https://phabricator.wikimedia.org/T247800#11496424 (10Tgr) [16:23:46] 06cloud-services-team, 10Cloud-VPS: [tofu-infra] "tofu plan" failing in codfw - https://phabricator.wikimedia.org/T410265#11496572 (10Andrew) >>! In T410265#11411496, @Andrew wrote: > This is probably unrelated, but The fact that eqiad1 is now running all the same versions without issue makes it even less lik... 
[16:40:21] 10VPS-Projects, 06Security-Team, 10WM-Bot, 07SecTeam-Processed, and 2 others: https://wm-bot.wmcloud.org/github/index.php seems vulnerable to SQL injection - https://phabricator.wikimedia.org/T408876#11496688 (10sbassett) [16:50:50] 06cloud-services-team, 10Cloud-VPS: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217#11496826 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudnet1006.eqiad.wmnet with OS trixie completed: - cloudnet1006 (**WARN*... [17:22:10] PROBLEM - Memcached on cloudcontrol1007 is CRITICAL: connect to address 10.64.148.21 and port 11211: Connection refused https://wikitech.wikimedia.org/wiki/Memcached [17:22:10] PROBLEM - Memcached on cloudcontrol1011 is CRITICAL: connect to address 10.64.151.8 and port 11211: Connection refused https://wikitech.wikimedia.org/wiki/Memcached [17:23:10] RECOVERY - Memcached on cloudcontrol1007 is OK: TCP OK - 0.000 second response time on 10.64.148.21 port 11211 https://wikitech.wikimedia.org/wiki/Memcached [17:23:10] RECOVERY - Memcached on cloudcontrol1011 is OK: TCP OK - 0.000 second response time on 10.64.151.8 port 11211 https://wikitech.wikimedia.org/wiki/Memcached [17:34:47] 10VPS-project-Codesearch, 10phan-taint-check-plugin: phan-taint-check-plugin / SecurityCheckPlugin is indexed by Codesearch twice - https://phabricator.wikimedia.org/T413879#11497211 (10sbassett) > In October 2023, https://gerrit.wikimedia.org/r/c/labs/codesearch/+/965841 (2aff2405b99a) added the mediawiki/too... 
[17:37:40] (03PS1) 10SBassett: Remove duplicated config for the SecurityCheckPlugin lib/tool [labs/codesearch] - 10https://gerrit.wikimedia.org/r/1223699 (https://phabricator.wikimedia.org/T413879) [17:58:49] (03CR) 10Ladsgroup: Remove duplicated config for the SecurityCheckPlugin lib/tool (031 comment) [labs/codesearch] - 10https://gerrit.wikimedia.org/r/1223699 (https://phabricator.wikimedia.org/T413879) (owner: 10SBassett) [18:11:42] (03CR) 10A smart kitten: Remove duplicated config for the SecurityCheckPlugin lib/tool (031 comment) [labs/codesearch] - 10https://gerrit.wikimedia.org/r/1223699 (https://phabricator.wikimedia.org/T413879) (owner: 10SBassett) [19:17:21] 10Cloud-VPS (Debian Bullseye Deprecation), 10Beta-Cluster-Infrastructure, 07Epic, 06Release-Engineering-Team (Priority Backlog 📥): Migrate deployment-prep away from Debian Bullseye to Bookworm/Trixie - https://phabricator.wikimedia.org/T401839#11497588 (10bd808) [19:45:13] 10Tool-curator: Wait when lockmanager-fail-conflict error occurs - https://phabricator.wikimedia.org/T413915 (10DaxServer) 03NEW [19:45:43] 10Tool-curator: Wait when lockmanager-fail-conflict error occurs - https://phabricator.wikimedia.org/T413915#11497701 (10DaxServer) p:05Triage→03Medium [20:15:19] 06cloud-services-team, 10GitLab (CI & Job Runners), 06Release-Engineering-Team (Priority Backlog 📥): Recent incidents of buildkitd's storage volume filling up - https://phabricator.wikimedia.org/T395097#11497749 (10dancy) 05In progress→03Resolved @Andrew has since changed where he performing the buil...