[05:50:42] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[06:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[06:24:06] Cloud Services Proposals, cloud-services-team, Data-Persistence, Data-Platform-SRE: Decision request - Who runs wikireplicas cookbooks - https://phabricator.wikimedia.org/T382607#10421305 (Marostegui) Option #1 and option #2 are probably not something we in #data-persistence would be comfortable...
[07:00:45] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10421314 (Daask) I have an existing SUL account. Within the last month, I created an account on Wikitech, which seemed successful. I was logged in to Wikitech when I first created the accou...
[07:28:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-16 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[07:33:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-16 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[11:04:52] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10421471 (Reedy) http://ldap.toolforge.org/user/daask doesn't seem to exist... http://ldap.toolforge.org/user/daask2 does Anyway: ` reedy@deploy2002:~$ mwscript extensions/CentralAuth/mai...
[12:00:17] (PS1) Urbanecm: app: Simplify retriving keys [labs/tools/watch-translations] - https://gerrit.wikimedia.org/r/1106299
[12:40:01] (CR) Urbanecm: [C:+2] app: Simplify retriving keys [labs/tools/watch-translations] - https://gerrit.wikimedia.org/r/1106299 (owner: Urbanecm)
[12:40:20] (Merged) jenkins-bot: app: Simplify retriving keys [labs/tools/watch-translations] - https://gerrit.wikimedia.org/r/1106299 (owner: Urbanecm)
[14:22:54] Cloud Services Proposals, cloud-services-team, Data-Persistence, Data-Platform-SRE: Decision request - Who runs wikireplicas cookbooks - https://phabricator.wikimedia.org/T382607#10421762 (fnegri)
[14:35:05] Cloud Services Proposals, cloud-services-team, Data-Persistence, Data-Platform-SRE: Decision request - Who runs wikireplicas cookbooks - https://phabricator.wikimedia.org/T382607#10421772 (fnegri) @Marostegui I removed mentions of #data-persistence from options #1 and #2, and added a constraint t...
[15:00:28] FIRING: InstanceDown: Project cloudinfra instance dns-resolver-internal-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:00:29] FIRING: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on cloudinfra-internal-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[15:05:28] RESOLVED: InstanceDown: Project cloudinfra instance dns-resolver-internal-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:03:55] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), and 2 others: Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup failures, often for ou... - https://phabricator.wikimedia.org/T374830#10421914
[17:25:29] RESOLVED: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on cloudinfra-internal-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[19:04:26] VPS-Projects, fundraising-tech-ops, Puppet (Puppet 7.0): Update puppet civicrm-prototype puppetmaster - https://phabricator.wikimedia.org/T361595#10422030 (Dwisehaupt) @MoritzMuehlenhoff At this point, the puppet5 puppet master is out of service or at least isn't doing any new updates. It will hopefu...
[19:24:23] VPS-Projects, fundraising-tech-ops, Puppet (Puppet 7.0): Update puppet civicrm-prototype puppetmaster - https://phabricator.wikimedia.org/T361595#10422061 (Dwisehaupt) The new puppet 7 puppetserver is up and running. I have built and am testing crm-dev-02 with the new puppetserver and things appear t...
[19:48:46] VPS-Projects, fundraising-tech-ops, Puppet (Puppet 7.0): Update puppet civicrm-prototype puppetmaster - https://phabricator.wikimedia.org/T361595#10422093 (Dwisehaupt) Tested the full restore process on crm-dev-02. Adjusted the web proxy from community-crm to crm-dev-01 and tested basic functionality...
[20:05:52] VPS-Projects, Wikidocumentaries: Why have all images disappeared from https://wikidocumentaries.wmcloud.org/? - https://phabricator.wikimedia.org/T382138#10422100 (TheDJ) Open→Resolved a:TheDJ
[20:51:19] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10422170 (Daask) Everything seems to be working fine for me now. Thank you, Reedy! I'm not sure what to say about http://ldap.toolforge.org/user/daask not existing. During account creation,...
[20:59:35] VPS-project-Wikistats: Wikidata and Commons statistics on "All Wikimedia Projects by Size" no longer updated after 2024-05-22 - https://phabricator.wikimedia.org/T371164#10422181 (Meno25) →Duplicate dup:T381623
[20:59:36] VPS-project-Wikistats: since all updates run with the "extinfo" parameter some tables did not get updated numbers - https://phabricator.wikimedia.org/T381623#10422183 (Meno25)
[21:05:01] cloud-services-team, Toolforge: toolforge webservice logs is broken - https://phabricator.wikimedia.org/T382685#10422187 (Wargo) Run `kubectl logs ` and see what error (one of these? https://gitlab.wikimedia.org/search?search=unable&nav_source=navbar&project_id=608&group_id=688&search_code=true&rep...
[21:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks