[00:02:17] Toolforge-standards-committee, User-notice: Refresh membership of Toolforge standards committee - https://phabricator.wikimedia.org/T370474#10098455 (bd808)
[00:20:06] Toolforge-standards-committee, User-notice: Refresh membership of Toolforge standards committee - https://phabricator.wikimedia.org/T370474#10098466 (bd808) Nominations are now closed. The list of nominated candidates is: * @JJMC89 (nominated by @bd808) * @Pintoch (nominated by @bd808) * @LucasWerkmeiste...
[00:27:25] Toolforge-standards-committee, User-notice: Refresh membership of Toolforge standards committee - https://phabricator.wikimedia.org/T370474#10098470 (bd808) {T154625} was the tracking task for the 2017 NDA process and may help us figure out next steps in the 2024 process.
[00:35:17] Toolforge-standards-committee, User-notice: Refresh membership of Toolforge standards committee - https://phabricator.wikimedia.org/T370474#10098474 (LucasWerkmeister) FWIW, work-me signed some NDA years ago (compare T208518). I try to keep my work and volunteer accounts pretty separate, but legally I’m...
[00:35:47] Toolforge-standards-committee: Adoption request for Yapperbot - https://phabricator.wikimedia.org/T361426#10098477 (DavidTornheim) > ` > panic: runtime error: index out of range [2] with length 0 > > goroutine 1 [running]: > main.extractGANom(...) > /home/curtispf/Yapperbot/FRS/matchers.go:106 > mai...
[00:44:48] Toolforge-standards-committee, User-notice: Refresh membership of Toolforge standards committee - https://phabricator.wikimedia.org/T370474#10098479 (JJMC89) What access needs to be covered in the NDA for TFSC? I'm in the process signing a volunteer NDA for other access (T369314) - let me know if I shoul...
[00:49:11] FIRING: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[00:54:11] RESOLVED: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[02:50:56] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:00:56] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:09:28] FIRING: InstanceDown: Project tools instance tools-prometheus-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[05:26:57] (CR) Lokal Profil: "> Patch Set 2:" (1 comment) [labs/tools/heritage] - https://gerrit.wikimedia.org/r/1065124 (https://phabricator.wikimedia.org/T319787) (owner: Jean-Frédéric)
[05:27:55] (CR) Lokal Profil: [C:+2] Use toolforge-jobs to install requirements during deployment [labs/tools/heritage] - https://gerrit.wikimedia.org/r/1065124 (https://phabricator.wikimedia.org/T319787) (owner: Jean-Frédéric)
[05:28:50] (CR) Lokal Profil: [C:+2] Remove `composer update` step from build-php script [labs/tools/heritage] - https://gerrit.wikimedia.org/r/1065125 (owner:
Jean-Frédéric)
[05:29:42] (Merged) jenkins-bot: Use toolforge-jobs to install requirements during deployment [labs/tools/heritage] - https://gerrit.wikimedia.org/r/1065124 (https://phabricator.wikimedia.org/T319787) (owner: Jean-Frédéric)
[05:30:33] (Merged) jenkins-bot: Remove `composer update` step from build-php script [labs/tools/heritage] - https://gerrit.wikimedia.org/r/1065125 (owner: Jean-Frédéric)
[06:24:20] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx
[06:25:34] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx
[06:29:28] RESOLVED: InstanceDown: Project tools instance tools-prometheus-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[06:30:15] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx
[06:30:19] !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-nginx
[07:23:00] (open) dcaro: Fix color schemes for non-dark mode [toolforge-repos/sample-complex-app-frontend] (add_database_check) - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-frontend/-/merge_requests/5
[08:13:56] Toolforge (Toolforge iteration 14): [harbor] 2024-07-24 Tools harbor db out of space - https://phabricator.wikimedia.org/T370843#10098796 (dcaro) This can be closed now, the current size is <100M for the biggest table: ` harbor=> select schemaname as table_schema, relname as table_name, pg_size_pretty(pg...
[08:14:48] Toolforge (Toolforge iteration 14): [harbor] 2024-07-24 Tools harbor db out of space - https://phabricator.wikimedia.org/T370843#10098799 (dcaro) In progress→Resolved
[08:14:54] cloud-services-team, Toolforge (Toolforge iteration 14): toolforge: puppetserver got OOMkilled - https://phabricator.wikimedia.org/T369797#10098806 (dcaro) Declined→Resolved
[08:15:24] Toolforge (Toolforge iteration 14): [builds-api] quota command failing on functional tests on tools - https://phabricator.wikimedia.org/T373293#10098801 (dcaro) Duplicate→Resolved
[08:15:36] Toolforge (Toolforge iteration 14): [harbor] Investigate how to deactivate wal from trove for postrges databases - https://phabricator.wikimedia.org/T370845#10098803 (dcaro) Declined→Resolved
[08:23:24] Toolforge (Toolforge iteration 14), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project: [maintain-harbor,docs] Document current setup and admin procedures - https://phabricator.wikimedia.org/T329176#10098810 (dcaro) LGTM
[08:24:15] (update) dcaro: [jobs-api] refactor validate_kube_quant [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/118 (https://phabricator.wikimedia.org/T361120) (owner: raymond-ndibe)
[08:25:27] Toolforge (Toolforge iteration 14), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project: [maintain-harbor,docs] Document current setup and admin procedures - https://phabricator.wikimedia.org/T329176#10098812 (dcaro) In progress→Resolved
[08:26:35] (update) dcaro: [jobs-api] convert all quotas to appropriate units [repos/cloud/toolforge/jobs-api] (refactor_validate_kube_quant) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T361120) (owner: raymond-ndibe)
[08:34:29] cloud-services-team, Cloud-VPS (Project-requests): Request creation of usdtest VPS project -
https://phabricator.wikimedia.org/T373386#10098859 (taavi) Open→Resolved
[08:36:43] cloud-services-team, Toolforge (Toolforge iteration 14): [infra,k8s] Upgrade Toolforge Kubernetes to version 1.26 - https://phabricator.wikimedia.org/T327025#10098865 (dcaro)
[08:37:09] cloud-services-team, Toolforge (Toolforge iteration 14): [infra,k8s] Upgrade Toolforge Kubernetes to version 1.26 - https://phabricator.wikimedia.org/T327025#10098869 (dcaro) p:Triage→High
[08:39:20] (PS1) David Caro: docs: add wmcs.yaml to the generated config with default values [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1067911
[08:39:28] Toolforge (Toolforge iteration 14): Toolforge Aptfile not producing working copy of `ffmpeg` - https://phabricator.wikimedia.org/T365633#10098874 (dcaro) In progress→Resolved
[08:40:58] (open) sstefanova: utils: add components to get_versions [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/492
[08:44:27] (approved) dcaro: utils: add components to get_versions [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/492 (owner: sstefanova)
[08:45:57] cloud-services-team, Toolforge (Toolforge iteration 14): [infra,k8s] Upgrade Toolsbeta to k8s 1.26 - https://phabricator.wikimedia.org/T370248#10098896 (Slst2020)
[08:47:33] cloud-services-team, Toolforge (Toolforge iteration 14): [infra,k8s] Upgrade Toolsbeta to k8s 1.26 - https://phabricator.wikimedia.org/T370248#10098903 (Slst2020) Open→In progress
[08:48:19] cloud-services-team, Toolforge: [infra,k8s] remove deprecated kubelet flags before 1.27 upgrade - https://phabricator.wikimedia.org/T370245#10098925 (Slst2020) In progress→Open
[08:50:22] cloud-services-team, Toolforge: [infra,k8s] remove deprecated kubelet flags before 1.27 upgrade -
https://phabricator.wikimedia.org/T370245#10098916 (Slst2020)
[08:50:24] Toolforge: ChieBot: Intermittent connection reset by peer errors - https://phabricator.wikimedia.org/T356163#10098928 (dcaro) @Leloiandudu We had some issues with DNS resolution inside k8s the last few days ({T373243}), is the issue still happening?
[09:09:28] FIRING: InstanceDown: Project tools instance tools-prometheus-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:14:28] RESOLVED: InstanceDown: Project tools instance tools-prometheus-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:20:56] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:30:56] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:05:56] FIRING: PrometheusRestarted: Prometheus/cloud restarted: beware monitoring artifacts.
- https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[10:21:00] (approved) dcaro: [volume-admission] add topologySpreadConstraints to deployment [repos/cloud/toolforge/volume-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/volume-admission/-/merge_requests/15 (https://phabricator.wikimedia.org/T358203) (owner: raymond-ndibe)
[10:21:04] (update) dcaro: [volume-admission] add topologySpreadConstraints to deployment [repos/cloud/toolforge/volume-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/volume-admission/-/merge_requests/15 (https://phabricator.wikimedia.org/T358203) (owner: raymond-ndibe)
[10:21:08] (approved) dcaro: [registry-admission] add topologySpreadConstraints to deployment [repos/cloud/toolforge/registry-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/registry-admission/-/merge_requests/11 (https://phabricator.wikimedia.org/T358203) (owner: raymond-ndibe)
[10:21:25] (update) dcaro: [registry-admission] add topologySpreadConstraints to deployment [repos/cloud/toolforge/registry-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/registry-admission/-/merge_requests/11 (https://phabricator.wikimedia.org/T358203) (owner: raymond-ndibe)
[10:22:12] (approved) dcaro: [jobs-emailer] add topologySpreadConstraints to deployment [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/6 (https://phabricator.wikimedia.org/T358203) (owner: raymond-ndibe)
[10:22:13] (update) dcaro: [jobs-emailer] add topologySpreadConstraints to deployment [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/6 (https://phabricator.wikimedia.org/T358203) (owner:
raymond-ndibe)
[10:22:34] (approved) dcaro: [jobs-api] add topologySpreadConstraints to deployment [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/117 (https://phabricator.wikimedia.org/T358203) (owner: raymond-ndibe)
[10:22:35] (update) dcaro: [jobs-api] add topologySpreadConstraints to deployment [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/117 (https://phabricator.wikimedia.org/T358203) (owner: raymond-ndibe)
[10:22:48] (approved) dcaro: [ingress-admission] add topologySpreadConstraints to deployment [repos/cloud/toolforge/ingress-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-admission/-/merge_requests/8 (https://phabricator.wikimedia.org/T358203) (owner: raymond-ndibe)
[10:22:49] (update) dcaro: [ingress-admission] add topologySpreadConstraints to deployment [repos/cloud/toolforge/ingress-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-admission/-/merge_requests/8 (https://phabricator.wikimedia.org/T358203) (owner: raymond-ndibe)
[10:23:35] (approved) dcaro: [envvars-api] add topologySpreadConstraints to deployment [repos/cloud/toolforge/envvars-api] (node-selector-to-test-topology-constraints) - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/46 (https://phabricator.wikimedia.org/T358203) (owner: raymond-ndibe)
[10:23:35] (update) dcaro: [envvars-api] add topologySpreadConstraints to deployment [repos/cloud/toolforge/envvars-api] (node-selector-to-test-topology-constraints) - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/46 (https://phabricator.wikimedia.org/T358203) (owner: raymond-ndibe)
[10:25:39] (approved) dcaro: [envvars-admission] add topologySpreadConstraints to deployment [repos/cloud/toolforge/envvars-admission] -
https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-admission/-/merge_requests/10 (https://phabricator.wikimedia.org/T358203) (owner: raymond-ndibe)
[10:25:40] (update) dcaro: [envvars-admission] add topologySpreadConstraints to deployment [repos/cloud/toolforge/envvars-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-admission/-/merge_requests/10 (https://phabricator.wikimedia.org/T358203) (owner: raymond-ndibe)
[10:27:08] (approved) dcaro: [builds-builder] add topologySpreadConstraints to deployment [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/58 (https://phabricator.wikimedia.org/T358203) (owner: raymond-ndibe)
[10:27:11] (update) dcaro: [builds-builder] add topologySpreadConstraints to deployment [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/58 (https://phabricator.wikimedia.org/T358203) (owner: raymond-ndibe)
[10:27:12] (update) dcaro: [builds-builder] add topologySpreadConstraints to deployment [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/58 (https://phabricator.wikimedia.org/T358203) (owner: raymond-ndibe)
[10:27:18] (update) dcaro: [api-gateway] add topologySpreadConstraints to deployment [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/35 (https://phabricator.wikimedia.org/T358203) (owner: raymond-ndibe)
[10:27:42] (approved) dcaro: [api-gateway] add topologySpreadConstraints to deployment [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/35 (https://phabricator.wikimedia.org/T358203) (owner: raymond-ndibe)
[10:27:43] (update) dcaro: [api-gateway] add topologySpreadConstraints to deployment
[repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/35 (https://phabricator.wikimedia.org/T358203) (owner: raymond-ndibe)
[10:33:26] RESOLVED: PrometheusRestarted: Prometheus/cloud restarted: beware monitoring artifacts. - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[10:39:56] FIRING: PrometheusRestarted: Prometheus/cloud restarted: beware monitoring artifacts. - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=codfw%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[10:50:13] Toolforge (Toolforge iteration 14): DNS on toolforge kubernetes seems to fail regularly (20-25% of the time at least) - https://phabricator.wikimedia.org/T373243#10099254 (MBH) @dcaro My tool reads data from DB replica. Less than hour earlier tool was working correctly, but now it returns this error (in...
[10:53:00] Toolforge (Toolforge iteration 14): DNS on toolforge kubernetes seems to fail regularly (20-25% of the time at least) - https://phabricator.wikimedia.org/T373243#10099273 (dcaro) >>! In T373243#10099254, @MBH wrote: > @dcaro My tool reads data from DB replica. Less than hour earlier tool was working corr...
[10:56:22] Toolforge (Toolforge iteration 14): DNS on toolforge kubernetes seems to fail regularly (20-25% of the time at least) - https://phabricator.wikimedia.org/T373243#10099287 (dcaro) All the workers seem to be responding ok (might be flaky, but no errors so far): ` root@cloudcumin1001:~# cumin --force 'O{pro...
[10:56:24] Toolforge (Toolforge iteration 14): DNS on toolforge kubernetes seems to fail regularly (20-25% of the time at least) - https://phabricator.wikimedia.org/T373243#10099288 (MBH) It's a web tool. Request: https://mbh.toolforge.org/cgi-bin/page-authors?wiki=ru.wikipedia&type=cat&source=%D0%AF%D0%B7%D1%8B%D...
[10:57:33] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, Patch-For-Review: Remove or replace deployment-restbase04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370460#10099294 (Jgiannelos) I think after the latest...
[11:04:56] RESOLVED: PrometheusRestarted: Prometheus/cloud restarted: beware monitoring artifacts. - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=codfw%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[11:09:30] Toolforge (Toolforge iteration 14): DNS on toolforge kubernetes seems to fail regularly (20-25% of the time at least) - https://phabricator.wikimedia.org/T373243#10099329 (dcaro) @MBH I'm suspecting this change: https://github.com/Saisengen/wikibots/commit/060db5fa675a14623426b88e851fa1a4f0f75e04#diff-e5...
[11:10:52] Toolforge (Toolforge iteration 14): DNS on toolforge kubernetes seems to fail regularly (20-25% of the time at least) - https://phabricator.wikimedia.org/T373243#10099331 (dcaro) Ex. this works for me (putting type first): https://mbh.toolforge.org/cgi-bin/page-authors?type=cat&wiki=ru.wikipedia&source=%...
[11:20:56] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:28:36] Toolforge (Toolforge iteration 14): DNS on toolforge kubernetes seems to fail regularly (20-25% of the time at least) - https://phabricator.wikimedia.org/T373243#10099353 (MBH) Thanks. I already used string indexation in other tools, but not this tool, because it's very old code.
[11:30:56] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:47:39] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:52:39] RESOLVED: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:43:33] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster toolsbeta upgrade from 1.25.16 to 1.26.15
[12:43:34] !log sstefanova@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=99) for cluster toolsbeta upgrade from
1.25.16 to 1.26.15
[12:45:26] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster toolsbeta upgrade from 1.25.16 to 1.26.15
[12:45:57] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster toolsbeta upgrade from 1.25.16 to 1.26.15
[12:49:21] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-control-7 from 1.25.16 to 1.26.15
[13:17:24] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-control-7 from 1.25.16 to 1.26.15
[13:18:04] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-control-8 from 1.25.16 to 1.26.15
[13:18:05] !log sstefanova@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node toolsbeta-test-k8s-control-8 from 1.25.16 to 1.26.15
[13:19:31] cloud-services-team, Web Team Visual Regression Framework, Quality-and-Test-Engineering-Team (Test Infrastructure): Move disk space and other Pixel metrics from Graphite to Prometheus - https://phabricator.wikimedia.org/T363969#10099624 (Peter) Hi @taavi are you the right person who knows how Prometh...
[13:26:00] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-control-8 from 1.25.16 to 1.26.15
[13:32:25] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-control-8 from 1.25.16 to 1.26.15
[13:32:41] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-control-9 from 1.25.16 to 1.26.15
[13:49:56] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:51:11] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-control-9 from 1.25.16 to 1.26.15
[13:52:29] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-1 from 1.25.16 to 1.26.15
[13:53:29] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-worker-nfs-1 from 1.25.16 to 1.26.15
[13:55:20] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-2 from 1.25.16 to 1.26.15
[13:56:21] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-worker-nfs-2 from 1.25.16 to 1.26.15
[13:59:46] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-3 from 1.25.16 to 1.26.15
[14:00:47] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade
(exit_code=0) for node toolsbeta-test-k8s-worker-nfs-3 from 1.25.16 to 1.26.15
[14:01:32] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-10 from 1.25.16 to 1.26.15
[14:02:34] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-worker-10 from 1.25.16 to 1.26.15
[14:03:03] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-11 from 1.25.16 to 1.26.15
[14:04:07] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-worker-11 from 1.25.16 to 1.26.15
[14:04:56] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:05:35] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-ingress-6 from 1.25.16 to 1.26.15
[14:06:32] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-ingress-6 from 1.25.16 to 1.26.15
[14:07:26] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-ingress-7 from 1.25.16 to 1.26.15
[14:08:23] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-ingress-7 from 1.25.16 to 1.26.15
[14:09:15] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-worker-nfs-4 from 1.25.16 to 1.26.15
[14:10:21] !log
sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-worker-nfs-4 from 1.25.16 to 1.26.15
[14:10:43] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node toolsbeta-test-k8s-ingress-8 from 1.25.16 to 1.26.15
[14:11:22] cloud-services-team, Web Team Visual Regression Framework, Quality-and-Test-Engineering-Team (Test Infrastructure): Move disk space and other Pixel metrics from Graphite to Prometheus - https://phabricator.wikimedia.org/T363969#10099792 (taavi) Not anymore :-) and please see https://wikitech.wikimedi...
[14:11:41] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node toolsbeta-test-k8s-ingress-8 from 1.25.16 to 1.26.15
[14:13:02] Horizon, Patch-For-Review: Use IDP for authentication in Horizon - https://phabricator.wikimedia.org/T359590#10099795 (Andrew) a:Andrew
[14:19:35] cloud-services-team, Toolforge (Toolforge iteration 14), Patch-For-Review: [infra,k8s] prepare deb packages for k8s 1.26 - https://phabricator.wikimedia.org/T370246#10099824 (Slst2020) In progress→Resolved
[14:21:45] cloud-services-team, Toolforge (Toolforge iteration 14): [infra,k8s] Upgrade Toolsbeta to k8s 1.26 - https://phabricator.wikimedia.org/T370248#10099826 (Slst2020) In progress→Resolved
[14:25:47] vivian-rook opened https://github.com/toolforge/quarry/pull/68
[14:44:33] Quarry: Remove quarry-124 cluster - https://phabricator.wikimedia.org/T373375#10099868 (rook) Open→Resolved a:rook
[14:44:39] vivian-rook closed https://github.com/toolforge/quarry/pull/68
[14:45:43] Quarry: unused dns proxies? - https://phabricator.wikimedia.org/T373528 (rook) NEW
[14:47:15] Quarry: unused dns proxies? - https://phabricator.wikimedia.org/T373528#10099881 (rook)
[14:47:18] Quarry: unused dns proxies?
- https://phabricator.wikimedia.org/T373528#10099882 (rook)
[14:47:28] Quarry: unused dns proxies? - https://phabricator.wikimedia.org/T373528#10099883 (rook)
[15:10:28] FIRING: InstanceDown: Project tools instance tools-prometheus-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:15:28] RESOLVED: InstanceDown: Project tools instance tools-prometheus-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:19:28] FIRING: InstanceDown: Project tools instance tools-prometheus-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:22:15] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, Patch-For-Review: Remove or replace deployment-restbase04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370460#10100043 (Eevans) Open→Resolved a:...
[15:27:24] Toolforge: [k8s, cookbooks] Transient error during Toolsbeta k8s 1.25 -> 1.26 upgrade - https://phabricator.wikimedia.org/T373533 (Slst2020) NEW
[15:28:03] Toolforge: [k8s, cookbooks] Transient error during Toolsbeta k8s 1.25 -> 1.26 upgrade - https://phabricator.wikimedia.org/T373533#10100096 (Slst2020)
[15:31:15] Toolforge: [k8s, cookbooks] Transient error during Toolsbeta k8s 1.25 -> 1.26 upgrade - https://phabricator.wikimedia.org/T373533#10100116 (Slst2020)
[15:34:45] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 14), Patch-For-Review: toolforge: prometheus server died - https://phabricator.wikimedia.org/T370143#10100148 (dcaro) It died again, this time I was monitoring the logs, and it looks like there was some network issue and started...
[15:35:42] Toolforge: [k8s, cookbooks] Transient error during Toolsbeta k8s 1.25 -> 1.26 upgrade - https://phabricator.wikimedia.org/T373533#10100154 (Slst2020)
[15:49:18] Toolforge-standards-committee, User-notice: Refresh membership of Toolforge standards committee - https://phabricator.wikimedia.org/T370474#10100225 (bd808) >>! In T370474#10098479, @JJMC89 wrote: > What access needs to be covered in the NDA for TFSC? I'm in the process signing a volunteer NDA for other...
[15:49:56] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[15:51:10] Toolforge: [k8s, kube-proxy] "udpIdleTimeout" KubeProxyConfiguration deprecation - https://phabricator.wikimedia.org/T373537 (Slst2020) NEW
[16:04:56] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:05:58] RESOLVED: InstanceDown: Project tools instance tools-prometheus-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[16:13:21] Toolforge: [k8s, kube-proxy] "udpIdleTimeout" KubeProxyConfiguration deprecation - https://phabricator.wikimedia.org/T373537#10100299 (Slst2020) I'm not seeing the deprecated field in the kube-proxy config: ` root@toolsbeta-test-k8s-control-8:~# kubectl get configmap kube-proxy -n kube-system -o yaml apiVer...
[16:33:01] Data-Services, Abstract Wikipedia team, Wikifunctions: WikiLambda tables are not replicated to cloud - https://phabricator.wikimedia.org/T372058#10100373 (Jdforrester-WMF) p:Triage→Low
[17:01:58] PAWS: jupyterlab to 4.2.5 - https://phabricator.wikimedia.org/T373544 (rook) NEW
[17:08:28] FIRING: InstanceDown: Project tools instance tools-prometheus-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:10:15] FIRING: ToolforgeKubernetesHAproxyUnknown: Toolforge HAproxy has unknown state. HAproxy might be down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyUnknown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyUnknown
[17:13:28] RESOLVED: InstanceDown: Project tools instance tools-prometheus-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:15:15] RESOLVED: ToolforgeKubernetesHAproxyUnknown: Toolforge HAproxy has unknown state. HAproxy might be down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyUnknown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyUnknown
[17:17:28] FIRING: InstanceDown: Project tools instance tools-prometheus-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:32:03] FIRING: PuppetFailure: Puppet has failed on cloudcontrol2004-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[17:32:07] cloud-services-team: PuppetFailure Puppet failure on cloudcontrol2004-dev:9100 - https://phabricator.wikimedia.org/T373547 (phaultfinder) NEW
[17:59:05] Toolforge, Documentation, good first task: Find and fix inaccuracies in Toolforge Django tutorial - https://phabricator.wikimedia.org/T245683#10100749 (Chickenleaf) Hello! @srodlund can i be assigned this? Im new to more organized large scale open-source projects and would like to begin my contributi...
[18:37:28] RESOLVED: InstanceDown: Project tools instance tools-prometheus-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[18:39:40] Toolforge, Documentation, good first task: Find and fix inaccuracies in Toolforge Django tutorial - https://phabricator.wikimedia.org/T245683#10100928 (Dzahn) a:Chickenleaf Hey, it looks like @srodlund's account is inactive. I'm not related to this much but just being bold and assigned it to you...
[18:46:03] Toolforge, Documentation, good first task: Find and fix inaccuracies in Toolforge Django tutorial - https://phabricator.wikimedia.org/T245683#10100952 (Chickenleaf) >>! In T245683#10100928, @Dzahn wrote: > Hey, it looks like @srodlund's account is inactive. I'm not related to this much but just bein...
[18:47:03] RESOLVED: PuppetFailure: Puppet has failed on cloudcontrol2004-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:50:56] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[18:52:37] FIRING: [2x] HAProxyBackendUnavailable: HAProxy service keystone-admin-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[18:57:37] RESOLVED: [2x] HAProxyBackendUnavailable: HAProxy service keystone-admin-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[18:58:14] Toolforge, Documentation, good first task: Find and fix inaccuracies in Toolforge Django tutorial - https://phabricator.wikimedia.org/T245683#10100971 (Dzahn) I saw your comments on other tickets. I would say in general you can apply [[ https://en.wikipedia.org/wiki/Wikipedia:Be_bold | Be bold ]] her...
[19:05:56] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:18:17] Grid-Engine-to-K8s-Migration, Wiki-Loves-Monuments-Database, Patch-For-Review: Migrate heritage from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319787#10101032 (JeanFred) >>! In T319787#10091157, @dcaro wrote: > * You can call directly the API to start a new job...
[19:43:37] Grid-Engine-to-K8s-Migration, Wiki-Loves-Monuments-Database, Patch-For-Review: Migrate heritage from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319787#10101065 (JeanFred) >>! In T319787#10101032, @JeanFred wrote: >>>! In T319787#10091157, @dcaro wrote: >> * You...
[20:20:56] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:25:24] vivian-rook opened https://github.com/toolforge/paws/pull/452
[20:30:56] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:48:01] PAWS: PR usually not posting to phabricator - https://phabricator.wikimedia.org/T373134#10101227 (rook)
[20:48:03] PAWS: PR usually not posting to phabricator - https://phabricator.wikimedia.org/T373134#10101228 (rook) In another log: