[00:28:11] (update) don-vip: Draft: Add NIH BioArt [toolforge-repos/spacemedia] - https://gitlab.wikimedia.org/toolforge-repos/spacemedia/-/merge_requests/10
[01:10:47] FIRING: NodeDown: Node cloudbackup1001-dev has been down for long. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1001-dev - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[01:42:15] RECOVERY - SSH on cloudbackup1001-dev is OK: SSH OK - OpenSSH_10.0p2 Debian-7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[01:45:47] RESOLVED: NodeDown: Node cloudbackup1001-dev has been down for long. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1001-dev - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[01:45:47] RESOLVED: NodeDown: Node cloudbackup1001-dev is down. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1001-dev - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[02:05:56] FIRING: SystemdUnitDown: The service unit remove_dangling_cinder_snapshots.service is in failed status on host cloudbackup1001-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1001-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[02:10:56] FIRING: [2x] SystemdUnitDown: The service unit remove_dangling_cinder_snapshots.service is in failed status on host cloudbackup1001-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[03:20:49] cloud-services-team, Toolforge: Jobs failing with no logs - https://phabricator.wikimedia.org/T403927#11330261 (Hawkeye7) Things seem to be working a lot better now. Thanks for your help in getting my C# bots to run. `toolforge jobs logs` seems to be working satisfactorily too.
[04:00:56] FIRING: SystemdUnitDown: The systemd unit remove_dangling_cinder_snapshots.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1001-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:05:56] FIRING: [2x] SystemdUnitDown: The systemd unit remove_dangling_cinder_snapshots.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:06:59] Tool-paulina: Project Proposal: 3-Month Outreachy Project Plan — Add Wikidata Editing Features to Paulina — by Afanyu Lionel - https://phabricator.wikimedia.org/T408889 (Afanyulionel) NEW
[04:16:18] Tool-wsindex, Wikisource Reader App: Add deletion logic to WSIndex API - https://phabricator.wikimedia.org/T408588#11330324 (Saiphani02) Yep
[06:47:01] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[06:47:02] Tool-wsindex, Wikisource Reader App: Add deletion logic to WSIndex API - https://phabricator.wikimedia.org/T408588#11330454 (System625) a:System625
[06:48:53] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30033 bytes in 0.208 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[07:55:30] cloud-services-team (FY2025/26-Q1), Data-Services, Data-Persistence, Data-Platform-SRE: Decide how to use the new clouddb hosts (clouddb102[2-5]) - https://phabricator.wikimedia.org/T401295#11330590 (Marostegui) >>! In T401295#11323391, @fnegri wrote: >> And just a quick look right now on the...
[07:59:56] Tool-wsindex, Wikisource Reader App: Add deletion logic to WSIndex API - https://phabricator.wikimedia.org/T408588#11330611 (System625) Hello @Saiphani02 , I have made a PR in the Wsindex repo: https://codeberg.org/ph4ni/wsindex/pulls/7 Please kindly review
[08:05:56] FIRING: [2x] SystemdUnitDown: The systemd unit remove_dangling_cinder_snapshots.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[08:22:56] cloud-services-team (FY2025/26-Q1), Data-Services, Data-Persistence, Data-Platform-SRE (2025.10.17 - 2025.11.07): Decide how to use the new clouddb hosts (clouddb102[2-5]) - https://phabricator.wikimedia.org/T401295#11330644 (Gehel)
[08:27:26] cloud-services-team, Cloud-VPS: MTU setting in IPv6 VMs causes issues with Docker - https://phabricator.wikimedia.org/T408543#11330670 (cmooney) >>! In T408543#11328875, @cmooney wrote: > I would rate the options in terms of fixing it in this order: > > # Move cloudvirt/cloudnet to support jumbo frames...
[09:21:28] FIRING: TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[09:21:39] FIRING: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[09:26:28] FIRING: [2x] TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[09:31:28] FIRING: [2x] TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[09:36:28] RESOLVED: [2x] TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[09:36:39] RESOLVED: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[10:11:28] FIRING: TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[10:11:39] FIRING: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[10:16:28] RESOLVED: TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[10:16:39] RESOLVED: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[11:56:34] cloud-services-team, Data-Services: Move dumps.wikimedia.org HTTP service behind CDN edge - https://phabricator.wikimedia.org/T306550#11331057 (taavi) For the rsync problem, I see roughly options here: * Provision a separate host name like `rsync.dumps.wikimedia.org`, and ask the rsync mirror operators t...
[12:06:11] FIRING: [2x] SystemdUnitDown: The systemd unit remove_dangling_cinder_snapshots.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[12:26:16] (CR) Majavah: [C:-1] "Yeah, this will cause the validation to run for values that are just plain wrong (the username prefixed with just `tools` and nothing else" [labs/striker] - https://gerrit.wikimedia.org/r/1200014 (https://phabricator.wikimedia.org/T408787) (owner: DamianZaremba)
[12:47:07] (Abandoned) DamianZaremba: check_toolname_create - use prefixed name [labs/striker] - https://gerrit.wikimedia.org/r/1200014 (https://phabricator.wikimedia.org/T408787) (owner: DamianZaremba)
[14:52:57] cloud-services-team, Cloud-VPS (Project-requests): CloudVPS instance for ProVe - https://phabricator.wikimedia.org/T408387#11331455 (Albert.meronyo) Sure, thanks @Andrew This comes from a few discussions in Tools/Potential Gadgets specifically [[ https://www.wikidata.org/wiki/Wikidata:Tools/Potential_gad...
[16:05:56] FIRING: [2x] SystemdUnitDown: The systemd unit remove_dangling_cinder_snapshots.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[16:15:56] FIRING: [2x] SystemdUnitDown: The service unit postgresql@17-main.service is in failed status on host cloudbackup1001-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1001-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[16:15:56] RESOLVED: SystemdUnitDown: The systemd unit remove_dangling_cinder_snapshots.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1001-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[16:18:05] Tool-wsindex, Developer-Outreach, Wikisource Reader App, Outreachy (Round 31): Outreachy 31: Improve the Wikisource Reader App - https://phabricator.wikimedia.org/T405593#11331679 (Bodhisattwa)
[16:19:51] Tool-wsindex, Wikisource Reader App: Add deletion logic to WSIndex API - https://phabricator.wikimedia.org/T408588#11331670 (Saiphani02) Open→Resolved Merged
[16:20:56] RESOLVED: [2x] SystemdUnitDown: The service unit postgresql@17-main.service is in failed status on host cloudbackup1001-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1001-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[17:29:22] (update) adwivedii: Proof of Concept: Add work form with autocomplete [toolforge-repos/paulina] - https://gitlab.wikimedia.org/toolforge-repos/paulina/-/merge_requests/155
[17:43:07] (update) adwivedii: Proof of Concept: Add work form with autocomplete [toolforge-repos/paulina] - https://gitlab.wikimedia.org/toolforge-repos/paulina/-/merge_requests/155
[17:49:44] Tool-paulina: Proof of concept: add work form, with autocomplete - https://phabricator.wikimedia.org/T407558#11331913 (Adwivedii) > > @Adwivedii, don't worry, my comment about dropdown keyboard navigation is a very minor detail. It's not important; the feature works super well. > Okayy!! But I've alread...
[17:59:34] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 6.953% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[18:07:09] Tool-wsindex, Developer-Outreach, Wikisource Reader App, Outreachy (Round 31): Outreachy 31: Improve the Wikisource Reader App - https://phabricator.wikimedia.org/T405593#11331934 (Saiphani02)
[18:07:43] Tool-wsindex, Developer-Outreach, Wikisource Reader App, Outreachy (Round 31): Outreachy 31: Improve the Wikisource Reader App - https://phabricator.wikimedia.org/T405593#11331940 (Saiphani02)
[19:10:56] FIRING: SystemdUnitDown: The service unit backup_cinder_volumes.service is in failed status on host cloudbackup1001-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1001-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[19:19:34] RESOLVED: DiskSpace: Disk space cloudbackup1004:9100:/srv 6.987% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[19:30:56] RESOLVED: [3x] SystemdUnitDown: The service unit backup_cinder_volumes.service is in failed status on host cloudbackup1001-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[19:38:26] FIRING: [4x] SystemdUnitDown: The service unit backup_cinder_volumes.service is in failed status on host cloudbackup1001-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[19:43:26] RESOLVED: [2x] SystemdUnitDown: The service unit backup_cinder_volumes.service is in failed status on host cloudbackup1002-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[19:49:41] FIRING: [3x] SystemdUnitDown: The service unit backup_cinder_volumes.service is in failed status on host cloudbackup1002-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[20:05:39] FIRING: [3x] ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:10:39] RESOLVED: [3x] ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:13:26] RESOLVED: SystemdUnitDown: The service unit postgresql@17-main.service is in failed status on host cloudbackup1002-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[20:19:56] FIRING: SystemdUnitDown: The service unit postgresql@17-main.service is in failed status on host cloudbackup1002-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[20:41:11] PAWS, tools-platform-team: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T408157#11332419 (LibUp-bot) A new upstream version of Pywikibot is now available: 10.7.0. * https://gerrit.wikimedia.org/g/pywikibot/core/+/refs/tags/10.7.0 * https://doc.wikimedia.org/pywikibot/stable/chan...
[20:41:16] Toolforge, tools-platform-team: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T408158#11332420 (LibUp-bot) A new upstream version of Pywikibot is now available: 10.7.0. * https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Pywikibot_image * https://gerrit.wikimedia.org/g/p...
[20:53:48] FIRING: PuppetFailure: Puppet has failed on cloudbackup1002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[21:52:08] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[21:53:58] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30032 bytes in 0.440 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[21:59:56] FIRING: [2x] SystemdUnitDown: The service unit postgresql@17-main.service is in failed status on host cloudbackup1001-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[22:03:48] FIRING: [2x] PuppetFailure: Puppet has failed on cloudbackup1001-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[22:04:08] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[22:05:00] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30031 bytes in 2.602 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[22:14:56] FIRING: SystemdUnitDown: The systemd unit postgresql@17-main.service on node cloudbackup1002-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[22:19:56] FIRING: [2x] SystemdUnitDown: The systemd unit postgresql@17-main.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown