[00:36:39] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[00:41:39] FIRING: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[00:46:39] RESOLVED: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[01:12:34] cloud-services-team, Toolforge: toolforge-legacy-redirector: constant failed probes by prometheus - https://phabricator.wikimedia.org/T385908#10534378 (aborrero) Still not solved {F58382231}
[01:20:23] tool-wdlocator: Wikidata item needs photo although one is already assigned - https://phabricator.wikimedia.org/T385957#10534379 (Samwilson) Open→Resolved a:Samwilson > Yes, i did try the refresh button, but i was not sure what it was supposed to do. Because after clicking the refresh button...
[05:19:19] FIRING: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[06:04:19] RESOLVED: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[09:55:37] (update) arthurtaylor: Fetch job data on demand [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/4 (https://phabricator.wikimedia.org/T384925) (owner: lucaswerkmeister-wmde)
[09:55:37] (approved) arthurtaylor: Fetch job data on demand [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/4 (https://phabricator.wikimedia.org/T384925) (owner: lucaswerkmeister-wmde)
[09:55:50] (merge) arthurtaylor: Fetch job data on demand [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/4 (https://phabricator.wikimedia.org/T384925) (owner: lucaswerkmeister-wmde)
[10:10:37] cloud-services-team, Toolforge: [toolsdb] tools-db-4 switched to read-only - https://phabricator.wikimedia.org/T385900#10534956 (fnegri) > Here is another SHOW PROCESSLIST right after a crash and restart: Not much showing there, but it's expected because after a restart tools will take some time to reco...
[10:16:53] (open) lucaswerkmeister-wmde: Tell Gunicorn to trust X-Forwarded-* headers [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/5
[10:20:36] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] mariadb crashing repeatedly on primary host - https://phabricator.wikimedia.org/T385900#10534965 (fnegri) Open→In progress p:Triage→High a:fnegri
[10:25:30] (approved) arthurtaylor: Tell Gunicorn to trust X-Forwarded-* headers [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/5 (owner: lucaswerkmeister-wmde)
[10:25:33] (merge) arthurtaylor: Tell Gunicorn to trust X-Forwarded-* headers [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/5 (owner: lucaswerkmeister-wmde)
[10:34:38] Change on wikitech.wikimedia.org a page Help:Toolforge/Building container images was modified, changed by Lucas Werkmeister (WMDE) link https://wikitech.wikimedia.org/w/index.php?diff=2268506 edit summary: /* Example: Python web service */ tell Gunicorn to trust X-Forwarded-* headers from Toolforge proxy (enables correct protocol in Flask-generated external links and probably other WSGI apps); compare w.wiki/D29T and w.wiki/D29V
[10:36:41] Change on wikitech.wikimedia.org a page Help:Toolforge/Building container images/My first Buildpack Python tool was modified, changed by Lucas Werkmeister (WMDE) link https://wikitech.wikimedia.org/w/index.php?diff=2268507 edit summary: /* How to create a basic Flask WSGI webservice */ tell Gunicorn to trust X-Forwarded-* headers from Toolforge proxy, see [[Special:Diff/2268506]]
[12:20:00] (update) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/30
[12:20:12] (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/4
[12:21:31] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
[12:29:38] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
[12:30:33] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
[12:31:37] cloud-services-team, Cloud-VPS: CloudVPSDesignateLeaks alert is flapping - https://phabricator.wikimedia.org/T384118#10535253 (Andrew) Open→Resolved
[12:33:39] Toolforge (Toolforge iteration 17), Patch-For-Review: [infra,harbor] upgrade harbor v2.10.1 ---> v2.12.2 - https://phabricator.wikimedia.org/T384327#10535257 (Raymond_Ndibe) here is the patch https://gerrit.wikimedia.org/r/c/operations/puppet/+/1113871. It has been merged and deployed (For some reason I...
[12:35:02] cloud-services-team, Toolforge (Toolforge iteration 17), Patch-For-Review: [gitlab-ci] twine 6.1.0 breaks pypi deploy - https://phabricator.wikimedia.org/T385853#10535260 (Raymond_Ndibe) In progress→Resolved a:Raymond_Ndibe
[12:35:49] Toolforge (Toolforge iteration 17), Patch-For-Review: [infra,harbor] upgrade harbor v2.10.1 ---> v2.12.2 - https://phabricator.wikimedia.org/T384327#10535267 (Raymond_Ndibe) In progress→Resolved
[12:36:06] Toolforge (Toolforge iteration 17): [infra, harbor] use latest thirdparty/docker in harbor hosts - https://phabricator.wikimedia.org/T384720#10535270 (Raymond_Ndibe) In progress→Resolved
[12:36:11] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
[12:40:16] cloud-services-team, Cloud-VPS: Do not create DNS zones for projects outside default domain - https://phabricator.wikimedia.org/T380095#10535275 (Andrew) Open→Resolved looks good
[12:41:25] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
[12:41:59] cloud-services-team, Cloud-VPS: openstack: fix missing prometheus metrics - https://phabricator.wikimedia.org/T373878#10535282 (Andrew) Open→Resolved I think this is as fixed as it can be.
[12:48:16] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
[12:48:52] cloud-services-team, Cloud-VPS: openstack: keystone may be failing to add users to the bastion project - https://phabricator.wikimedia.org/T379550#10535305 (Andrew) a:Andrew
[14:32:13] cloud-services-team, Openstack-Magnum: CSI Cinder issues causing periodic failures on Magnum cluster - https://phabricator.wikimedia.org/T383560#10535578 (rook) There were problems like this on older fcos versions of k8s deployed by magnum (T336586). Your current cluster appears to be using Fedora-CoreOS...
[15:23:18] FIRING: [2x] KernelErrors: Server cloudnet1006 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudnet1006 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[16:46:32] PAWS: Missing notebooks for an account - https://phabricator.wikimedia.org/T385048#10535999 (rook) It's not immediately clear to me what happened to the files. I'm tempted to blame a file purge from abusive accounts that we've been having trouble with. T379746. The abusive accounts pull down a lot of junk an...
[16:58:32] PAWS: Missing notebooks for an account - https://phabricator.wikimedia.org/T385048#10536054 (Isaac) thanks @rook for looking into this. bummer to hear but it's useful to know that it's not recoverable (and was likely collateral damage and not something we accidentally did ourselves). We were doing a fair bit...
[16:59:42] PAWS: Missing notebooks for an account - https://phabricator.wikimedia.org/T385048#10536059 (rook) Open→Resolved
[17:01:22] PAWS: Missing notebooks for an account - https://phabricator.wikimedia.org/T385048#10536064 (rook) Yeah sorry that the news isn't better. I would probably suggest that important notebooks should be kept in a git repo that is better at data retention than only in PAWS itself.
[17:33:19] cloud-services-team, Catalyst (Kiwen): Grafana.wmcloud.org has project alerts for catalyst, route alerts catalyst/patchdemo maintainers - https://phabricator.wikimedia.org/T385330#10536187 (thcipriani) Thanks for the updated documentation. Sorry for the confusion on our side. The docs now read: > Cloud...
[17:43:37] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10536227 (Reedy) >>! In T376267#10533629, @DWYoungDLS wrote: > |**Wikitech account/LDAP:**| Hotaru Natsumi | > |**SUL account**| Hotaru Natsumi | > |**Account linked on [[ https://idm.wikim...
[18:19:17] Change on wikitech.wikimedia.org a page Help:Toolforge/Database was modified, changed by Lucas Werkmeister link https://wikitech.wikimedia.org/w/index.php?diff=2270173 edit summary: tell people what to do if replica.my.cnf doesn’t exist (two people today thought they could create it manually, maybe this will help…)
[18:32:42] (PS1) Majavah: settings: Use Wikitech for SUL OAuth login [labs/striker] - https://gerrit.wikimedia.org/r/1118564
[18:34:58] (CR) CI reject: [V:-1] settings: Use Wikitech for SUL OAuth login [labs/striker] - https://gerrit.wikimedia.org/r/1118564 (owner: Majavah)
[18:36:05] (PS2) Majavah: settings: Use Wikitech for SUL OAuth login [labs/striker] - https://gerrit.wikimedia.org/r/1118564
[18:37:24] (CR) CI reject: [V:-1] settings: Use Wikitech for SUL OAuth login [labs/striker] - https://gerrit.wikimedia.org/r/1118564 (owner: Majavah)
[18:38:28] (PS3) Majavah: settings: Use Wikitech for SUL OAuth login [labs/striker] - https://gerrit.wikimedia.org/r/1118564
[18:38:29] (PS1) Majavah: striker: Update for new Black [labs/striker] - https://gerrit.wikimedia.org/r/1118565
[18:50:33] cloud-services-team, Cloud-VPS, Patch-For-Review: Unable to persistently set fs.inotify.max_user_instances and fs.inotify.max_user_watches - https://phabricator.wikimedia.org/T385530#10536484 (taavi) Are the Puppet failures emails I've been getting from `big-bessie-steakums.catalyst-qte.eqiad1.wikime...
[18:58:33] cloud-services-team, Cloud-VPS, Patch-For-Review: Unable to persistently set fs.inotify.max_user_instances and fs.inotify.max_user_watches - https://phabricator.wikimedia.org/T385530#10536491 (Andrew) >>! In T385530#10536484, @taavi wrote: > Are the Puppet failures emails I've been getting from `big-...
[19:02:29] (update) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/4 (owner: l10n-bot)
[19:03:23] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/4 (owner: l10n-bot)
[19:03:27] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/4 (owner: l10n-bot)
[19:15:54] cloud-services-team, Cloud-VPS, Patch-For-Review: openstack: keystone may be failing to add users to the bastion project - https://phabricator.wikimedia.org/T379550#10536531 (Andrew) One possible explanation for us getting into this state: If a user is added to a project as a /member/ but doesn't hav...
[19:17:48] !log andrew@cloudcumin1001 testlabs START - Cookbook wmcs.vps.add_user_to_project for user 'mortalandrew' in role 'member'
[19:17:54] !log andrew@cloudcumin1001 testlabs END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'mortalandrew' in role 'member'
[19:18:37] !log andrew@cloudcumin1001 testlabs START - Cookbook wmcs.vps.add_user_to_project for user 'mortalandrew' in role 'member'
[19:18:43] !log andrew@cloudcumin1001 testlabs END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'mortalandrew' in role 'member'
[19:19:33] cloud-services-team, Cloud-VPS, Patch-For-Review: openstack: keystone may be failing to add users to the bastion project - https://phabricator.wikimedia.org/T379550#10536549 (Andrew) >>! In T379550#10536531, @Andrew wrote: > That could be fixed in wmfkeystonehooks, but imo being member w/out reader i...
[19:21:09] (PS1) Andrew Bogott: add_user_to_project: small changes to usage [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1118573
[20:20:47] (update) thcipriani: Feat: Support duration by multiple "categories" [toolforge-repos/jenkins-build-stats] - https://gitlab.wikimedia.org/toolforge-repos/jenkins-build-stats/-/merge_requests/3
[20:20:53] (update) thcipriani: Feat: Support duration by multiple "categories" [toolforge-repos/jenkins-build-stats] - https://gitlab.wikimedia.org/toolforge-repos/jenkins-build-stats/-/merge_requests/3
[20:26:41] (update) thcipriani: Feat: Support duration by multiple "categories" [toolforge-repos/jenkins-build-stats] - https://gitlab.wikimedia.org/toolforge-repos/jenkins-build-stats/-/merge_requests/3
[20:31:35] (update) thcipriani: Feat: Support duration by multiple "categories" [toolforge-repos/jenkins-build-stats] - https://gitlab.wikimedia.org/toolforge-repos/jenkins-build-stats/-/merge_requests/3
[22:00:33] wikitech.wikimedia.org: Decide what to do with SUL attached Wikitech accounts that Bitu associates with a different SUL account - https://phabricator.wikimedia.org/T386026 (bd808) NEW
[22:22:51] wikitech.wikimedia.org: Decide what to do with SUL attached Wikitech accounts that Bitu associates with a different SUL account - https://phabricator.wikimedia.org/T386026#10537053 (Apap04) Please detach 'Apap04' from SUL, rename it to '-andreas', and reattach to SUL. :)
[22:24:17] wikitech.wikimedia.org: Decide what to do with SUL attached Wikitech accounts that Bitu associates with a different SUL account - https://phabricator.wikimedia.org/T386026#10537059 (bd808) p:Triage→High a:bd808
[22:24:39] wikitech.wikimedia.org: Decide what to do with SUL attached Wikitech accounts that Bitu associates with a different SUL account - https://phabricator.wikimedia.org/T386026#10537068 (NBaca-WMF) Thanks for flagging this @bd808 ! My accounts were caught up in this, due to my name change. Please detach 'Nathil...
[22:30:25] wikitech.wikimedia.org: Decide what to do with SUL attached Wikitech accounts that Bitu associates with a different SUL account - https://phabricator.wikimedia.org/T386026#10537096 (bd808)
[22:42:26] Cloud-VPS (Quota-requests), Content-Transform-Team (Work In Progress): If necessary, bump down quota for wikitextexp now that we've migrated from parsing-qa-02 -> ctt-qa-03 - https://phabricator.wikimedia.org/T386030 (ssastry) NEW
[22:56:01] wikitech.wikimedia.org: Decide what to do with SUL attached Wikitech accounts that Bitu associates with a different SUL account - https://phabricator.wikimedia.org/T386026#10537161 (mszabo) Please detach 'Máté Szabó' from SUL, rename it to 'MSzabo-WMF', and reattach to SUL.
[22:56:52] wikitech.wikimedia.org: Decide what to do with SUL attached Wikitech accounts that Bitu associates with a different SUL account - https://phabricator.wikimedia.org/T386026#10537163 (Ladsgroup) >>! In T386026#10537068, @NBaca-WMF wrote: > Thanks for flagging this @bd808 ! My accounts were caught up in this, d...
[23:00:39] wikitech.wikimedia.org: Decide what to do with SUL attached Wikitech accounts that Bitu associates with a different SUL account - https://phabricator.wikimedia.org/T386026#10537164 (Ladsgroup) >>! In T386026#10537161, @mszabo wrote: > Please detach 'Máté Szabó' from SUL, rename it to 'MSzabo-WMF', and reatta...
[23:02:40] wikitech.wikimedia.org: Decide what to do with SUL attached Wikitech accounts that Bitu associates with a different SUL account - https://phabricator.wikimedia.org/T386026#10537172 (bd808) >>! In T386026#10537053, @Apap04 wrote: > Please detach 'Apap04' from SUL, rename it to '-andreas', and reattach to SUL....
[23:06:20] wikitech.wikimedia.org: Decide what to do with SUL attached Wikitech accounts that Bitu associates with a different SUL account - https://phabricator.wikimedia.org/T386026#10537177 (Apap04) Thanks!
[23:11:39] wikitech.wikimedia.org: Decide what to do with SUL attached Wikitech accounts that Bitu associates with a different SUL account - https://phabricator.wikimedia.org/T386026#10537183 (Poslovitch) Heyo, Please detach 'Florian Cuny' from SUL, rename it to 'Poslovitch', and reattach to SUL. Thanks for handling...
[23:17:01] wikitech.wikimedia.org: Decide what to do with SUL attached Wikitech accounts that Bitu associates with a different SUL account - https://phabricator.wikimedia.org/T386026#10537194 (bd808) >>! In T386026#10537183, @Poslovitch wrote: > Heyo, > > Please detach 'Florian Cuny' from SUL, rename it to 'Poslovitch...
[23:17:56] wikitech.wikimedia.org: Decide what to do with SUL attached Wikitech accounts that Bitu associates with a different SUL account - https://phabricator.wikimedia.org/T386026#10537199 (Poslovitch) Thanks!
[23:19:24] wikitech.wikimedia.org: Decide what to do with SUL attached Wikitech accounts that Bitu associates with a different SUL account - https://phabricator.wikimedia.org/T386026#10537201 (bd808)
[23:39:06] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[23:42:07] cloud-services-team, wikitech.wikimedia.org, Infrastructure-Foundations, Epic, Patch-For-Review: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859#10537238 (bd808) I have a set of accounts that errored out in renaming attempts on 2025-02-10 because the OAuth claimed SUL ac...
[23:44:06] RESOLVED: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[23:46:02] cloud-services-team, Cloud-VPS (Quota-requests), Content-Transform-Team (Work In Progress): If necessary, bump down quota for wikitextexp now that we've migrated from parsing-qa-02 -> ctt-qa-03 - https://phabricator.wikimedia.org/T386030#10537248 (bd808)