[00:00:38] FIRING: DatasourceError: Queue (Jenkins jobs + Zuul functions) alert - https://grafana.wikimedia.org/alerting/grafana/iS0FSjJ4z/view - https://wikitech.wikimedia.org/wiki/Monitoring/DatasourceError - https://alerts.wikimedia.org/?q=alertname%3DDatasourceError [00:05:38] RESOLVED: DatasourceError: Queue (Jenkins jobs + Zuul functions) alert - https://grafana.wikimedia.org/alerting/grafana/iS0FSjJ4z/view - https://wikitech.wikimedia.org/wiki/Monitoring/DatasourceError - https://alerts.wikimedia.org/?q=alertname%3DDatasourceError [00:13:04] 10Gerrit: "Collapse" link on add/edit reviewers screen is showing weird icons - https://phabricator.wikimedia.org/T367135 (10Novem_Linguae) 03NEW [00:13:27] 10Gerrit: "Collapse" link on add/edit reviewers screen is showing weird icons - https://phabricator.wikimedia.org/T367135#9877956 (10Novem_Linguae) [00:20:05] Project beta-update-databases-eqiad build #76636: 04STILL FAILING in 4.9 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76636/ [01:20:06] Project beta-update-databases-eqiad build #76637: 04STILL FAILING in 5.8 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76637/ [02:20:06] Project beta-update-databases-eqiad build #76638: 04STILL FAILING in 5.4 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76638/ [03:20:05] Project beta-update-databases-eqiad build #76639: 04STILL FAILING in 4.9 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76639/ [04:20:06] Project beta-update-databases-eqiad build #76640: 04STILL FAILING in 5.6 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76640/ [05:20:06] Project beta-update-databases-eqiad build #76641: 04STILL FAILING in 5.4 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76641/ [06:06:20] Project beta-scap-sync-world build #158805: 04FAILURE in 58 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/158805/ 
[06:16:35] Yippee, build fixed! [06:16:35] Project beta-scap-sync-world build #158806: 09FIXED in 1 min 16 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/158806/ [06:20:06] Project beta-update-databases-eqiad build #76642: 04STILL FAILING in 5.7 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76642/ [07:20:05] Project beta-update-databases-eqiad build #76643: 04STILL FAILING in 4.9 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76643/ [07:48:39] 10Beta-Cluster-Infrastructure, 10AbuseFilter: Beta cluster fails to update database due to MigrateActorsAF maintenance script - https://phabricator.wikimedia.org/T367144 (10hashar) 03NEW [07:48:54] I have filed T367144 for the beta cluster job failing. It is due to some patch in AbuseFilter [07:48:56] T367144: Beta cluster fails to update database due to MigrateActorsAF maintenance script - https://phabricator.wikimedia.org/T367144 [07:56:25] 10Beta-Cluster-Infrastructure, 10AbuseFilter: Beta cluster fails to update database due to MigrateActorsAF maintenance script - https://phabricator.wikimedia.org/T367144#9878374 (10matej_suchanek) [08:02:18] hashar: what happens if migrateActorsAF.php is ran directly [08:02:30] I don't see the actual error [08:02:36] no idea, I have only filed the task :D [08:03:42] oh wait i do [08:03:58] hashar: it says run maintenance/cleanupUsersWithNoId.php [08:04:09] wonder what that does [08:06:44] it doesn't have a dry run [08:20:06] Project beta-update-databases-eqiad build #76644: 04STILL FAILING in 5.5 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76644/ [08:32:31] (03approved) 10jnuche: README.md: Say how to create a new release [repos/releng/reggie] - 10https://gitlab.wikimedia.org/repos/releng/reggie/-/merge_requests/86 (owner: 10dancy) [08:33:15] 10Fresh: Fresh 24.05.1 installer does not remove fresh-node20 - https://phabricator.wikimedia.org/T367063#9878407 (10Lucas_Werkmeister_WMDE) Now 
that there’s a decision on which solution to go for, sure :) [08:33:17] (03PS1) 10Lucas Werkmeister (WMDE): Also install fresh-node20 [fresh] - 10https://gerrit.wikimedia.org/r/1041527 (https://phabricator.wikimedia.org/T367063) [08:33:41] 10Fresh, 13Patch-For-Review: Fresh 24.05.1 installer does not remove fresh-node20 - https://phabricator.wikimedia.org/T367063#9878410 (10Lucas_Werkmeister_WMDE) a:03Lucas_Werkmeister_WMDE [08:39:00] (03Abandoned) 10Thiemo Kreuz (WMDE): Review access change [sandbox] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/986222 (owner: 10Ankur gerrit) [08:39:16] (03Abandoned) 10Thiemo Kreuz (WMDE): Review access change [sandbox] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/986223 (owner: 10Ankur gerrit) [08:55:02] (03update) 10jnuche: README.md: Add more deployment details [repos/releng/jenkins-deploy] - 10https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/65 (owner: 10thcipriani) [08:55:31] (03update) 10jnuche: README.md: Add more deployment details [repos/releng/jenkins-deploy] - 10https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/65 (owner: 10thcipriani) [08:57:09] (03merge) 10jnuche: README.md: Add more deployment details [repos/releng/jenkins-deploy] - 10https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/65 (owner: 10thcipriani) [09:20:05] Project beta-update-databases-eqiad build #76645: 04STILL FAILING in 4.8 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76645/ [10:17:10] 10Gerrit: "Collapse" link on add/edit reviewers screen is showing weird icons - https://phabricator.wikimedia.org/T367135#9878787 (10Lucas_Werkmeister_WMDE) This seems to come from a ``, which renders the `icon` attribute (CSS has `content: attr(icon);` on the `::before`) using a... 
[10:20:05] Project beta-update-databases-eqiad build #76646: 04STILL FAILING in 4.9 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76646/ [10:20:18] 10Gerrit: "Collapse" link on add/edit reviewers screen is showing weird icons - https://phabricator.wikimedia.org/T367135#9878802 (10Lucas_Werkmeister_WMDE) Yeah, this doesn’t sound great (2022): `lang=css /** * This file has been produced by downloading this file on Sep 6, 2022: * https://fonts.googleapis.co... [10:23:32] 10Gerrit: "Collapse" link on add/edit reviewers screen is showing weird icons - https://phabricator.wikimedia.org/T367135#9878816 (10hashar) The element is a `` which has for style: ` lang=css :host { font-family: var(--icon-font-family, 'Material Symbols Outlined'); }... [10:28:56] 10Gerrit: "Collapse" link on add/edit reviewers screen is showing weird icons - https://phabricator.wikimedia.org/T367135#9878869 (10Lucas_Werkmeister_WMDE) But the latest font file seems to have the “up” version too: `lang=html,name=(same HTML only with the src URL changed),lines=5 10Phabricator: Create herald rules for #User-Zppix - https://phabricator.wikimedia.org/T164830#9878881 (10Aklapper) @Zppix: No reply; so I disabled H233 for now. 
[10:35:24] 10Gerrit, 06Release-Engineering-Team: Gerrit error: Error while fetching results for wm-patch-demo - https://phabricator.wikimedia.org/T367155 (10Marostegui) 03NEW [10:35:43] 10Phabricator (Upstream), 10Release-Engineering-Team (Radar), 07Developer Productivity, 07Upstream: Add Open Graph support to Phabricator Maniphest Tasks to have link preview on Telegram, Slack, and other messaging apps - https://phabricator.wikimedia.org/T288117#9878907 (10Aklapper) 05Open→03Stalled [10:35:47] 10Gerrit, 06Release-Engineering-Team: Gerrit error: Error while fetching results for wm-patch-demo - https://phabricator.wikimedia.org/T367155#9878908 (10Marostegui) [10:35:47] 10Phabricator (Upstream), 07Upstream: After "Log In to Comment", go back to previous page instead of Phab home page - https://phabricator.wikimedia.org/T132335#9878909 (10Aklapper) 05Open→03Stalled [10:37:59] 10Gerrit, 06Release-Engineering-Team: Gerrit error: Error while fetching results for wm-patch-demo - https://phabricator.wikimedia.org/T367155#9878939 (10Marostegui) [10:40:31] 10Gerrit, 06Release-Engineering-Team: Gerrit error: Error while fetching results for wm-patch-demo - https://phabricator.wikimedia.org/T367155#9878954 (10Lucas_Werkmeister_WMDE) FWIW, the underlying patchdemo failure is tracked at https://gitlab.wikimedia.org/repos/ci-tools/patchdemo/-/issues/607. 
[11:20:07] Project beta-update-databases-eqiad build #76647: 04STILL FAILING in 6.5 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76647/ [11:38:18] (03approved) 10jnuche: utils.py: write_file_if_needed, temp_to_permanent_file, get_umask [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/353 (owner: 10dancy) [11:45:50] 06Release-Engineering-Team, 06Infrastructure-Foundations, 06serviceops, 13Patch-For-Review: Deprecate buster-backports - https://phabricator.wikimedia.org/T362518#9879216 (10Clement_Goubert) Deleted `docker-registry.wikimedia.org/wikimedia/analytics-datahub` [12:30:18] 10Gerrit: "Collapse" link on add/edit reviewers screen is showing weird icons - https://phabricator.wikimedia.org/T367135#9879400 (10hashar) The `lib/fonts/material-icons.woff2` has NOT changed with the upgrade and is still the same in master. Looking at [[ https://gerrit-review.googlesource.com/ | upstream Ger... [12:53:17] 10Gerrit: "Collapse" link on add/edit reviewers screen is showing weird icons - https://phabricator.wikimedia.org/T367135#9879512 (10Lucas_Werkmeister_WMDE) So, IIUC, upstream should update the font file so it works on installs that don’t `useGoogleFonts`? (Whereas Google’s own install uses Google fonts and ther... [12:59:12] 10GitLab, 06collaboration-services, 06Infrastructure-Foundations, 06serviceops, 13Patch-For-Review: Container image reports in debmonitor are broken - https://phabricator.wikimedia.org/T348876#9879555 (10elukey) [13:03:09] 10Gerrit: "Collapse" link on add/edit reviewers screen is showing weird icons - https://phabricator.wikimedia.org/T367135#9879611 (10hashar) > So, IIUC, upstream should update the font file so it works on installs that don’t useGoogleFonts? (Whereas Google’s own install uses Google fonts and therefore they didn’... 
[13:08:17] 10GitLab, 06collaboration-services, 13Patch-For-Review: Create an SSH blackbox test for GitLab - https://phabricator.wikimedia.org/T367021#9879639 (10Jelto) This seemed to be a puppet dependency issue and not a SSH issue. I checked the sshd on all instances and IPv6 worked on both replicas but not on product... [13:20:07] Project beta-update-databases-eqiad build #76649: 04STILL FAILING in 6.3 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76649/ [13:25:24] 06Release-Engineering-Team, 06Infrastructure-Foundations, 06serviceops, 13Patch-For-Review: Deprecate buster-backports - https://phabricator.wikimedia.org/T362518#9879702 (10Clement_Goubert) Deleted `docker-registry.wikimedia.org/wikimedia/mediawiki-services-similar-users` following resolution of T345274 [13:28:12] 10Gerrit, 07Upstream: "Collapse" link on add/edit reviewers screen is showing weird icons - https://phabricator.wikimedia.org/T367135#9879717 (10Lucas_Werkmeister_WMDE) [13:51:29] 10Gerrit, 07Upstream: "Collapse" link on add/edit reviewers screen is showing weird icons - https://phabricator.wikimedia.org/T367135#9879851 (10hashar) a:03hashar It took me a while to test locally with Gerrit 3.8 / 3.9 and my test case above + figuring out how the Google font API works, but eventually I ha... [13:58:13] 10Gerrit, 07Upstream: "Collapse" link on add/edit reviewers screen is showing weird icons - https://phabricator.wikimedia.org/T367135#9879911 (10Paladox) Looks like the repo is now https://github.com/googlefonts/roboto-classic and not https://github.com/google/roboto [14:07:24] 10Gerrit, 07Upstream: "Collapse" link on add/edit reviewers screen is showing weird icons - https://phabricator.wikimedia.org/T367135#9879970 (10Lucas_Werkmeister_WMDE) >>! In T367135#9879851, @hashar wrote: > It took me a while to test locally with Gerrit 3.8 / 3.9 and my test case above + figuring out how th... 
[14:20:07] Project beta-update-databases-eqiad build #76650: 04STILL FAILING in 6.5 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76650/ [14:22:24] (03update) 10dancy: README.md: Say how to create a new release [repos/releng/reggie] - 10https://gitlab.wikimedia.org/repos/releng/reggie/-/merge_requests/86 [14:22:28] (03update) 10dancy: README.md: Say how to create a new release [repos/releng/reggie] - 10https://gitlab.wikimedia.org/repos/releng/reggie/-/merge_requests/86 [14:22:46] (03update) 10dancy: README.md: Say how to create a new release [repos/releng/reggie] - 10https://gitlab.wikimedia.org/repos/releng/reggie/-/merge_requests/86 [14:24:01] (03merge) 10dancy: README.md: Say how to create a new release [repos/releng/reggie] - 10https://gitlab.wikimedia.org/repos/releng/reggie/-/merge_requests/86 [14:39:44] (03update) 10dancy: utils.py: write_file_if_needed, temp_to_permanent_file, get_umask [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/353 [14:39:50] (03update) 10dancy: deploy-local: Remove "symlink is current" check [repos/releng/scap] (master-Iea957202dd45dad29b77d7d9fafd7ee44485704a) - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/352 (https://phabricator.wikimedia.org/T342162) [14:39:54] (03update) 10dancy: utils.py: write_file_if_needed, temp_to_permanent_file, get_umask [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/353 [14:40:56] (03update) 10dancy: utils.py: write_file_if_needed, temp_to_permanent_file, get_umask [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/353 [14:45:34] (03update) 10dancy: deploy-local: Remove "symlink is current" check [repos/releng/scap] (master-Iea957202dd45dad29b77d7d9fafd7ee44485704a) - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/352 (https://phabricator.wikimedia.org/T342162) [14:45:39] (03update) 10dancy: utils.py: 
write_file_if_needed, temp_to_permanent_file, get_umask [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/353 [14:47:53] 10Gerrit, 06Release-Engineering-Team: Gerrit error: Error while fetching results for wm-patch-demo - https://phabricator.wikimedia.org/T367155#9880147 (10Dzahn) The other issue seems to be that this should only happen for MediaWiki changes but not all changes. [14:49:21] (03merge) 10dancy: utils.py: write_file_if_needed, temp_to_permanent_file, get_umask [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/353 [14:49:23] 10GitLab, 06collaboration-services, 13Patch-For-Review: Create an SSH blackbox test for GitLab - https://phabricator.wikimedia.org/T367021#9880167 (10Dzahn) Ah yea, this isn't the first time this has happened (with other services on other machines). The pattern seems to be: - puppet installs a package and s... [14:49:24] (03update) 10dancy: deploy-local: Remove "symlink is current" check [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/352 (https://phabricator.wikimedia.org/T342162) [15:24:21] Project beta-update-databases-eqiad build #76651: 04STILL FAILING in 4 min 21 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76651/ [15:26:58] Project beta-scap-sync-world build #158858: 04FAILURE in 31 min: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/158858/ [15:31:31] Yippee, build fixed! [15:31:31] Project beta-scap-sync-world build #158859: 09FIXED in 1 min 13 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/158859/ [15:55:06] Project beta-code-update-eqiad build #499716: 04FAILURE in 2 min 5 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/499716/ [16:05:20] Yippee, build fixed! 
[16:05:21] Project beta-code-update-eqiad build #499717: 09FIXED in 2 min 20 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/499717/ [16:20:05] Project beta-update-databases-eqiad build #76652: 04STILL FAILING in 4.7 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76652/ [16:22:22] 10Release-Engineering-Team (Priority Backlog 📥), 05Release, 05Train Deployments: 1.43.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T361403#9880705 (10Aklapper) Thanks. Issues existing last week are no train blockers for this week, basically. [16:23:10] 10GitLab (Project Migration), 06Release-Engineering-Team: Create new GitLab project group: wikitechma - https://phabricator.wikimedia.org/T360380#9880740 (10Aklapper) @Bisel91: Hi! You assigned this task to yourself a while ago. Could you maybe share an update? Do you still plan to work on this task? [16:23:31] 10GitLab (Account Approval), 06Release-Engineering-Team: Requesting GitLab account activation for jonkolbert - https://phabricator.wikimedia.org/T358714#9880744 (10Aklapper) 05Stalled→03Declined Unfortunately closing this Phabricator task as no further information has been provided. @kolbert: After yo... [16:53:01] 10Release-Engineering-Team (Priority Backlog 📥), 05Release, 05Train Deployments: 1.43.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T361403#9880961 (10brennen) To expand on this, we only block the train over the weekend if it can't be avoided, and rollbacks on Mondays are only done in very... 
[16:53:26] 10Release-Engineering-Team (Priority Backlog 📥), 05Release, 05Train Deployments: 1.43.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T361403#9880964 (10brennen) [17:00:00] 06Release-Engineering-Team, 06Quality-and-Test-Engineering-Team, 10Temporary accounts, 06Trust and Safety Product Team: Temp accounts deployment and the release train - https://phabricator.wikimedia.org/T355882#9880989 (10Tchanders) 05Open→03Resolved a:03Tchanders [17:20:06] Project beta-update-databases-eqiad build #76653: 04STILL FAILING in 5.3 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76653/ [17:26:13] Project beta-scap-sync-world build #158870: 04FAILURE in 57 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/158870/ [17:35:05] 10WikimediaDebug, 10Tool-schedule-deployment: Integrate schedule-deployment with WikimediaDebug - https://phabricator.wikimedia.org/T367213 (10LucasWerkmeister) 03NEW [17:36:26] Yippee, build fixed! [17:36:26] Project beta-scap-sync-world build #158871: 09FIXED in 1 min 11 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/158871/ [17:44:54] 06Release-Engineering-Team, 10Data-Engineering (Q4 2024 April 1st - June 30th), 07Spike: [Developer Experience] [SPIKE] Investigate process to automate deployment of folders and artifacts to HDFS - https://phabricator.wikimedia.org/T360968#9881281 (10mforns) I agree. I think we should support `current` as we... [18:02:40] 10WikimediaDebug, 10Tool-schedule-deployment: Integrate schedule-deployment with WikimediaDebug - https://phabricator.wikimedia.org/T367213#9881386 (10bd808) I believe this would need to work by adding something to the WikimediaDebug extension that changes what is rendered on the page. There are [[https://stac... [18:03:05] James_F: is there a checklist somewhere I can copy for undeploying an extension from WMF wikis? I just disabled OpenStackManager, but it's still branched etc. 
[18:04:28] there's https://phabricator.wikimedia.org/maniphest/task/edit/form/33/ but I assume that's missing some wikimedia-specific steps [18:05:32] I think you could copy from tickets like this: https://phabricator.wikimedia.org/T252744 [18:05:38] (the checkboxes) [18:08:59] mutante: I think that's the same form I linked that I think is missing all the wmf train specific bits? [18:09:31] like I know that for that I need to update wmf-config/extension-list and then mediawiki/tools/release.git, but it'd be nice if there was some doc I could confirm against that I'm not forgetting anything [18:10:31] taavi: In my head and copying from previous tasks, yeah. [18:10:47] Let me find the last task. [18:11:51] taavi: https://phabricator.wikimedia.org/T253216 was I think the last prod undeploy. [18:12:26] taavi: *nod*, understood, yea [18:12:35] James_F: ok, seems like I'm not forgetting anything then. thanks. [18:13:16] https://www.mediawiki.org/wiki/Wikimedia_extension_deployments lists OSM as being on track for removal since March 2017. [18:13:31] Second-longest planned removal, following ShortUrl since July 2015. [18:13:40] So, in short, thank you so much for your work. :-) [18:14:09] removing OpenStackManager from wikitech is kind of historic [18:14:14] Indeed. [18:20:06] Project beta-update-databases-eqiad build #76654: 04STILL FAILING in 5.4 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76654/ [18:26:18] 10Gerrit, 06Release-Engineering-Team: Delete All-Projects-In-Phabricator.git Gerrit project - https://phabricator.wikimedia.org/T355070#9881615 (10Aklapper) I propose to go to https://gerrit.wikimedia.org/r/admin/repos/All-Projects-In-Phabricator,commands and click "Delete Project", assuming this action has ze... 
[18:26:34] 10Gerrit, 06Release-Engineering-Team: Delete All-Projects-In-Phabricator.git Gerrit project - https://phabricator.wikimedia.org/T355070#9881627 (10Aklapper) [18:27:35] (03open) 10taavi: make-release: Stop branching OSM [repos/releng/release] - 10https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/80 (https://phabricator.wikimedia.org/T161553) [18:30:23] James_F: now if we could do the same for LdapAuthentication :-) [18:30:29] taavi: Ha, well. [18:36:07] (03approved) 10jforrester: make-release: Stop branching OSM [repos/releng/release] - 10https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/80 (https://phabricator.wikimedia.org/T161553) (owner: 10taavi) [19:16:39] 10Gerrit, 06Release-Engineering-Team: Gerrit error: Error while fetching results for wm-patch-demo - https://phabricator.wikimedia.org/T367155#9881882 (10Tacsipacsi) It should happen for changes to [[https://gitlab.wikimedia.org/repos/ci-tools/patchdemo/-/blob/master/repository-lists/all.txt|all repos Patch De... [19:20:06] Project beta-update-databases-eqiad build #76655: 04STILL FAILING in 5.4 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76655/ [19:25:06] (03CR) 10Ottomata: [C:03+2] Edit Repo Config [eventgate-wikimedia] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/1040083 (owner: 10Snwachukwu) [19:25:10] (03CR) 10Ottomata: [V:03+2 C:03+2] Edit Repo Config [eventgate-wikimedia] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/1040083 (owner: 10Snwachukwu) [19:25:17] (03CR) 10Ottomata: [C:03+2] Edit Repo Config [services/eventstreams] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/1040082 (owner: 10Snwachukwu) [19:25:18] (03CR) 10Ottomata: [V:03+2 C:03+2] Edit Repo Config [services/eventstreams] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/1040082 (owner: 10Snwachukwu) [19:44:37] hi, i'm having a weird issue with gerrit. 
when i try to pull/push using git, i get the error: "Received disconnect from 2620:0:861:2:208:80:154:151 port 29418:12: Too many concurrent connections (8) - max. allowed: 8" [19:45:08] this started on 6 June, and on that day mutante temporarily helped me by manually closing the connections. but it happened again on the weekend (eventually it went away on its own, presumably there is some very long timeout), and again right now. [19:45:29] i have no idea what's causing it, i'm definitely not opening connections deliberately. help in debugging would be appreciated [19:46:03] (and in the meantime, i will use https transport instead of ssh, but that's inconvenient, since git-review doesn't work with it) [20:20:05] Project beta-update-databases-eqiad build #76656: 04STILL FAILING in 4.5 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76656/ [20:49:08] MatmaRex: our Gerrit has `auth.gitBasicAuthPolicy = HTTP_LDAP` so you can authenticate using a generated password at https://gerrit.wikimedia.org/r/settings/#HTTPCredentials or using the LDAP password [20:51:48] hashar: thanks, yeah, i'm using that. i just can't use git-review because it only works over ssh. [20:51:58] git-review should work over https :D [20:52:06] 10Gerrit, 06Release-Engineering-Team: Gerrit error: Error while fetching results for wm-patch-demo - https://phabricator.wikimedia.org/T367155#9882210 (10Dzahn) Thanks for the clarification. As of today I can confirm it happened in repo `operations/puppet` but that is not on the list above. [20:52:13] huh, really [20:52:20] though maybe it needs a not straight forward hack I can't remember [20:52:41] else well you can `git fetch origin refs/changes/YY/XXXXXYY/ZZ` [20:52:46] maybe i have an old version. 
2.3.1 [20:52:56] or maybe I misremember [20:58:04] (03open) 10catrope: releases: Bump Codex to v1.7.0 [repos/ci-tools/libup-config] - 10https://gitlab.wikimedia.org/repos/ci-tools/libup-config/-/merge_requests/21 (https://phabricator.wikimedia.org/T367062) [20:58:37] MatmaRex: in the repo `.gitreview` or in `~/.config/git-review/git-review.conf` you can set `scheme=https` [20:58:58] (03update) 10catrope: releases: Bump Codex to v1.7.0 [repos/ci-tools/libup-config] - 10https://gitlab.wikimedia.org/repos/ci-tools/libup-config/-/merge_requests/21 (https://phabricator.wikimedia.org/T367062) [20:59:00] or set the push url to https [20:59:04] it might just work [20:59:19] as for the error message you have, that is cause the bot/script has too many concurrent ssh connections [21:01:02] MatmaRex: As far as Gerrit is concerned, you have 8 open ssh connections to it. Have you attempted to locate them? `ps uaxwww | grep ssh` on your systems. [21:01:57] !log gerrit: closing stall ssh connections for Matmarex using `ssh -p 29418 hashar@gerrit.wikimedia.org gerrit close-connection` [21:01:58] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:02:06] dancy: +1 [21:02:09] i'm on windows, so no `ps`, but there are nossh processes running. i restarted my device in the meantime to be sure [21:02:15] no ssh* [21:02:23] Windows.. sigh. [21:02:28] :) [21:02:38] :old-man-yell-at-windows: [21:02:43] it does make debugging issues like this more interesting, yes [21:02:54] it is not the first people have left over ssh connections [21:03:07] it happened after the hackathon [21:03:14] Also Windows users? [21:03:14] so maybe some session time to live is not properly set [21:03:18] maybe [21:03:20] MatmaRex: set a password in the user profile once.. 
then you can push over https [21:03:35] I assumed that after they have left the hackathon country/city/venue, they must have terminated their side of the connection [21:03:35] is this related to the event recently when I killed some connections for you? [21:03:45] (i'm using the built-in windows version of openssh, fwiw) [21:04:09] mutante: yeah, the issue is back, i'm hoping someone has an idea what causes it. i am using the https option in the meantime [21:04:16] AH [21:04:19] I am so good [21:04:23] :D [21:04:24] at capturing stuff [21:04:25] hashar: I used "gerrit close-connection" to fix that [21:04:31] * hashar self promotes to SCTO [21:04:51] https://phabricator.wikimedia.org/T338810 [21:05:29] damn time passes SO fast [21:05:31] I ran " ssh dzahn@gerrit.wikimedia.org -p 29418 gerrit show-connections" but this time nothing from you, MatmaRex [21:05:38] that is from last year's hackathon, almost exactly one year ago to the day [21:05:51] the interesting bit is https://phabricator.wikimedia.org/T338810#8923609 [21:06:15] which is that lucas had a bunch of stale ssh connections all coming from an IP allocated in Athens, Greece [21:06:28] while he was back in Germany or whatever [21:06:38] it was the same here, like last week [21:06:42] and those connections have been around for more than a couple of weeks [21:06:46] but not now.. unless they already got killed [21:06:56] so something is off in the sshd timeout / tcp timeout [21:07:17] mutante: hashar ran close-connection a short while ago [21:07:21] is the common thing that the client is on Windows? [21:07:28] no clue [21:07:31] dancy: ok, got it [21:07:43] it seems like it could be Windows-related [21:07:50] i don't think lucas uses windows [21:07:53] but my guess is the Gerrit ssh daemon should probably terminate them [21:07:59] I'm always ready to blame Windows. [21:08:06] ^ [21:08:21] just saying because it happens but rarely [21:08:27] not like every day for many users [21:08:32] then paladox had `ps` on Windows. 
I swear [21:08:49] and whatever magic service pack that could be installed to make it POSIX compliant [21:08:55] Windows has that Linux subsystem nowadays, it's all weird :) [21:09:02] Nod... need to know what `netstat` has to say on the client side [21:09:03] OR MAYBE I AM ACTUALLY DREAMING [21:09:13] you can install all of the usual unixy tools on windows [21:09:30] powershell? [21:09:31] i have a `ps` too, actually. but it only lists itself :) [21:09:39] PROCESS something in powershell [21:10:12] Anyway, I would have liked to have looked at the tcpdumps before the connections were closed. :-) [21:10:20] Get-Process https://www.tutorialspoint.com/how-to-get-all-the-processes-on-the-local-computer-with-get-process-command-using-powershell [21:10:25] i'm sure i can open some more of them for you [21:10:41] Also, it's annoying that we still can't enable TCP keepalives (the mechanism designed to deal w/ such problems). [21:10:50] dancy: sorry, I had already closed the connections when I saw your message :\ But last time the TCP connections were established on the gerrit side [21:10:51] if you're feeling bored and want to debug, just tell me what to do [21:10:57] Keep-not-alive [21:11:11] nod.. I saw that they were in ESTABLISHED state. [21:12:19] maybe I debugged something similar recently [21:12:22] "Dead sessions in apache mina" .. asked 12 years ago [21:12:30] Apache Mina is the sshd here [21:12:49] actually, is this a problem that definitely happens somewhere in the normal software on my side? or is it possible that it's something stupid done by my ISP, or some drivers/hardware issue? 
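For readers unfamiliar with the TCP keepalives mentioned at 21:10:41: the sketch below shows, in generic Python, how an application can ask the kernel to probe an idle connection so that a vanished peer is eventually detected. It is only an illustration of the mechanism. The helper name `enable_keepalive` and the timing values are assumptions made for this example, and nothing here reflects how Gerrit or its sshd is actually configured.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for `sock` (hypothetical helper).

    idle, interval and count are illustrative values, not recommendations.
    """
    # SO_KEEPALIVE is the portable switch that enables keepalive probing.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The tuning knobs are platform specific, so only set the ones that
    # this Python build exposes and that the OS accepts.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", count),       # unanswered probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            # Some platforms reject individual options; keep the default.
            pass
    return sock


sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

With probing enabled on the server side, a connection whose client has silently disappeared would eventually be torn down by the kernel instead of sitting in ESTABLISHED state indefinitely.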
[21:13:07] (03merge) 10egardner: releases: Bump Codex to v1.7.0 [repos/ci-tools/libup-config] - 10https://gitlab.wikimedia.org/repos/ci-tools/libup-config/-/merge_requests/21 (https://phabricator.wikimedia.org/T367062) (owner: 10catrope) [21:13:37] "Sometimes, usually when the connection is terminated from client side brutally i.e, power cable unplugged or any other unusual shutdown or some problem with network, it is not removed or closed at server side. It remain there, in idle state, for I don,t know how long( may be forever). " [21:13:53] AH I KNEW [21:13:56] I had something more recent [21:13:58] https://phabricator.wikimedia.org/T365604 ! [21:16:21] upstream MINA docs claim that even if the client ends the connection in a non-standard way that "then the application can decide to close the session. Otherwise, the session will be closed eventually when the TCP timeout will be reached (it can take hours…)." [21:17:10] "Idle time" can be configured [21:17:33] https://mina.apache.org/mina-project/userguide/ch4-session/ch4.1-session-configuration.html [21:17:34] (also, i should mention that i still experience this problem occasionally: https://phabricator.wikimedia.org/T263293 no idea if it might be related, but it's another unusual thing that affects me) [21:17:59] default value: INFINITE ! [21:18:24] for all 3 IdleTime* parameters [21:20:20] Project beta-update-databases-eqiad build #76657: 04STILL FAILING in 19 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76657/ [21:20:26] MatmaRex: I guess it needs a combination of some problem on your side/ISP/networking that causes it to not properly close the connection combined with the default values of the sshd so it never reaches a timeout [21:21:50] btw i just did a `git pull` over ssh. can you check if i have a stuck connection? [21:21:59] i wonder if it happens every time, or in some specific circumstances [21:22:37] One open connection from your address right now. 
[21:23:06] well, at least it's easy to reproduce, heh [21:23:30] That's good.. Lemme close that connection, set up tcpdump, then ask you to try again [21:23:42] sure! [21:25:21] Ready [21:26:03] a `git pull` for mediawiki/core? [21:26:25] Sure. Whatever you did last time [21:26:32] done [21:26:47] well, last time, it pulled a bunch of changes, this time there's nothing to pull. but hopefully that just makes it simpler [21:27:09] Once more please [21:27:22] done [21:29:08] and again please [21:29:25] done [21:31:09] https://www.irccloud.com/pastebin/ukrWwzNE/ [21:31:24] doh! exposing your address. [21:31:40] eh, no big deal [21:31:57] paste deleted [21:35:37] so… what was interesting about these lines? [21:36:42] So your side sends an "R" (reset) packet to terminate its side of the connection, which the gerrit side acknowledges.. but then the gerrit side never terminates. [21:37:17] Connections are usually cleanly terminated using "F" (FIN) packets.. which is notable. I'm going to compare w/ my connection. [21:37:50] dancy: if you have nice finding, you can continue on https://phabricator.wikimedia.org/T338810 which I have reopened since it has bunch of notes about ssh timeout/keepalive etc [21:37:57] interesting indeed [21:38:03] hashar: Will do. [21:38:45] from what I remember of the task I filed back on May 22 (and is now closed) is that tcp dump was showing some packets emitted by Gerrit every minutes or so with no response [21:39:05] and I think I gave up that rabbit hole and just closed the connections. I should have taken some traces [21:40:20] (for the record: my `ssh -V` = OpenSSH_for_Windows_8.1p1, LibreSSL 3.0.2) [21:48:52] Beta-Cluster-Infrastructure, AbuseFilter: Beta cluster fails to update database due to MigrateActorsAF maintenance script - https://phabricator.wikimedia.org/T367144#9882449 (Ladsgroup) >User name "Meno25" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix... 
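[editor's note] The "R" (reset) versus "F" (FIN) distinction discussed at 21:36 to 21:37 can be observed locally with plain sockets. The following is a minimal sketch, not the Gerrit or MINA setup from the log: the function name `observe_close` is hypothetical, it assumes a Linux-like platform where `struct linger` is two C ints, and it only shows what a server application sees in each case. It does not reproduce a server that fails to release the connection afterwards.

```python
import socket
import struct


def observe_close(abortive: bool) -> str:
    """Close a loopback client connection and report what the server sees.

    Returns "FIN" for an orderly close and "RST" for an abortive close.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    server_side, _ = listener.accept()
    listener.close()

    if abortive:
        # Assumption: Linux-style struct linger (two ints).
        # l_onoff=1, l_linger=0 makes close() send RST instead of FIN.
        client.setsockopt(
            socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0)
        )
    client.close()

    server_side.settimeout(5)
    try:
        data = server_side.recv(1024)
        # An empty read means the peer closed cleanly (FIN received).
        result = "FIN" if data == b"" else "DATA"
    except ConnectionResetError:
        # The peer reset the connection (RST received).
        result = "RST"
    finally:
        server_side.close()
    return result
```

Calling `observe_close(False)` and `observe_close(True)` shows the two cases side by side. In both, the server application gets an explicit signal and is expected to close its own socket, which is the step the Gerrit side appeared to skip in the capture described above.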
[21:49:27] Notes added to https://phabricator.wikimedia.org/T338810 [21:50:10] Beta-Cluster-Infrastructure, AbuseFilter: Beta cluster fails to update database due to MigrateActorsAF maintenance script - https://phabricator.wikimedia.org/T367144#9882454 (Ladsgroup) Ran this: ` MariaDB [arwiki]> update abuse_filter set af_user = 72, af_actor = 70 where af_user_text = 'Meno25'; Query... [22:09:39] I haven’t used windows in so long [22:13:02] Beta-Cluster-Infrastructure, AbuseFilter: Beta cluster fails to update database due to MigrateActorsAF maintenance script - https://phabricator.wikimedia.org/T367144#9882515 (Ladsgroup) I did some magic that involved doing stuff like this: ` $user = \MediaWiki\MediaWikiServices::getInstance()->getUserFac... [22:18:07] MatmaRex: Still around? [22:18:19] dancy: yeah [22:18:37] Can you run another pull? [22:19:03] done [22:19:09] Thanks [22:20:09] Project beta-update-databases-eqiad build #76658: STILL FAILING in 9 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76658/ [22:29:26] MatmaRex: one more please (last of the day) [22:30:07] dancy: no problem, done. thank you for investigating [22:31:04] imagine it's for the benefit of all the enthusiastic new developers using windows, rather than one cranky guy who's just doing it to be weird ;) [22:35:51] Hey RelEng - let me know if there's a better place to flag this issue, but it seems like patches to the MW vendor repo are failing CI at the moment - from Roan: https://gerrit.wikimedia.org/r/c/mediawiki/vendor/+/1041781 fails with ***Your composer.lock file is not up to date. Please update Composer dependencies before continuing.*** (which is a [22:35:51] bit bizarre given the context) and [22:35:52] @cscott’s https://gerrit.wikimedia.org/r/c/mediawiki/vendor/+/1041121 fails with the same error. This seems to me to be a CI issue rather than a problem with our patches, because we followed the instructions in the README. Is this a known issue? 
[22:36:27] (I've also posted this in engineering-all on Slack) [22:36:37] Beta-Cluster-Infrastructure, AbuseFilter: Beta cluster fails to update database due to MigrateActorsAF maintenance script - https://phabricator.wikimedia.org/T367144#9882552 (Ladsgroup) Now commonswiki is broken. I hope we didn't import too many abuse filters from production. [23:12:07] (update) dduvall: go: Upgrade to Go 1.21 [repos/releng/blubber] - https://gitlab.wikimedia.org/repos/releng/blubber/-/merge_requests/90 [23:16:57] (update) dduvall: go: Upgrade to Go 1.21 [repos/releng/blubber] - https://gitlab.wikimedia.org/repos/releng/blubber/-/merge_requests/90 [23:17:02] (update) dduvall: go: Upgrade to Go 1.21 [repos/releng/blubber] - https://gitlab.wikimedia.org/repos/releng/blubber/-/merge_requests/90 [23:19:20] (update) dduvall: ci: Use experimental-native-llb version of blubber [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/345 [23:20:11] Project beta-update-databases-eqiad build #76659: STILL FAILING in 10 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/76659/ [23:22:52] (merge) dduvall: go: Upgrade to Go 1.21 [repos/releng/blubber] - https://gitlab.wikimedia.org/repos/releng/blubber/-/merge_requests/90 [23:22:54] (update) dduvall: Update github.com/docker/docker dependency [repos/releng/blubber] - https://gitlab.wikimedia.org/repos/releng/blubber/-/merge_requests/89 [23:50:27] Feedback on what style of comment schedule-deployment should leave on a Gerrit change to signify that the change is scheduled for backport is welcome -- https://phabricator.wikimedia.org/T366763#9882611