[04:31:25] Project-Admins: Create project tag for WdTmCollab - https://phabricator.wikimedia.org/T391140#10716324 (Eugene233) >>! In T391140#10715855, @Aklapper wrote: > @Eugene233: Is there a reason why that [workboard](https://phabricator.wikimedia.org/project/board/7831/) has columns "Incoming features" and "Inc...
[06:23:26] Hi! I'm planning to deploy https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikistories/+/1134400 in a bit. Any concerns?
[07:57:59] Beta-Cluster-Infrastructure, Abstract Wikipedia team (25Q4 (Apr–Jun)), Patch-For-Review: Beta Cluster orchestrator / evaluator broken, blocking WikiLambda CI (and use of Beta Cluster) - https://phabricator.wikimedia.org/T374242#10716550 (DSantamaria) Open→In progress
[08:11:12] Phabricator, Release-Engineering-Team, collaboration-services, DBA: Prepare a database test for m3 - https://phabricator.wikimedia.org/T390034#10716583 (Marostegui) {P74612}
[08:33:34] Project-Admins: Propose to add "stack trace requested" column to #wikimedia-production-error - https://phabricator.wikimedia.org/T391206#10716644 (A_smart_kitten) > The board columns are by date; "stack trace requested" is not a date. :) @Aklapper in that case, could we re-open to discuss my alternative...
[08:53:12] (CR) Kosta Harlan: [C:+1] zuul: Test IPInfo with MobileFrontend [integration/config] - https://gerrit.wikimedia.org/r/1134278 (https://phabricator.wikimedia.org/T345639) (owner: Máté Szabó)
[09:05:20] Phabricator, Release-Engineering-Team, collaboration-services, DBA: Prepare a database test for m3 - https://phabricator.wikimedia.org/T390034#10716828 (Marostegui) {P74614}
[09:10:58] Project-Admins: Propose to add "stack trace requested" column to #wikimedia-production-error - https://phabricator.wikimedia.org/T391206#10716840 (Aklapper) Which actual underlying problem do folks want to solve in this task? I don't yet understand which //underlying// problem is solved by creating additi...
[09:28:00] Phabricator, Release-Engineering-Team (Priority Backlog 📥), Wikimedia-Phabricator-Extensions: Remove custom "Expert Mode" - https://phabricator.wikimedia.org/T351289#10716892 (Aklapper) >>! In T351289#10667625, @Jdforrester-WMF wrote: > I'm really sad to lose the controls this gives me via Parent ID...
[09:44:08] (open) lucaswerkmeister-wmde: Update @wmde/eslint-config-wikimedia-typescript [repos/ci-tools/libup-config] - https://gitlab.wikimedia.org/repos/ci-tools/libup-config/-/merge_requests/71
[09:56:55] (update) lucaswerkmeister-wmde: Update @wmde/eslint-config-wikimedia-typescript [repos/ci-tools/libup-config] - https://gitlab.wikimedia.org/repos/ci-tools/libup-config/-/merge_requests/71
[09:57:07] (update) lucaswerkmeister-wmde: releases: Bump @wmde/eslint-config-wikimedia-typescript to 0.2.13 [repos/ci-tools/libup-config] - https://gitlab.wikimedia.org/repos/ci-tools/libup-config/-/merge_requests/71
[10:05:17] Project-Admins: Propose to add "stack trace requested" column to #wikimedia-production-error - https://phabricator.wikimedia.org/T391206#10717020 (A_smart_kitten) >>! In T391206#10716840, @Aklapper wrote: > Which actual underlying problem do folks want to solve in this task? I don't yet understand which /...
[10:25:08] Project-Admins: Propose to add "stack trace requested" column to #wikimedia-production-error - https://phabricator.wikimedia.org/T391206#10717057 (Aklapper) Ah, that makes sense, thanks! I see a social problem here (ideally folks who can should look up and add stacktraces), I don't think more tags or doc...
[10:48:42] Phabricator, Release-Engineering-Team, collaboration-services, DBA: Prepare a database test for m3 - https://phabricator.wikimedia.org/T390034#10717117 (Marostegui) @brennen @Dzahn I have set up the host. The host is db1176, it is replicating from m3 master so you can test things with real data o...
[11:32:30] Diffusion, Phabricator (2024-10-22), Release-Engineering-Team (Yak Shaving 🐃🪒), Developer Productivity: Reduce task notification noise/frequency of changes to associated open patchsets - https://phabricator.wikimedia.org/T143162#10717216 (Aklapper) Resolved→Open a:Aklapper→None Hm...
[11:35:54] Diffusion, Phabricator, Release-Engineering-Team (Yak Shaving 🐃🪒), Developer Productivity: Reduce task notification noise/frequency of changes to associated open patchsets - https://phabricator.wikimedia.org/T143162#10717223 (Aklapper)
[12:04:06] Project-Admins: Propose to add "stack trace requested" column to #wikimedia-production-error - https://phabricator.wikimedia.org/T391206#10717309 (A_smart_kitten) >>! In T391206#10717057, @Aklapper wrote: > Ah, that makes sense, thanks! I see a social problem here (ideally folks who can should look up an...
[13:32:11] GitLab (Pipeline Services Migration🐤), collaboration-services, Wikidata, Wikidata Query UI, and 2 others: move query.wikidata.org to kubernetes - https://phabricator.wikimedia.org/T350793#10717725 (Jelto) I found the issue in the `query-service` ingress config that was causing the 404s. The GUI i...
[13:41:18] (open) dcausse: Allow restarting multiple services when require_valid_service is True [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/745
[13:43:57] (update) dcausse: Allow restarting multiple services when require_valid_service is True [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/745
[13:48:37] Release-Engineering-Team, Scap (SpiderPig 🕸️), logspam-watch, observability: Add a log view to SpiderPig - https://phabricator.wikimedia.org/T391005#10717788 (fgiunchedi) Thank you for reaching out ! From a cursory look the idea of fetching and stashing (hah!) logs a-la `logspam-watch` seems sens...
[13:58:24] Phabricator, Security-Team: Audit Phabricator security policies and groups membership - https://phabricator.wikimedia.org/T391150#10717842 (sbassett) >>! In T391150#10714904, @Aklapper wrote: > because non-public T304792 has not seen momentum. I feel like the #security-team, especially given [[ https://...
[14:42:03] hashar: want to chuck a CR+2 at https://gerrit.wikimedia.org/r/c/mediawiki/extensions/SyntaxHighlight_GeSHi/+/1130568 before the branch cut, maybe? 😇
[14:42:27] (or you +1 the small change I made to it and then I +2 it, if you prefer ^^)
[14:50:52] Does Scap invoke php/mw from shell on the deploy1004 host? Or are those prep steps and pre-deploy checks fully inside a container nowadays?
[14:51:20] I ask in relation to whether or not upgrading PHP on the deploy host is a hard blocker for MW fatalling unconditionally on PHP7.
[14:51:31] cc dancy
[14:52:03] Krinkle: All mwscripts are run inside a container on the deploy servers now.
[14:52:33] including the `echo 1 | eval.php` check (or whatever form that has nowadays)?
[14:52:39] yep
[14:52:42] ok
[14:53:04] I guess mwscript does still exist on deploy but people should presumably use mwmaint host for that instead
[14:53:19] That's right.
[15:22:17] Phabricator, Security-Team: Audit Phabricator security policies and groups membership - https://phabricator.wikimedia.org/T391150#10718241 (Grunny) Just making a note pre-emptively about making sure my access for Fandom security release management (and checking for reports in our bug bounty program with...
[15:32:24] (update) dancy: Allow restarting multiple services when require_valid_service is True [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/745 (owner: dcausse)
[15:44:40] (approved) thcipriani: Allow restarting multiple services when require_valid_service is True [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/745 (owner: dcausse)
[15:50:01] (merge) thcipriani: Allow restarting multiple services when require_valid_service is True [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/745 (owner: dcausse)
[15:55:17] Phabricator, Security-Team: Audit Phabricator security policies and groups membership - https://phabricator.wikimedia.org/T391150#10718448 (sbassett) >>! In T391150#10718241, @Grunny wrote: > Just making a note pre-emptively about making sure my access for Fandom security release management (and checking...
[16:00:08] Phabricator, Security-Team: Audit Phabricator security policies and groups membership - https://phabricator.wikimedia.org/T391150#10718459 (Dzahn) Would it be more effective to have an offboarding workflow where HR/legal informs someone that staff has left and needs to be removed from the WMF-NDA group (...
[16:10:19] Phabricator, Security-Team: Audit Phabricator security policies and groups membership - https://phabricator.wikimedia.org/T391150#10718490 (sbassett) >>! In T391150#10718459, @Dzahn wrote: > Would it be more effective to have an offboarding workflow where HR/legal informs someone that staff has left and...
[16:17:02] (open) dancy: Release 4.150.0 [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/746
[16:18:10] (merge) dancy: Release 4.150.0 [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/746
[16:19:45] Gerrit, Release-Engineering-Team, Security-Team, SecTeam-Processed, Security: Prevent Gerrit changes with top-level .patch files from being pushed - https://phabricator.wikimedia.org/T391269 (Lucas_Werkmeister_WMDE) NEW
[16:20:00] Gerrit, Release-Engineering-Team, Security-Team, SecTeam-Processed, Security: Develop a means for preventing developers from accidentally pushing embargoed security patches to gerrit - https://phabricator.wikimedia.org/T388247#10718549 (Lucas_Werkmeister_WMDE) Alright, I made a separate task...
[16:47:06] !log `sudo /usr/local/sbin/clean-stale-puppet-certs --clean` on deployment-puppetserver-1 to clean up dangling certs for deployment-elastic{09,10,11}
[16:47:07] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:50:02] Beta-Cluster-Infrastructure: deployment-webperf21 puppet runs crashing with `Error: No space left on device` - https://phabricator.wikimedia.org/T391272 (bd808) NEW
[16:50:38] (update) dancy: spiderpig: manually split vendor chunks [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/744 (owner: bd808)
[16:50:39] Beta-Cluster-Infrastructure: deployment-webperf21 puppet runs crashing with `Error: No space left on device` - https://phabricator.wikimedia.org/T391272#10718668 (bd808) Open→In progress a:bd808
[16:52:34] (merge) dancy: spiderpig: manually split vendor chunks [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/744 (owner: bd808)
[16:53:31] Beta-Cluster-Infrastructure: deployment-webperf21 puppet runs crashing with `Error: No space left on device` - https://phabricator.wikimedia.org/T391272#10718673 (bd808) Lots and lots of log spam in `/var/log/{messages,syslog,user.log}` `lang=shell-session root@deployment-webperf21:/var/log# du -sh *|sort -h...
[16:54:55] (open) dancy: Release 4.151.0 [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/747
[16:56:00] (merge) dancy: Release 4.151.0 [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/747
[16:56:48] !log `rm /var/log/user.log.1` on deployment-webperf21 (T391272)
[16:56:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:56:50] T391272: deployment-webperf21 puppet runs crashing with `Error: No space left on device` - https://phabricator.wikimedia.org/T391272
[16:56:53] Beta-Cluster-Infrastructure: deployment-webperf21 puppet runs crashing with `Error: No space left on device` - https://phabricator.wikimedia.org/T391272#10718676 (bd808) It looks like `Mar 28 00:01:07 deployment-webperf21 navtiming[2564521]: kafka.errors.NoBrokersAvailable: NoBrokersAvailable` may be the pro...
[16:58:42] !log `puppet agent -tv` to catch up with missed puppet runs on deployment-webperf21 (T391272)
[16:58:44] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:05:50] Scap: Selecting N for "backport the change" doesn't seem to work/exit - https://phabricator.wikimedia.org/T380924#10718682 (dancy) Open→Resolved a:dancy Deployed via scap 4.151.0
[17:07:17] Phabricator, Release-Engineering-Team, collaboration-services, DBA: Prepare a database test for m3 - https://phabricator.wikimedia.org/T390034#10718688 (Dzahn) Thank you @Marostegui ! I can confirm I can connect from phab1005 to db1176 to a phab DB using the password on cumin1002. I think we can...
[17:08:15] Beta-Cluster-Infrastructure, NavigationTiming: navtiming: Loss of Kafka connection to kafka fills multiple log files with identical stack traces - https://phabricator.wikimedia.org/T391273 (bd808) NEW
[17:11:28] Beta-Cluster-Infrastructure, NavigationTiming: navtiming: Loss of Kafka connection fills multiple log files with identical stack traces - https://phabricator.wikimedia.org/T391273#10718705 (bd808)
[17:13:52] Beta-Cluster-Infrastructure: deployment-webperf21 puppet runs crashing with `Error: No space left on device` - https://phabricator.wikimedia.org/T391272#10718714 (bd808) The log spam in the `messages.1` file was so repetitive that `gzip -9 messages.1` turned a 2.3G input into a 30M output! {T391273} is the...
[17:15:20] !log Reboot deployment-webperf21 (T391272)
[17:15:22] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:15:22] T391272: deployment-webperf21 puppet runs crashing with `Error: No space left on device` - https://phabricator.wikimedia.org/T391272
[17:20:04] !log `service navtiming stop` to halt "Unhandled exception in main loop, restarting consumer" crash loop (T391272)
[17:20:05] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:22:56] Beta-Cluster-Infrastructure, NavigationTiming: navtiming: Loss of Kafka connection fills multiple log files with identical stack traces - https://phabricator.wikimedia.org/T391273#10718747 (Ottomata) Possibly related to {T381593}? Puppet looks like it should do the right thing tho, so I'm not sure.
[17:25:34] Phabricator, Release-Engineering-Team, collaboration-services, DBA, Patch-For-Review: Prepare a database test for m3 - https://phabricator.wikimedia.org/T390034#10718752 (Dzahn) Open→Resolved @brennen @Aklapper I added "phab server shell users" to the host phab1005. So while there...
[17:41:47] Beta-Cluster-Infrastructure, NavigationTiming: navtiming: Loss of Kafka connection fills multiple log files with identical stack traces - https://phabricator.wikimedia.org/T391273#10718773 (bd808) >>! In T391273#10718747, @Ottomata wrote: > Possibly related to {T381593}? The navtiming service on deploym...
[17:44:47] Beta-Cluster-Infrastructure: deployment-webperf21 puppet runs crashing with `Error: No space left on device` - https://phabricator.wikimedia.org/T391272#10718778 (bd808) In progress→Resolved The disk isn't filling up now that the broken service is shutdown. Follow up should happen in {T391273}
[17:47:33] Beta-Cluster-Infrastructure, NavigationTiming: navtiming: Loss of Kafka connection fills multiple log files with identical stack traces - https://phabricator.wikimedia.org/T391273#10718781 (bd808) p:Triage→High The `navtiming` service is currently shutdown on deployment-webperf21.deployment-prep....
[17:48:44] Beta-Cluster-Infrastructure, NavigationTiming: navtiming: Loss of Kafka connection fills multiple log files with identical stack traces - https://phabricator.wikimedia.org/T391273#10718783 (Ottomata) > Possibly related to ... Oh oops sorry I missed that this was in beta. Carry on!
[18:00:27] Beta-Cluster-Infrastructure, NavigationTiming, SRE Observability: navtiming: Loss of Kafka connection fills multiple log files with identical stack traces - https://phabricator.wikimedia.org/T391273#10718844 (bd808) https://www.mediawiki.org/wiki/Developers/Maintainers#Services_and_administration has...
[18:56:57] Release-Engineering-Team, Scap, serviceops, SRE-OnFire, Sustainability (Incident Followup): Should scap be able to update helmfile-defaults when -Dbuild_mw_container_image:False ? - https://phabricator.wikimedia.org/T390531#10719044 (dancy)
[19:24:39] (open) dancy: kubernetes.py: Move update_helmfile_files() into deployment phase [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/748 (https://phabricator.wikimedia.org/T390531)
[19:24:39] (update) dancy: kubernetes.py: Move update_helmfile_files() into deployment phase [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/748 (https://phabricator.wikimedia.org/T390531)
[19:26:59] (update) dancy: kubernetes.py: Move update_helmfile_files() into deployment phase [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/748 (https://phabricator.wikimedia.org/T390531)
[19:36:03] Scap: Selecting N for "backport the change" doesn't seem to work/exit - https://phabricator.wikimedia.org/T380924#10719167 (dancy) ` dancy@deploy1003:~$ scap backport 1097484 19:35:00 Checking whether requested changes are in a branch deployed to production and their dependencies valid... 19:35:01 Change...
[20:16:38] (open) dancy: exp/Makefile: Fix jib-image.tar build [repos/releng/train-dev] - https://gitlab.wikimedia.org/repos/releng/train-dev/-/merge_requests/129
[20:16:41] (update) dancy: exp/Makefile: Fix jib-image.tar build [repos/releng/train-dev] - https://gitlab.wikimedia.org/repos/releng/train-dev/-/merge_requests/129
[20:17:42] (merge) dancy: exp/Makefile: Fix jib-image.tar build [repos/releng/train-dev] - https://gitlab.wikimedia.org/repos/releng/train-dev/-/merge_requests/129
[20:26:34] Scap, serviceops: Migrate scap's maintenance script invocations to PHP 8.1 - https://phabricator.wikimedia.org/T390225#10719278 (Scott_French) I was able to do some very testing in train-dev by switching `mediawiki_runtime_image` to `php8.1-fpm-multiversion-base` and deploying with scap (after clearing o...
[23:52:27] (update) bd808: SpiderPig: auto select first backport search match [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/731