[00:40:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:45:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[01:08:14] <_Gerges_> Hi, I have a project hosted outside Wikimedia servers. The project is a WordPress site intended for a group of Wikimedia users. I plan to enable login using OAuth 2 through Wikimedia accounts, allowing the site to obtain the user ID and email address from the user's Wikimedia account.
[01:08:15] <_Gerges_> Is there any issue with doing this? I understand that there is no policy that explicitly prohibits it, but I would like to know whether there are any security concerns or other considerations that I should be aware of when using this approach.
[01:10:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[01:15:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[01:40:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[01:55:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[02:10:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[02:15:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[02:55:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[03:05:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[04:41:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[04:46:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[05:41:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[05:46:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[06:11:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[06:16:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[06:41:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[06:51:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[06:51:59] Diffusion, Phabricator, collaboration-services, Wikidata, and 3 others: WikibaseLexeme submodule is not available - https://phabricator.wikimedia.org/T409519#11712081 (Func) I am getting HTTP 429 instead of 403, tried setting the user agent to `git/2.53.0 (https://phabricator.wikimedia.org/p/Func...
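_Gerges_'s question about OAuth 2 login touches on a standard concern for any authorization-code login: the site must bind each callback to the login attempt that started it, usually with a random `state` value, so an attacker cannot splice their own authorization code into a victim's session. The Python sketch below illustrates only that step. It assumes Meta-Wiki's Extension:OAuth authorization endpoint at `https://meta.wikimedia.org/w/rest.php/oauth2/authorize` and a client already registered for the site; `build_authorize_url` and `state_is_valid` are hypothetical helpers, not part of WordPress or any OAuth plugin.

```python
from typing import Optional, Tuple
import hmac
import secrets
from urllib.parse import urlencode

# Assumption: Meta-Wiki's OAuth 2 authorization endpoint (Extension:OAuth, rest.php).
AUTHORIZE_URL = "https://meta.wikimedia.org/w/rest.php/oauth2/authorize"


def build_authorize_url(client_id: str, redirect_uri: str) -> Tuple[str, str]:
    """Return (url, state); keep `state` in the visitor's server-side session."""
    # Unguessable per-login token; the callback must echo it back unchanged.
    state = secrets.token_urlsafe(32)
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}", state


def state_is_valid(session_state: Optional[str], returned_state: Optional[str]) -> bool:
    """Accept the callback only if it carries the state we issued (CSRF check)."""
    if not session_state or not returned_state:
        return False
    # Constant-time comparison; encoding first means non-ASCII input cannot raise.
    return hmac.compare_digest(session_state.encode(), returned_state.encode())
```

After `state_is_valid` passes, the authorization code would be exchanged for a token on the server, so the client secret never reaches the browser. Requesting only the grants the site actually needs (for example, email access) also limits what a leaked token exposes.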
[07:16:16] gitlab needs a short maintenance reboot in 45 minutes, at 8:00 UTC
[07:37:40] Phabricator: Headers for some Phab notification emails reference a Herald rule that is currently disabled - https://phabricator.wikimedia.org/T420162 (A_smart_kitten) NEW
[07:41:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[07:46:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[08:07:01] GitLab maintenance done
[08:18:16] Gerrit, Release-Engineering-Team, collaboration-services, Traffic, Patch-For-Review: ATS: align ATS and Gerrit Apache timeouts to reenable connection re-use - https://phabricator.wikimedia.org/T417998#11712238 (ABran-WMF) Before merging [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/...
[08:24:47] Release-Engineering-Team (Priority Backlog 📥), Essential-Work, Release, Train Deployments: 1.46.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T413811#11712269 (taavi)
[08:42:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[08:45:02] Gerrit, Release-Engineering-Team, collaboration-services, Traffic, Patch-For-Review: ATS: align ATS and Gerrit Apache timeouts to reenable connection re-use - https://phabricator.wikimedia.org/T417998#11712311 (ABran-WMF) the httpd config update to align httpd on ATS has also been applied to...
[08:47:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:10:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:13:07] Gerrit, Release-Engineering-Team, collaboration-services, Traffic, Patch-For-Review: ATS: align ATS and Gerrit Apache timeouts to reenable connection re-use - https://phabricator.wikimedia.org/T417998#11712379 (ABran-WMF) The httpd config update has been applied to all hosts. The CDN config...
[09:13:33] Gerrit, Release-Engineering-Team, collaboration-services, Traffic, Patch-For-Review: ATS: align ATS and Gerrit Apache timeouts to reenable connection re-use - https://phabricator.wikimedia.org/T417998#11712381 (ABran-WMF)
[09:20:09] (CR) Hashar: Split BrowserTests duration reports (3 comments) [integration/quibble] - https://gerrit.wikimedia.org/r/1250584 (https://phabricator.wikimedia.org/T419683) (owner: Hashar)
[09:20:46] (PS4) Hashar: Split BrowserTests duration reports [integration/quibble] - https://gerrit.wikimedia.org/r/1250584 (https://phabricator.wikimedia.org/T419683)
[09:24:14] Gerrit, Release-Engineering-Team, collaboration-services, Traffic, Patch-For-Review: ATS: align ATS and Gerrit Apache timeouts to reenable connection re-use - https://phabricator.wikimedia.org/T417998#11712420 (ABran-WMF) In progress→Resolved Configs have been applied to the prima...
[09:25:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:37:31] FIRING: [2x] ProbeDown: Service gerrit2003:443 has failed probes (http_gerrit_tls_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#gerrit2003:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:37:37] Release-Engineering-Team, collaboration-services: ProbeDown - https://phabricator.wikimedia.org/T420174 (phaultfinder) NEW
[09:40:16] GitLab (Project Migration), Wikispeech-Jobrunner, Wikispeech-Text-to-Speech: Move Speechoid components to Gitlab - https://phabricator.wikimedia.org/T360758#11712561 (Viktoria_Hillerud_WMSE)
[09:41:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:46:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:49:02] Continuous-Integration-Infrastructure, Jenkins: Verify the Jenkins Gearman plugin works under Java 21 - https://phabricator.wikimedia.org/T420178 (hashar) NEW
[09:49:57] Gerrit, Release-Engineering-Team, collaboration-services, Traffic, Patch-For-Review: ATS: align ATS and Gerrit Apache timeouts to reenable connection re-use - https://phabricator.wikimedia.org/T417998#11712639 (ABran-WMF) Resolved→Open this change applied to the primary instance has c...
[09:57:56] Project EntitySchema-phpmetrics build #161: FAILURE in 37 sec: https://integration.wikimedia.org/ci/job/EntitySchema-phpmetrics/161/
[10:01:09] Phabricator (Upstream), Upstream: Headers for some Phab notification emails reference a Herald rule that is currently disabled - https://phabricator.wikimedia.org/T420162#11712729 (Aklapper) p:Triage→Low
[10:09:36] Gerrit, collaboration-services, Puppet: Edit puppet-merge to use gerrit.discovery.wmnet instead of gerrit.wikimedia.org? - https://phabricator.wikimedia.org/T420184 (ABran-WMF) NEW
[10:11:31] Release-Engineering-Team, collaboration-services: ProbeDown - Service gerrit2003:443 has failed probes (http_gerrit_tls_ip4) - https://phabricator.wikimedia.org/T420174#11712749 (ABran-WMF)
[10:11:40] Release-Engineering-Team, collaboration-services: ProbeDown - Service gerrit2003:443 has failed probes (http_gerrit_tls_ip4) - https://phabricator.wikimedia.org/T420174#11712750 (ABran-WMF)
[10:11:46] Gerrit, Release-Engineering-Team, collaboration-services, Traffic, Patch-For-Review: ATS: align ATS and Gerrit Apache timeouts to reenable connection re-use - https://phabricator.wikimedia.org/T417998#11712751 (ABran-WMF)
[10:12:18] Release-Engineering-Team, collaboration-services: ProbeDown - Service gerrit2003:443 has failed probes (http_gerrit_tls_ip4) - https://phabricator.wikimedia.org/T420174#11712764 (ABran-WMF) Open→Resolved a:ABran-WMF this has been [[ https://phabricator.wikimedia.org/T417998#11712639 | fix...
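T417998 aligns ATS and Gerrit's Apache timeouts so that connection re-use can be turned back on. The usual rule for keep-alive re-use behind a reverse proxy is that the proxy must abandon an idle upstream connection before the backend closes it; otherwise the proxy can send a request down a socket the backend is already tearing down. The Python sketch below encodes only that rule. Mapping its two fields to ATS `proxy.config.http.keep_alive_no_activity_timeout_out` and Apache httpd `KeepAliveTimeout` is an assumption, not something the task states, and the 1-second margin is arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeepAliveSettings:
    # Seconds an idle upstream connection may sit in the proxy's pool
    # (assumed to correspond to ATS keep_alive_no_activity_timeout_out).
    proxy_idle_timeout_s: float
    # Seconds the backend keeps an idle keep-alive connection open
    # (assumed to correspond to Apache httpd KeepAliveTimeout).
    backend_keepalive_timeout_s: float


def reuse_is_safe(cfg: KeepAliveSettings, margin_s: float = 1.0) -> bool:
    """True if the proxy always abandons an idle connection before the backend does.

    If the backend closes first, the proxy may hand a request to a connection the
    backend is already closing, and that request fails. `margin_s` is an
    arbitrary cushion for timer and network jitter.
    """
    return cfg.proxy_idle_timeout_s + margin_s <= cfg.backend_keepalive_timeout_s
```

For example, with hypothetical values `KeepAliveSettings(proxy_idle_timeout_s=30, backend_keepalive_timeout_s=5)` the check returns `False`: httpd would close pooled connections that the proxy still considers reusable.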
[10:22:31] RESOLVED: [2x] ProbeDown: Service gerrit2003:443 has failed probes (http_gerrit_tls_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#gerrit2003:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:24:12] GitLab (Account Approval), Release-Engineering-Team (Doing 😎): Requesting GitLab account activation for Spandan1104 - https://phabricator.wikimedia.org/T419977#11712820 (Aklapper) Open→Resolved a:Aklapper Cloning via HTTPS should not require a GitLab account, but here you go :)
[10:32:52] Gerrit, Release-Engineering-Team, collaboration-services, Traffic: Gerrit: Debug connection re-use on Gerrit's httpd causing Gerrit interface to be very slow - https://phabricator.wikimedia.org/T420189 (ABran-WMF) NEW
[10:40:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:50:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:03:30] Project beta-code-update-eqiad build #591877: FAILURE in 30 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/591877/
[11:05:53] gate-and-submit contains both PS1 and PS2 of https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Score/+/1253427 now (because I submitted PS2 right as gerrit was being rebooted, so I guess Zuul missed an event) – anything I can do to fix this?
[11:10:42] Gerrit, collaboration-services: gerrit: create a reboot gerrit cookbook - https://phabricator.wikimedia.org/T420194#11713007 (ABran-WMF) p:Triage→Low
[11:11:05] (seems to have fixed itself, never mind)
[11:11:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:15:32] Yippee, build fixed!
[11:15:32] Project beta-code-update-eqiad build #591878: FIXED in 2 min 32 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/591878/
[11:16:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:41:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:46:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[12:37:57] hello hashar, not sure if you're the person to bother with this but I have a somewhat silly Gerrit challenge. A lot of prose follows, apologies
[12:38:51] I'm intending to reduce the number of git submodules in Wikibase.git by merging some of them into Wikibase.git. If possible, that would include the git history
[12:39:27] I've managed to cook up the relevant patches, but Gerrit does not allow me to push them for review
[12:40:08] it complains about a number of things, some of which I was already familiar with (I needed to temporarily allow myself to push merge commits and to forge committer identity), but some seem to be new
[12:40:46] for example, I was actually able to push https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/1084913 as a proof of concept in October.
[12:41:22] When I generate the same patch today, Gerrit complains that the commits don't have a Change-Id (I think I can make it accept that temporarily) and that there are too many changes in one batch
[12:41:55] I might be remembering wrong, and the problem may simply be my own doing, but I am quite convinced I created the October patch exactly the same way I do now (the git commands are actually in the commit message)
[12:42:31] so I figure that maybe this limit on changes per push batch is a recent addition, maybe as a way to protect gerrit from scraper nonsense etc
[12:43:10] generally not knowing what I'm talking about, I'd be curious, hashar et al, whether you think there's a way I could get 4 more of these mega merge commits up on Gerrit somehow
[12:50:44] (CR) Jforrester: [C:+1] Split BrowserTests duration reports [integration/quibble] - https://gerrit.wikimedia.org/r/1250584 (https://phabricator.wikimedia.org/T419683) (owner: Hashar)
[13:14:40] (PS1) Hashar: Zuul: [GrowthExperiments] drop duplicate VisualEditor dep [integration/config] - https://gerrit.wikimedia.org/r/1253488
[13:14:53] (PS2) Hashar: utils: zuul-dependencies: list deps repositories [integration/config] - https://gerrit.wikimedia.org/r/1253449
[13:16:08] (CR) CI reject: [V:-1] utils: zuul-dependencies: list deps repositories [integration/config] - https://gerrit.wikimedia.org/r/1253449 (owner: Hashar)
[13:17:58] (PS3) Hashar: utils: zuul-dependencies: list deps repositories [integration/config] - https://gerrit.wikimedia.org/r/1253449
[13:40:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[13:49:52] Gerrit, Release-Engineering-Team, collaboration-services, Traffic, Patch-For-Review: ATS: align ATS and Gerrit Apache timeouts to reenable connection re-use - https://phabricator.wikimedia.org/T417998#11713686 (ABran-WMF) Open→Resolved closing that task, we've aligned ATS and http...
[13:50:31] Gerrit, Release-Engineering-Team, collaboration-services, Traffic: Gerrit: Debug connection re-use on Gerrit's httpd causing Gerrit interface to be very slow - https://phabricator.wikimedia.org/T420189#11713692 (ABran-WMF) p:Triage→Medium
[13:50:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:29:00] Gerrit, Release-Engineering-Team, collaboration-services, Traffic: Gerrit: Debug connection re-use on Gerrit's httpd causing Gerrit interface to be very slow - https://phabricator.wikimedia.org/T420189#11713922 (ABran-WMF) From https://logstash.wikimedia.org/goto/01f21e6cccb2c9c7ba4c45b422ac089b:...
[14:31:33] FIRING: PuppetAgentNoResources: No Puppet resources found on instance deployment-kafka-logging01 on project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[14:35:14] Project beta-code-update-eqiad build #591898: FAILURE in 2 min 13 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/591898/
[14:45:15] Yippee, build fixed!
[14:45:15] Project beta-code-update-eqiad build #591899: FIXED in 2 min 15 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/591899/
[14:57:21] Gerrit, Release-Engineering-Team, collaboration-services, Traffic: Gerrit: Debug connection re-use on Gerrit's httpd causing Gerrit interface to be very slow - https://phabricator.wikimedia.org/T420189#11714028 (ABran-WMF) the interaction between the CDN and Gerrit did not created a burst of 5xx...
[15:04:57] (CR) Phedenskog: [C:+1] "Love this! If we can get this to Tyler stats tool too that would be great." [integration/quibble] - https://gerrit.wikimedia.org/r/1250584 (https://phabricator.wikimedia.org/T419683) (owner: Hashar)
[15:11:08] Beta-Cluster-Infrastructure, Cassandra, Data-Persistence: Cassandra killed by oom-killer and prometheus scrapes failing intermittently on deployment-sessionstore06 - https://phabricator.wikimedia.org/T415021#11714079 (bd808)
[15:11:10] Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-sessionstore06 is down - https://phabricator.wikimedia.org/T420139#11714081 (bd808) →Duplicate dup:T415021
[15:12:57] Beta-Cluster-Infrastructure, Patch-For-Review: deployment-kafka-logging01 is down for maintenance because Trixie is not yet well supported - https://phabricator.wikimedia.org/T420034#11714088 (bd808)
[15:13:00] Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-kafka-logging01 on project deployment-prep - https://phabricator.wikimedia.org/T420100#11714090 (bd808) →Duplicate dup:T420034
[15:29:02] Phabricator: Make it so that only Trusted-Contributors can edit things on other people's tickets - https://phabricator.wikimedia.org/T420132#11714169 (Nux) >>! In T420132#11711480, @Tgr wrote: > I wonder if we could auto-add people to Trusted-Contributors based on some modest contribution criteria (like >100...
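The 12:41 message mentions Gerrit rejecting commits that lack a Change-Id. Gerrit expects a `Change-Id: I` trailer followed by 40 hex digits (the form its commit-msg hook generates) in the last paragraph of each new commit message. The Python sketch below is a hypothetical pre-push check that flags messages without one. The `{sha: message}` input shape is an assumption, and the helper deliberately does not add IDs: rewriting imported history would change every commit hash, which defeats the point of preserving it.

```python
import re
from typing import Dict, List

# Change-Id trailer as produced by Gerrit's commit-msg hook: "I" + 40 lowercase hex digits.
CHANGE_ID_RE = re.compile(r"^Change-Id: I[0-9a-f]{40}[ \t]*$", re.MULTILINE)


def footer(message: str) -> str:
    """Return the last non-empty paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    return paragraphs[-1] if paragraphs else ""


def commits_missing_change_id(messages: Dict[str, str]) -> List[str]:
    """Given {sha: full commit message}, return SHAs whose footer has no Change-Id."""
    return [sha for sha, msg in messages.items()
            if not CHANGE_ID_RE.search(footer(msg))]
```

A Change-Id that appears earlier in the message but not in the final paragraph is reported as missing, matching the footer requirement.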
[15:34:16] Release-Engineering-Team (Priority Backlog 📥), Essential-Work, Release, Train Deployments: 1.46.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T413811#11714203 (brennen)
[15:41:12] (CR) Phedenskog: [C:+1] Add devcontainer.json [integration/quibble] - https://gerrit.wikimedia.org/r/1250501 (https://phabricator.wikimedia.org/T418234) (owner: Zfilipin)
[15:41:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:41:38] Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-sessionstore06 is down - https://phabricator.wikimedia.org/T420227 (wmcs-alerts) NEW
[15:46:08] Phabricator, DC-Ops, ops-codfw: phab2002: SEL System Event:, System Board Front LED Panel, Critical, management controller unavailable - https://phabricator.wikimedia.org/T420228 (Aklapper) NEW
[15:51:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:56:33] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance deployment-kafka-logging01 on project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[16:08:58] Beta-Cluster-Infrastructure, Patch-For-Review: deployment-kafka-logging01 is down for maintenance because Trixie is not yet well supported - https://phabricator.wikimedia.org/T420034#11714417 (elukey) Kafka 3.7 is running on deployment-kafka-logging01 with Debian Trixie, first one of its kind! @colewhit...
[16:45:19] Release-Engineering-Team (Priority Backlog 📥), Essential-Work, Release, Train Deployments: 1.46.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T413811#11714625 (Aklapper)
[16:50:07] Phabricator: Make it so that only Trusted-Contributors can edit things on other people's tickets - https://phabricator.wikimedia.org/T420132#11714635 (bd808) I think putting more contribution paths behind #trusted-contributors membership is an overreaction at this time. Every barrier we create to using Phabr...
[16:51:20] hashar: hello. we need to reboot contint*. does it need coordination with you or your team? even for the one that is not currently the "active" one (alias behind contint.wikimedia.org)? I am asking because last time I just did it, but I think I remember you said to please announce it.
[16:52:54] huh when did irc gain typing
[16:53:08] mutante: contint2002 does run jobs from time to time (pipeline lib) but it is usually idling ( https://integration.wikimedia.org/ci/computer/contint2002/ ) so I think it can just be rebooted
[16:54:10] for contint1002, that would stop the whole CI system for a while and lose all the events ( https://integration.wikimedia.org/zuul/ )
[16:54:29] hashar: thanks, I will just do that. then let's find a time for contint1002 separately
[16:54:37] historically I have rebooted it during the European morning when morit asks for it
[16:54:49] I guess I can do it with arnaud tomorrow morning?
[16:55:05] it is quieter during our mornings
[16:55:07] he is asking for it now (via ticket). do you just want to take that whenever it works for you? is that easiest?
[16:55:22] yeah I will do it tomorrow with him
[16:55:35] thank you. so if you want to check the box on https://phabricator.wikimedia.org/T420168 or just let me know
[16:56:06] for the other new contint hosts I guess they can be rebooted anytime?
[16:56:07] contint2002 now so the new kernel was running for a night then
[16:56:08] ack, let me know when, hashar!
[16:56:12] paladox: ircv3, which wasn't supported by libera until recently
[16:56:25] hashar: yea, I am just doing these. already done
[16:56:30] oh nice! Thanks!
[16:56:46] Beta-Cluster-Infrastructure, Patch-For-Review: deployment-kafka-logging01 is down for maintenance because Trixie is not yet well supported - https://phabricator.wikimedia.org/T420034#11714664 (colewhite) >>! In T420034#11714417, @elukey wrote: > @colewhite I see logstash consumer groups connecting, when...
[16:56:53] paladox: https://libera.chat/news/new-and-upcoming-features-3 :)
[16:58:28] Beta-Cluster-Infrastructure, Patch-For-Review: deployment-kafka-logging01 is down for maintenance because Trixie is not yet well supported - https://phabricator.wikimedia.org/T420034#11714678 (elukey) Open→Resolved a:elukey
[16:59:13] arnaudb: we can reboot contint1002 tomorrow after the backport window, that will be around 10am
[17:13:21] Phabricator, collaboration-services, VPS-project-Phabricator: Phabricator test project requires email verification but can't send email - https://phabricator.wikimedia.org/T388022#11714778 (Dzahn)
[17:20:18] Phabricator, Tool-phab-ban: Temporary ban feature from phab-ban to quickly response to Phabricator vandalism - https://phabricator.wikimedia.org/T420136#11714804 (bd808) Declined→Invalid
[17:24:34] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for Kengkong1 - https://phabricator.wikimedia.org/T419773#11714831 (Kengkong1) Hi. I am planning to work in data-engineering's Test Kitchen. [[ https://gitlab.wikimedia.org/repos/data-engineering/test-kitchen]] Thanks.
[17:31:05] 10Phabricator: Make it so that only Trusted-Contributors can edit things on other people's tickets - https://phabricator.wikimedia.org/T420132#11714858 (10Novem_Linguae) The attack that inspired this ticket wasnt as bad as {T198552}, but was significant. It involved about 25 accounts created over 8 hours, over t... [17:36:09] 10Phabricator: Make it so that only Trusted-Contributors can edit things on other people's tickets - https://phabricator.wikimedia.org/T420132#11714879 (10Dzahn) In "ticket that isn't yours", what would be the definition of "yours"? Is a ticket "my" ticket if I wrote it, if it's assigned to me or something else... [17:38:22] 10Phabricator: Make it so that only Trusted-Contributors can edit things on other people's tickets - https://phabricator.wikimedia.org/T420132#11714884 (10Novem_Linguae) I was envisioning "your ticket" = "you created it", but am of course open to discussion. [18:17:41] 10Phabricator, 06collaboration-services, 10VPS-project-Phabricator: Phabricator test project requires email verification but can't send email - https://phabricator.wikimedia.org/T388022#11715141 (10Aklapper) @Dzahn: Hmmm how does this affect the production instance? Or what did you have in mind by adding the... [18:30:17] 06Release-Engineering-Team, 10ChangeProp, 06Data-Engineering, 10EventStreams, and 15 others: Migrate node-based services in production to node22 - https://phabricator.wikimedia.org/T393434#11715215 (10Ottomata) [18:30:49] Project mediawiki-core-doxygen build #18627: 04FAILURE in 12 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/18627/ [18:42:16] Yippee, build fixed! 
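The rule proposed on T420132, with Novem_Linguae's definition that "your ticket" means "you created it", fits in a small predicate. This is an illustrative Python sketch with hypothetical `User` and `Task` types; it is not how Phabricator policies are actually implemented.

```python
from dataclasses import dataclass
from typing import FrozenSet

TRUSTED_GROUP = "Trusted-Contributors"


@dataclass(frozen=True)
class User:
    name: str
    groups: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Task:
    task_id: str
    author: str


def can_edit(user: User, task: Task) -> bool:
    """Proposed rule: you may edit a task you created; other tasks need trusted status."""
    if user.name == task.author:  # "your ticket" == "you created it"
        return True
    return TRUSTED_GROUP in user.groups
```

Dzahn's alternative definitions of "yours" (for example, a task assigned to you) would change only the first check.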
[18:42:17] Project mediawiki-core-doxygen build #18628: FIXED in 11 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/18628/
[19:11:57] Phabricator, DC-Ops, ops-codfw, SRE: phab2002: SEL System Event:, System Board Front LED Panel, Critical, management controller unavailable - https://phabricator.wikimedia.org/T420228#11715411 (Jhancock.wm) @Aklapper this notifies on the physical server if something goes wrong. like if a power su...
[19:15:44] Phabricator, collaboration-services, VPS-project-Phabricator: Phabricator test project requires email verification but can't send email - https://phabricator.wikimedia.org/T388022#11715419 (Dzahn) @Aklapper The last question above was about a configuration change. Configuration changes affect all ins...
[19:21:12] Phabricator, DC-Ops, ops-codfw, SRE: phab2002: SEL System Event:, System Board Front LED Panel, Critical, management controller unavailable - https://phabricator.wikimedia.org/T420228#11715437 (Dzahn) "management controller unavailable" sounds like the management console/DRAC is not working norma...
[19:21:41] Phabricator, collaboration-services, DC-Ops, ops-codfw: phab2002: SEL System Event:, System Board Front LED Panel, Critical, management controller unavailable - https://phabricator.wikimedia.org/T420228#11715438 (Dzahn)
[20:10:20] Release-Engineering-Team, ChangeProp, Data-Engineering, EventStreams, and 15 others: Migrate node-based services in production to node22 - https://phabricator.wikimedia.org/T393434#11715627 (Ottomata)
[20:59:55] Gerrit, Release-Engineering-Team, collaboration-services, Traffic, Patch-For-Review: ATS: align ATS and Gerrit Apache timeouts to reenable connection re-use - https://phabricator.wikimedia.org/T417998#11715846 (Dzahn) Cool! Since timeouts between ATS and httpd have been aligned; now would...
[22:36:35] Beta-Cluster-Infrastructure, Cassandra, Data-Persistence: Cassandra killed by oom-killer and prometheus scrapes failing intermittently on deployment-sessionstore06 - https://phabricator.wikimedia.org/T415021#11716228 (bd808) I am going to leave {T420227} open rather than merging here because apparent...
[22:37:05] Beta-Cluster-Infrastructure, Cassandra, Data-Persistence: Cassandra killed by oom-killer and prometheus scrapes failing intermittently on deployment-sessionstore06 - https://phabricator.wikimedia.org/T415021#11716231 (bd808)
[22:37:06] Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-sessionstore06 is down - https://phabricator.wikimedia.org/T420227#11716232 (bd808)
[22:38:11] Beta-Cluster-Infrastructure, Cassandra, Data-Persistence: Cassandra killed by oom-killer and prometheus scrapes failing intermittently on deployment-sessionstore06 - https://phabricator.wikimedia.org/T415021#11716236 (bd808)
[22:38:12] Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-sessionstore06 is down - https://phabricator.wikimedia.org/T420227#11716237 (bd808)
[22:38:17] Beta-Cluster-Infrastructure, Cassandra, Data-Persistence: Cassandra killed by oom-killer and prometheus scrapes failing intermittently on deployment-sessionstore06 - https://phabricator.wikimedia.org/T415021#11716238 (bd808)
[22:38:19] Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-sessionstore06 is down - https://phabricator.wikimedia.org/T420227#11716239 (bd808)
[22:39:12] Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-sessionstore06 is down - https://phabricator.wikimedia.org/T420227#11716243 (bd808) Open→Stalled Leaving open but marking as stalled on {T415021} in the hope that we get less Phab spam about the overloads making monitoring flap...
[22:41:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[22:41:40] Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-sessionstore06 is down - https://phabricator.wikimedia.org/T420284 (wmcs-alerts) NEW
[22:45:46] Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-sessionstore06 is down - https://phabricator.wikimedia.org/T420227#11716276 (bd808) Boo. Leaving it open did not stop {T420284} from being filed.
[22:46:05] Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-sessionstore06 is down - https://phabricator.wikimedia.org/T420227#11716282 (bd808) →Duplicate dup:T415021
[22:46:10] Beta-Cluster-Infrastructure, Cassandra, Data-Persistence: Cassandra killed by oom-killer and prometheus scrapes failing intermittently on deployment-sessionstore06 - https://phabricator.wikimedia.org/T415021#11716284 (bd808)
[22:46:14] Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-sessionstore06 is down - https://phabricator.wikimedia.org/T420284#11716286 (bd808) →Duplicate dup:T415021
[22:46:20] Beta-Cluster-Infrastructure, Cassandra, Data-Persistence: Cassandra killed by oom-killer and prometheus scrapes failing intermittently on deployment-sessionstore06 - https://phabricator.wikimedia.org/T415021#11716288 (bd808)
[22:46:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[22:47:24] that flapping is way past old at this point. I guess I know what my main task is going to be tomorrow.
[22:53:10] Phabricator: Make it so that only Trusted-Contributors can edit things on other people's tickets - https://phabricator.wikimedia.org/T420132#11716298 (bd808) >>! In T420132#11714169, @Nux wrote: > As I understand it, NDAs are stored within Phab. They were once, but the canonical list now lives elsewhere (as...
[23:41:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[23:41:39] Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-sessionstore06 is down - https://phabricator.wikimedia.org/T420287 (wmcs-alerts) NEW
[23:46:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
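The InstanceDown alerts for deployment-sessionstore06 flap throughout the day (bd808: "that flapping is way past old"). The Python sketch below pairs each FIRING line with the next RESOLVED line for one instance and returns how long each outage lasted. It assumes one message per line in this log's alert format, and because the timestamps carry no date, it also assumes no outage crosses midnight. `outage_durations` is a hypothetical helper, not an existing alerting tool.

```python
import re
from datetime import datetime, timedelta
from typing import List, Optional

# Matches alert lines such as
# "[00:40:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - ..."
ALERT_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\] (FIRING|RESOLVED): InstanceDown: .*? instance (\S+) is down"
)


def outage_durations(log_lines: List[str], instance: str) -> List[timedelta]:
    """Pair each FIRING with the next RESOLVED for `instance`; return each gap.

    Timestamps are time-of-day only, so this assumes no outage crosses midnight.
    """
    durations: List[timedelta] = []
    fired_at: Optional[datetime] = None
    for line in log_lines:
        m = ALERT_RE.match(line)
        if m is None or m.group(3) != instance:
            continue  # not an InstanceDown alert for this host
        ts = datetime.strptime(m.group(1), "%H:%M:%S")
        if m.group(2) == "FIRING":
            fired_at = ts
        elif fired_at is not None:
            durations.append(ts - fired_at)
            fired_at = None
    return durations
```

In this log, every FIRING/RESOLVED pair for deployment-sessionstore06 is between 5 and 15 minutes apart.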