[00:07:43] FIRING: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[00:14:57] FIRING: [14x] ConfdResourceFailed: confd resource _srv_config-master_pybal_codfw_gerrit-ssh.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[00:22:43] RESOLVED: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[00:29:43] FIRING: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[00:39:43] RESOLVED: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[00:39:48] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1216221
[00:39:48] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1216221 (owner: TrainBranchBot)
[00:45:43] FIRING: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[00:51:17] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1216221 (owner: TrainBranchBot)
[00:51:38] RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext-next_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[01:00:43] RESOLVED: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[01:00:55] !log mwpresync@deploy2002 Started scap build-images: Publishing wmf/next image
[01:09:55] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1216222
[01:09:55] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1216222 (owner: TrainBranchBot)
[01:14:02] !log mwpresync@deploy2002 Finished scap build-images: Publishing wmf/next image (duration: 13m 07s)
[01:14:43] FIRING: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[01:29:43] RESOLVED: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[01:34:17] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1216222 (owner: TrainBranchBot)
[01:48:43] FIRING: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[02:03:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 22.19% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[02:07:50] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
[02:07:58] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db2157 (T410589)', diff saved to https://phabricator.wikimedia.org/P86444 and previous config saved to /var/cache/conftool/dbconfig/20251208-020757-ladsgroup.json
[02:08:02] T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589
[02:08:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 22.33% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[02:10:56] (CR) Ladsgroup: [C:+1] "It looks correct to me but keep in mind this extension is so unpredictable that even both are 100% sure, it's still might blow up as it ha" [mediawiki-config] - https://gerrit.wikimedia.org/r/1201051 (https://phabricator.wikimedia.org/T408110) (owner: A smart kitten)
[02:16:40] FIRING: SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:37:33] FIRING: [2x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1011:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[02:55:12] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[03:30:13] FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[03:35:12] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:47:05] FIRING: KubernetesCalicoDown: ml-serve1013.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s-mlserve&var-instance=ml-serve1013.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[04:14:57] FIRING: [14x] ConfdResourceFailed: confd resource _srv_config-master_pybal_codfw_gerrit-ssh.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[04:21:26] (PS2) Anzx: niawiktionary: update wordmark, sitename and projectnamespace [mediawiki-config] - https://gerrit.wikimedia.org/r/1216196 (https://phabricator.wikimedia.org/T411850)
[04:32:19] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 08 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - https://gerrit.wikimedia.org/r/1216196 (https://phabricator.wikimedia.org/T411850) (owner: Anzx)
[04:33:55] (PS3) Anzx: shnwiki: add draft namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/1216226 (https://phabricator.wikimedia.org/T411965)
[04:34:41] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 08 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - https://gerrit.wikimedia.org/r/1216226 (https://phabricator.wikimedia.org/T411965) (owner: Anzx)
[05:02:43] FIRING: [5x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh
[05:02:48] FIRING: [22x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh
[05:08:43] RESOLVED: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[05:10:01] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:35:01] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:56:43] FIRING: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[06:01:43] RESOLVED: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[06:02:13] FIRING: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[06:06:58] RESOLVED: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[06:11:26] PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 419479568 and 28 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[06:13:26] RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 3781448 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[06:15:43] FIRING: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[06:16:40] FIRING: SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:35:43] RESOLVED: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[06:37:33] FIRING: [2x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1011:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:43:43] FIRING: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[06:55:13] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[06:56:04] FIRING: MediaWikiElevatedUnknownLogins: Elevated number of failed login attempts (unknown device and IP) via mw-api-ext - TODO - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?from=now-6h&orgId=1&to=now&viewPanel=26 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiElevatedUnknownLogins
[06:58:43] RESOLVED: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[07:04:43] FIRING: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[07:12:43] FIRING: [5x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh
[07:17:43] FIRING: [22x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh
[07:21:04] RESOLVED: MediaWikiElevatedUnknownLogins: Elevated number of failed login attempts (unknown device and IP) via mw-api-ext - TODO - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?from=now-6h&orgId=1&to=now&viewPanel=26 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiElevatedUnknownLogins
[07:24:43] RESOLVED: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[07:30:13] FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[07:31:43] FIRING: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[07:32:43] FIRING: [4x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh
[07:32:53] FIRING: [21x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh
[07:37:43] FIRING: [21x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh
[07:42:43] FIRING: [21x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh
[07:47:06] FIRING: KubernetesCalicoDown: ml-serve1013.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s-mlserve&var-instance=ml-serve1013.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[07:47:43] FIRING: [19x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh
[07:52:43] FIRING: [3x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh
[07:52:48] FIRING: [16x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh
[07:57:43] RESOLVED: [3x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh
[07:57:53] FIRING: [12x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh
[08:00:05] Amir1, Urbanecm, and awight: How many deployers does it take to do UTC morning backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251208T0800).
[08:00:05] kostajh and anzx: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:00:18] hello
[08:01:05] o/
[08:02:33] having a look at your patches
[08:02:43] RESOLVED: [10x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh
[08:04:14] anzx: do you know if a maintenance script needs to be run after https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1216196 ?
[08:04:43] (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1211660 (https://phabricator.wikimedia.org/T405586) (owner: Kosta Harlan)
[08:04:58] yes namespacedupes.php needed to run for both
[08:05:35] i will provide you script in a moment for both
[08:05:56] I can see why it's needed for the other patch
[08:06:14] (Merged) jenkins-bot: hCaptcha: Switch enwiki to 99.9% passive mode [mediawiki-config] - https://gerrit.wikimedia.org/r/1211660 (https://phabricator.wikimedia.org/T405586) (owner: Kosta Harlan)
[08:07:24] !log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1211660|hCaptcha: Switch enwiki to 99.9% passive mode (T405586)]]
[08:07:28] T405586: hCaptcha editing trial deployment tracker - https://phabricator.wikimedia.org/T405586
[08:07:30] mwscript namespaceDupes.php --wiki=shnwiki --add-prefix=BROKEN --fix # T
[08:12:07] kostajh: here's namespacedupes to run on both wikis https://www.irccloud.com/pastebin/0s0zBow1/
[08:12:42] anzx: thanks
[08:14:57] FIRING: [14x] ConfdResourceFailed: confd resource _srv_config-master_pybal_codfw_gerrit-ssh.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[08:27:11] (PS8) Dpogorzelski: ml-build: define new machine name/type [puppet] - https://gerrit.wikimedia.org/r/1213972 (https://phabricator.wikimedia.org/T394778)
[08:28:06] (PS9) Dpogorzelski: ml-build: define new machine name/type [puppet] - https://gerrit.wikimedia.org/r/1213972 (https://phabricator.wikimedia.org/T394778)
[08:28:38] (PS18) Func: Test diffConfig [mediawiki-config] - https://gerrit.wikimedia.org/r/1216181
[08:28:44] (PS19) Func: Test diffConfig [mediawiki-config] - https://gerrit.wikimedia.org/r/1216181
[08:30:26] !log kharlan@deploy2002 kharlan: Backport for [[gerrit:1211660|hCaptcha: Switch enwiki to 99.9% passive mode (T405586)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[08:30:30] T405586: hCaptcha editing trial deployment tracker - https://phabricator.wikimedia.org/T405586
[08:33:01] !log kharlan@deploy2002 kharlan: Continuing with sync
[08:33:41] SRE, SRE-Access-Requests: Requesting access to analytics_privatedata_users and SQL Lab for Leif WMDE - https://phabricator.wikimedia.org/T411883#11439308 (Lena_WMDE) I can confirm that @Leif_WMDE is a product manager at Wikimedia Deutschland.
[08:33:52] SRE, SRE-Access-Requests: Requesting access to analytics_privatedata_users and SQL Lab for Leif WMDE - https://phabricator.wikimedia.org/T411883#11439309 (Lena_WMDE) a:Lena_WMDE→None
[08:35:39] (CR) Ayounsi: [C:+2] inter.link: add DDoS scrubbing community to all v4 prefixes [homer/public] - https://gerrit.wikimedia.org/r/1214537 (https://phabricator.wikimedia.org/T407959) (owner: Ayounsi)
[08:36:43] RESOLVED: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[08:38:11] (Merged) jenkins-bot: inter.link: add DDoS scrubbing community to all v4 prefixes [homer/public] - https://gerrit.wikimedia.org/r/1214537 (https://phabricator.wikimedia.org/T407959) (owner: Ayounsi)
[08:38:54] (PS20) Func: [DNM] Test diffConfig for the parent change [mediawiki-config] - https://gerrit.wikimedia.org/r/1216181
[08:41:43] FIRING: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[08:45:01] (CR) Dpogorzelski: ml-build: define new machine name/type (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1213972 (https://phabricator.wikimedia.org/T394778) (owner: Dpogorzelski)
[08:47:18] !log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1211660|hCaptcha: Switch enwiki to 99.9% passive mode (T405586)]] (duration: 39m 54s)
[08:47:23] T405586: hCaptcha editing trial deployment tracker - https://phabricator.wikimedia.org/T405586
[08:47:41] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - https://gerrit.wikimedia.org/r/1215060 (https://phabricator.wikimedia.org/T411750) (owner: Zoranzoki21)
[08:48:28] anzx: ok, I'll get started on your patches
[08:48:47] (PS3) Anzx: niawiktionary: update wordmark, sitename and projectnamespace [mediawiki-config] - https://gerrit.wikimedia.org/r/1216196 (https://phabricator.wikimedia.org/T411850)
[08:48:59] kostajh: ok
[08:49:31] anzx: although, is another window possible for you? (Or are other deployers around?)
[08:50:03] kostajh: sure i can schedule it for next window
[08:50:09] anzx: thank you
[08:58:01] (CR) Jelto: [C:+1] "lgtm now" [puppet] - https://gerrit.wikimedia.org/r/1215389 (https://phabricator.wikimedia.org/T411895) (owner: CDanis)
[09:01:45] (PS5) Arnaudb: gerrit: unmask service & disable backup temporarily [puppet] - https://gerrit.wikimedia.org/r/1196792 (https://phabricator.wikimedia.org/T387833)
[09:01:50] (PS4) Arnaudb: gerrit: Switchover gerrit1003 → gerrit2003 [puppet] - https://gerrit.wikimedia.org/r/1211549 (https://phabricator.wikimedia.org/T338470)
[09:01:55] (PS3) Urbanecm: [Growth] Enable Add Link backend on a handful of wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1214570 (https://phabricator.wikimedia.org/T410469)
[09:01:56] (PS4) Arnaudb: gerrit: re-enable backups on gerrit2003 [puppet] - https://gerrit.wikimedia.org/r/1211551 (https://phabricator.wikimedia.org/T387833)
[09:02:00] (CR) Urbanecm: [C:+2] [Growth] Enable Add Link backend on a handful of wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1214570 (https://phabricator.wikimedia.org/T410469) (owner: Urbanecm)
[09:02:04] (PS5) Urbanecm: [Growth] Sort the list of Add Link wikis alphabetically [mediawiki-config] - https://gerrit.wikimedia.org/r/1214571 (https://phabricator.wikimedia.org/T410469)
[09:02:06] (CR) Urbanecm: [C:+2] [Growth] Sort the list of Add Link wikis alphabetically [mediawiki-config] - https://gerrit.wikimedia.org/r/1214571 (https://phabricator.wikimedia.org/T410469) (owner: Urbanecm)
[09:03:20] (Merged) jenkins-bot: [Growth] Enable Add Link backend on a handful of wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1214570 (https://phabricator.wikimedia.org/T410469) (owner: Urbanecm)
[09:03:23] (Merged) jenkins-bot: [Growth] Sort the list of Add Link wikis alphabetically [mediawiki-config] - https://gerrit.wikimedia.org/r/1214571 (https://phabricator.wikimedia.org/T410469) (owner: Urbanecm)
[09:04:25] !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1214570|[Growth] Enable Add Link backend on a handful of wikis (T410469)]], [[gerrit:1214571|[Growth] Sort the list of Add Link wikis alphabetically (T410469)]]
[09:04:28] T410469: Add a Link: Rollout "Add a Link" task to remaining Wikipedias that have V2 model support but don't yet have access to "Add a Link" - https://phabricator.wikimedia.org/T410469
[09:06:26] !log urbanecm@deploy2002 urbanecm: Backport for [[gerrit:1214570|[Growth] Enable Add Link backend on a handful of wikis (T410469)]], [[gerrit:1214571|[Growth] Sort the list of Add Link wikis alphabetically (T410469)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[09:07:57] (PS1) Arnaudb: apt-staging: alert only on codfw [alerts] - https://gerrit.wikimedia.org/r/1216549 (https://phabricator.wikimedia.org/T409835)
[09:07:57] (CR) Arnaudb: "this should fix the "Linting problems" alert" [alerts] - https://gerrit.wikimedia.org/r/1216549 (https://phabricator.wikimedia.org/T409835) (owner: Arnaudb)
[09:07:58] !log hashar@deploy2002 Started deploy [integration/docroot@41d63f3]: build: Updating eslint-config-wikimedia to 0.32.3
[09:08:10] !log hashar@deploy2002 Finished deploy [integration/docroot@41d63f3]: build: Updating eslint-config-wikimedia to 0.32.3 (duration: 00m 11s)
[09:08:15] !log urbanecm@deploy2002 urbanecm: Continuing with sync
[09:11:43] RESOLVED: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[09:12:32] (CR) Arnaudb: [C:+2] apt-staging: alert only on codfw [alerts] - https://gerrit.wikimedia.org/r/1216549 (https://phabricator.wikimedia.org/T409835) (owner: Arnaudb)
[09:13:42] (Merged) jenkins-bot: apt-staging: alert only on codfw [alerts] - https://gerrit.wikimedia.org/r/1216549 (https://phabricator.wikimedia.org/T409835) (owner: Arnaudb)
[09:14:26] !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1214570|[Growth] Enable Add Link backend on a handful of wikis (T410469)]], [[gerrit:1214571|[Growth] Sort the list of Add Link wikis alphabetically (T410469)]] (duration: 10m 01s)
[09:14:30] T410469: Add a Link: Rollout "Add a Link" task to remaining Wikipedias that have V2 model support but don't yet have access to "Add a Link" - https://phabricator.wikimedia.org/T410469
[09:25:00] SRE, SRE-Access-Requests: Requesting access to analytics_privatedata_users and Superset for Solenne_Lazare_WMDE - https://phabricator.wikimedia.org/T411977#11439417 (Solenne_Lazare_WMDE)
[09:25:05] SRE, SRE-Access-Requests: Requesting access to analytics_privatedata_users and Superset for Solenne_Lazare_WMDE - https://phabricator.wikimedia.org/T411977#11439418 (Solenne_Lazare_WMDE) @Lena_WMDE
[09:32:13] SRE, SRE-Access-Requests: Requesting access to analytics_privatedata_users and Superset for Solenne_Lazare_WMDE - https://phabricator.wikimedia.org/T411977#11439440 (Lena_WMDE) I can confirm that @Solenne_Lazare_WMDE is head of product strategy at Wikimedia Deutschland.
[09:33:06] 06SRE, 10SRE-Access-Requests: Requesting access to analytics_privatedata_users and Superset for Solenne_Lazare_WMDE - https://phabricator.wikimedia.org/T411977#11439455 (10Lena_WMDE) a:05Lena_WMDE→03None [09:39:45] dpogorzelski@cumin1003 rename (PID 2618755) is awaiting input [09:43:42] (03CR) 10Arnaudb: [C:03+2] gerrit: rsync logic extraction from failover [cookbooks] - 10https://gerrit.wikimedia.org/r/1214466 (https://phabricator.wikimedia.org/T387833) (owner: 10Arnaudb) [09:46:43] FIRING: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly [09:48:04] !log restarting Blazegraph on wdqs1015 - allocator decreasing - https://grafana.wikimedia.org/goto/Jygg2zMvg?orgId=1 [09:48:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:48:39] cc: gmodena, ryankemper, inflatador ^^ [09:49:13] (03Merged) 10jenkins-bot: gerrit: rsync logic extraction from failover [cookbooks] - 10https://gerrit.wikimedia.org/r/1214466 (https://phabricator.wikimedia.org/T387833) (owner: 10Arnaudb) [09:50:04] (03PS26) 10Func: SiteConfiguration: Make sure the array is a list before appending [mediawiki-config] - 10https://gerrit.wikimedia.org/r/956060 (https://phabricator.wikimedia.org/T340697) [09:50:26] (03CR) 10Jelto: [C:03+1] "looks reasonable until gerrit uses a certificate with `gerrit.discovery.wmnet`" [puppet] - 10https://gerrit.wikimedia.org/r/1215684 (https://phabricator.wikimedia.org/T411895) (owner: 10CDanis) [09:52:41] (03PS21) 10Func: [DNM] Test diffConfig for the parent change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216181 [09:53:11] (03CR) 10Jelto: [C:03+1] "lgtm" [puppet] - 
10https://gerrit.wikimedia.org/r/1215388 (https://phabricator.wikimedia.org/T411895) (owner: 10CDanis) [09:53:35] (03CR) 10Elukey: ml-build: define new machine name/type (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1213972 (https://phabricator.wikimedia.org/T394778) (owner: 10Dpogorzelski) [09:54:31] (03PS1) 10WMDE-Fisch: VE: Don't create a synth ref when there's a LDR main ref [extensions/Cite] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216553 (https://phabricator.wikimedia.org/T411245) [09:54:33] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1192528 (https://phabricator.wikimedia.org/T406023) (owner: 10TheDJ) [09:56:14] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [extensions/Cite] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216553 (https://phabricator.wikimedia.org/T411245) (owner: 10WMDE-Fisch) [09:56:21] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1201051 (https://phabricator.wikimedia.org/T408110) (owner: 10A smart kitten) [09:56:43] RESOLVED: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly [09:58:15] (03CR) 10Thiemo Kreuz (WMDE): [C:03+1] VE: Don't create a synth ref when there's a LDR main ref [extensions/Cite] 
(wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216553 (https://phabricator.wikimedia.org/T411245) (owner: 10WMDE-Fisch) [09:58:49] (03CR) 10A smart kitten: "Thanks for the review! & thanks for the advice, I will keep an eye on things, and ask folks in the task to leave a comment if anything doe" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1201051 (https://phabricator.wikimedia.org/T408110) (owner: 10A smart kitten) [10:05:11] 06SRE-OnFire, 06collaboration-services, 10Znuny: ticket.wikimedia.org should page when down - https://phabricator.wikimedia.org/T354479#11439734 (10LSobanski) a:05LSobanski→03None [10:13:29] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1207133 (https://phabricator.wikimedia.org/T407431) (owner: 10Cyndywikime) [10:16:11] (03PS10) 10Dpogorzelski: ml-build: define new machine name/type [puppet] - 10https://gerrit.wikimedia.org/r/1213972 (https://phabricator.wikimedia.org/T394778) [10:16:22] (03CR) 10Dpogorzelski: ml-build: define new machine name/type (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1213972 (https://phabricator.wikimedia.org/T394778) (owner: 10Dpogorzelski) [10:16:40] FIRING: SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:17:07] (03PS1) 10D3r1ck01: Pass an explicit performer when attempting CreateLocalAccount [extensions/CentralAuth] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216558 (https://phabricator.wikimedia.org/T411826) [10:18:06] 06SRE: dpogorzelski gpg key - https://phabricator.wikimedia.org/T411993#11439785 (10taavi) [10:33:23] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment 
in the [Monday, December 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [extensions/CentralAuth] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216558 (https://phabricator.wikimedia.org/T411826) (owner: 10D3r1ck01) [10:37:33] FIRING: [2x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1011:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [10:43:46] (03PS1) 10STran: Enable IRS v2 non-emergency workflow on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216561 (https://phabricator.wikimedia.org/T410512) [10:50:50] (03PS2) 10STran: Enable IRS v2 non-emergency workflow on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216561 (https://phabricator.wikimedia.org/T410512) [10:53:25] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2157 (T410589)', diff saved to https://phabricator.wikimedia.org/P86447 and previous config saved to /var/cache/conftool/dbconfig/20251208-105325-ladsgroup.json [10:53:29] T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589 [10:55:13] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [11:00:05] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251208T1100) [11:08:33] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P86448 and previous config saved to /var/cache/conftool/dbconfig/20251208-110832-ladsgroup.json [11:22:46] FIRING: Primary outbound 
port utilisation over 80% #page: Alert for device cr2-eqiad.wikimedia.org - Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page [11:23:01] hello [11:23:41] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P86449 and previous config saved to /var/cache/conftool/dbconfig/20251208-112340-ladsgroup.json [11:27:46] RESOLVED: Primary outbound port utilisation over 80% #page: Device cr2-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page [11:30:13] FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [11:31:54] (03CR) 10Lucas Werkmeister (WMDE): "Product “would prefer to take a little time to catch up and check everything before we send it out”, so this isn’t happening today." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1214986 (https://phabricator.wikimedia.org/T403015) (owner: 10Arthur taylor) [11:32:20] (03CR) 10Mszwarc: Enable IRS v2 non-emergency workflow on beta (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216561 (https://phabricator.wikimedia.org/T410512) (owner: 10STran) [11:37:46] FIRING: Primary outbound port utilisation over 80% #page: Alert for device cr2-eqiad.wikimedia.org - Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page [11:38:48] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2157 (T410589)', diff saved to https://phabricator.wikimedia.org/P86450 and previous config saved to /var/cache/conftool/dbconfig/20251208-113848-ladsgroup.json [11:38:52] T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589 [11:39:04] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance [11:39:12] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db2171 (T410589)', diff saved to https://phabricator.wikimedia.org/P86451 and previous config saved to /var/cache/conftool/dbconfig/20251208-113911-ladsgroup.json [11:42:46] RESOLVED: Primary outbound port utilisation over 80% #page: Device cr2-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page [11:42:46] FIRING: Primary inbound port utilisation over 80% #page: Alert for device cr1-esams.wikimedia.org - Primary inbound port utilisation over 80% #page - https://alerts.wikimedia.org/?q=alertname%3DPrimary+inbound+port+utilisation+over+80%25++%23page [11:47:06] FIRING: KubernetesCalicoDown: ml-serve1013.eqiad.wmnet is not running calico-node Pod - 
https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s-mlserve&var-instance=ml-serve1013.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown [11:47:46] RESOLVED: Primary inbound port utilisation over 80% #page: Device cr1-esams.wikimedia.org recovered from Primary inbound port utilisation over 80% #page - https://alerts.wikimedia.org/?q=alertname%3DPrimary+inbound+port+utilisation+over+80%25++%23page [11:51:10] (03PS2) 10Bartosz Wójtowicz: ml-services: Deploy experimental CPU-only revise-tone-task-generator. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1215112 (https://phabricator.wikimedia.org/T411758) [11:52:46] FIRING: Primary inbound port utilisation over 80% #page: Alert for device cr1-esams.wikimedia.org - Primary inbound port utilisation over 80% #page - https://alerts.wikimedia.org/?q=alertname%3DPrimary+inbound+port+utilisation+over+80%25++%23page [11:53:44] (03CR) 10A smart kitten: [C:03+1] "I manually rebased this patch so I'm listed as its uploader, but this LGTM. Should just be a no-op to manually set this config variable's " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1192528 (https://phabricator.wikimedia.org/T406023) (owner: 10TheDJ) [11:57:46] RESOLVED: Primary inbound port utilisation over 80% #page: Device cr1-esams.wikimedia.org recovered from Primary inbound port utilisation over 80% #page - https://alerts.wikimedia.org/?q=alertname%3DPrimary+inbound+port+utilisation+over+80%25++%23page [12:14:57] FIRING: [14x] ConfdResourceFailed: confd resource _srv_config-master_pybal_codfw_gerrit-ssh.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed [12:19:29] (03CR) 10CI reject: [V:04-1] Localisation updates from https://translatewiki.net. 
[phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1216573 (owner: 10L10n-bot) [12:26:16] (03CR) 10A smart kitten: "Thanks for working on this!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/956060 (https://phabricator.wikimedia.org/T340697) (owner: 10Func) [12:32:17] (03CR) 10Klausman: [C:03+1] ml-services: Deploy experimental CPU-only revise-tone-task-generator. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1215112 (https://phabricator.wikimedia.org/T411758) (owner: 10Bartosz Wójtowicz) [12:32:45] FIRING: Primary outbound port utilisation over 80% #page: Alert for device cr2-eqiad.wikimedia.org - Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page [12:34:57] (03CR) 10Majavah: [C:03+2] openstack: puppet: Do not commit empty role fiels [puppet] - 10https://gerrit.wikimedia.org/r/1214491 (owner: 10Majavah) [12:37:46] RESOLVED: Primary outbound port utilisation over 80% #page: Device cr2-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page [12:40:25] the alert said "80%" but the threshold is at 90%, so I updated the alerting text to match reality [12:50:28] (03CR) 10Bartosz Wójtowicz: [C:03+2] ml-services: Deploy experimental CPU-only revise-tone-task-generator. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1215112 (https://phabricator.wikimedia.org/T411758) (owner: 10Bartosz Wójtowicz) [12:51:35] !incidents [12:51:36] 7105 (RESOLVED) Primary outbound port utilisation over 80% (paged) network noc (cr2-eqiad.wikimedia.org) [12:51:36] 7104 (RESOLVED) Primary inbound port utilisation over 80% (paged) network noc (cr1-esams.wikimedia.org) [12:51:36] 7103 (RESOLVED) Primary inbound port utilisation over 80% (paged) network noc (cr1-esams.wikimedia.org) [12:51:36] 7102 (RESOLVED) Primary outbound port utilisation over 80% (paged) network noc (cr2-eqiad.wikimedia.org) [12:51:36] 7101 (RESOLVED) Primary outbound port utilisation over 80% (paged) network noc (cr2-eqiad.wikimedia.org) [12:51:37] 7100 (RESOLVED) ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet codfw) [12:51:37] 7099 (RESOLVED) ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet codfw) [12:52:31] (03Merged) 10jenkins-bot: ml-services: Deploy experimental CPU-only revise-tone-task-generator. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1215112 (https://phabricator.wikimedia.org/T411758) (owner: 10Bartosz Wójtowicz) [12:52:46] FIRING: Primary inbound port utilisation over 90% #page: Alert for device cr1-esams.wikimedia.org - Primary inbound port utilisation over 90% #page - https://alerts.wikimedia.org/?q=alertname%3DPrimary+inbound+port+utilisation+over+90%25++%23page [12:53:49] !log bwojtowicz@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . 
[12:55:03] !incidents [12:55:03] 7106 (UNACKED) Primary inbound port utilisation over 90% (paged) network noc (cr1-esams.wikimedia.org) [12:55:03] 7105 (RESOLVED) Primary outbound port utilisation over 80% (paged) network noc (cr2-eqiad.wikimedia.org) [12:55:03] 7104 (RESOLVED) Primary inbound port utilisation over 80% (paged) network noc (cr1-esams.wikimedia.org) [12:55:03] 7103 (RESOLVED) Primary inbound port utilisation over 80% (paged) network noc (cr1-esams.wikimedia.org) [12:55:04] 7102 (RESOLVED) Primary outbound port utilisation over 80% (paged) network noc (cr2-eqiad.wikimedia.org) [12:55:04] 7101 (RESOLVED) Primary outbound port utilisation over 80% (paged) network noc (cr2-eqiad.wikimedia.org) [12:55:04] 7100 (RESOLVED) ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet codfw) [12:55:04] 7099 (RESOLVED) ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet codfw) [12:57:46] RESOLVED: Primary inbound port utilisation over 90% #page: Device cr1-esams.wikimedia.org recovered from Primary inbound port utilisation over 90% #page - https://alerts.wikimedia.org/?q=alertname%3DPrimary+inbound+port+utilisation+over+90%25++%23page [13:18:51] FIRING: TransitPeeringTransportOutboundSaturation: Transit, peering or transport outbound traffic above 90% capacity - cr2-eqiad:xe-3/2/1 (Transport: cr1-esams:xe-0/0/7 (Colt, ... 
[13:18:51] 445419311 80ms 10Gbps wave) {#2013}) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#Primary_outbound_port_utilization_over_90% - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DTransitPeeringTransportOutboundSaturation [13:19:36] same link but this time it's the new Alert Manager alert (with more info) [13:19:47] the other alert is the old LibreNMS alert [13:20:02] I'll disable the LibreNMS one once we trust AM enough [13:21:18] (03PS4) 10Kgraessle: Enable revertrisk filters in thwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1207923 (https://phabricator.wikimedia.org/T409438) [13:21:36] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 08 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1207923 (https://phabricator.wikimedia.org/T409438) (owner: 10Kgraessle) [13:22:45] FIRING: Primary outbound port utilisation over 90% #page: Alert for device cr2-eqiad.wikimedia.org - Primary outbound port utilisation over 90% #page - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+90%25++%23page [13:22:58] !incidents [13:22:58] 7107 (UNACKED) TransitPeeringTransportOutboundSaturation network sre (cr2-eqiad:9804 Transport: cr1-esams:xe-0/0/7 (Colt, 445419311 80ms 10Gbps wave) {#2013} xe-3/2/1 gnmi eqiad) [13:22:59] 7108 (UNACKED) Primary outbound port utilisation over 90% (paged) network noc (cr2-eqiad.wikimedia.org) [13:22:59] 7106 (RESOLVED) Primary inbound port utilisation over 90% (paged) network noc (cr1-esams.wikimedia.org) [13:22:59] 7105 (RESOLVED) Primary outbound port utilisation over 80% (paged) network noc (cr2-eqiad.wikimedia.org) [13:22:59] 7104 (RESOLVED) Primary inbound port utilisation over 80% (paged) network noc (cr1-esams.wikimedia.org) [13:22:59] 
7103 (RESOLVED) Primary inbound port utilisation over 80% (paged) network noc (cr1-esams.wikimedia.org) [13:23:00] 7102 (RESOLVED) Primary outbound port utilisation over 80% (paged) network noc (cr2-eqiad.wikimedia.org) [13:23:00] 7101 (RESOLVED) Primary outbound port utilisation over 80% (paged) network noc (cr2-eqiad.wikimedia.org) [13:23:00] 7100 (RESOLVED) ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet codfw) [13:23:01] 7099 (RESOLVED) ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet codfw) [13:23:07] !ack 7108 [13:23:08] 7108 (ACKED) Primary outbound port utilisation over 90% (paged) network noc (cr2-eqiad.wikimedia.org) [13:23:16] !ack 7107 [13:23:16] 7107 (ACKED) TransitPeeringTransportOutboundSaturation network sre (cr2-eqiad:9804 Transport: cr1-esams:xe-0/0/7 (Colt, 445419311 80ms 10Gbps wave) {#2013} xe-3/2/1 gnmi eqiad) [13:23:21] !incidents [13:23:21] 7107 (ACKED) TransitPeeringTransportOutboundSaturation network sre (cr2-eqiad:9804 Transport: cr1-esams:xe-0/0/7 (Colt, 445419311 80ms 10Gbps wave) {#2013} xe-3/2/1 gnmi eqiad) [13:23:22] 7108 (ACKED) Primary outbound port utilisation over 90% (paged) network noc (cr2-eqiad.wikimedia.org) [13:23:22] 7106 (RESOLVED) Primary inbound port utilisation over 90% (paged) network noc (cr1-esams.wikimedia.org) [13:23:22] 7105 (RESOLVED) Primary outbound port utilisation over 80% (paged) network noc (cr2-eqiad.wikimedia.org) [13:23:22] 7104 (RESOLVED) Primary inbound port utilisation over 80% (paged) network noc (cr1-esams.wikimedia.org) [13:23:22] 7103 (RESOLVED) Primary inbound port utilisation over 80% (paged) network noc (cr1-esams.wikimedia.org) [13:23:23] 7102 (RESOLVED) Primary outbound port utilisation over 80% (paged) network noc (cr2-eqiad.wikimedia.org) [13:23:23] 7101 (RESOLVED) Primary outbound port utilisation over 80% (paged) network noc (cr2-eqiad.wikimedia.org) [13:23:23] 7100 (RESOLVED) ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet codfw) [13:23:24] 7099 
(RESOLVED) ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet codfw) [13:23:53] (03PS1) 10Arnaudb: gerrit: disable backups temporarily [puppet] - 10https://gerrit.wikimedia.org/r/1216583 (https://phabricator.wikimedia.org/T387833) [13:28:46] FIRING: Primary inbound port utilisation over 90% #page: Alert for device cr1-esams.wikimedia.org - Primary inbound port utilisation over 90% #page - https://alerts.wikimedia.org/?q=alertname%3DPrimary+inbound+port+utilisation+over+90%25++%23page [13:28:51] RESOLVED: TransitPeeringTransportOutboundSaturation: Transit, peering or transport outbound traffic above 90% capacity - cr2-eqiad:xe-3/2/1 (Transport: cr1-esams:xe-0/0/7 (Colt, ... [13:28:51] 445419311 80ms 10Gbps wave) {#2013}) #page - https://wikitech.wikimedia.org/wiki/Network_monitoring#Primary_outbound_port_utilization_over_90% - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DTransitPeeringTransportOutboundSaturation [13:34:15] !incidents [13:34:15] 7108 (ACKED) Primary outbound port utilisation over 90% (paged) network noc (cr2-eqiad.wikimedia.org) [13:34:15] 7109 (UNACKED) Primary inbound port utilisation over 90% (paged) network noc (cr1-esams.wikimedia.org) [13:34:16] 7107 (RESOLVED) TransitPeeringTransportOutboundSaturation network sre (cr2-eqiad:9804 Transport: cr1-esams:xe-0/0/7 (Colt, 445419311 80ms 10Gbps wave) {#2013} xe-3/2/1 gnmi eqiad) [13:34:16] 7106 (RESOLVED) Primary inbound port utilisation over 90% (paged) network noc (cr1-esams.wikimedia.org) [13:34:16] 7105 (RESOLVED) Primary outbound port utilisation over 80% (paged) network noc (cr2-eqiad.wikimedia.org) [13:34:16] 7104 (RESOLVED) Primary inbound port utilisation over 80% (paged) network noc (cr1-esams.wikimedia.org) [13:34:16] 7103 (RESOLVED) Primary inbound port utilisation over 80% (paged) network noc (cr1-esams.wikimedia.org) [13:34:17] 7102 (RESOLVED) Primary 
outbound port utilisation over 80% (paged) network noc (cr2-eqiad.wikimedia.org) [13:34:17] 7101 (RESOLVED) Primary outbound port utilisation over 80% (paged) network noc (cr2-eqiad.wikimedia.org) [13:34:17] 7100 (RESOLVED) ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet codfw) [13:34:18] 7099 (RESOLVED) ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet codfw) [13:34:28] !ack 7109 [13:34:28] 7109 (ACKED) Primary inbound port utilisation over 90% (paged) network noc (cr1-esams.wikimedia.org) [13:36:02] 06SRE, 06Infrastructure-Foundations, 10netops: Nokia SR-Linux ARP resolution bug on v24.10.x+ - https://phabricator.wikimedia.org/T409178#11440290 (10cmooney) Nokia have come back to say they were able to reproduce the issue, and confirm the cause as well as the fact it is not a problem in the latest SR-Linu... [13:37:46] RESOLVED: Primary outbound port utilisation over 90% #page: Device cr2-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 90% #page - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+90%25++%23page [13:37:50] (03CR) 10Jforrester: [C:03+1] "Looks right to me." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/956060 (https://phabricator.wikimedia.org/T340697) (owner: 10Func) [13:38:45] RESOLVED: Primary inbound port utilisation over 90% #page: Device cr1-esams.wikimedia.org recovered from Primary inbound port utilisation over 90% #page - https://alerts.wikimedia.org/?q=alertname%3DPrimary+inbound+port+utilisation+over+90%25++%23page [13:45:19] !incidents [13:45:19] 7109 (RESOLVED) Primary inbound port utilisation over 90% (paged) network noc (cr1-esams.wikimedia.org) [13:45:20] 7108 (RESOLVED) Primary outbound port utilisation over 90% (paged) network noc (cr2-eqiad.wikimedia.org) [13:45:20] 7107 (RESOLVED) TransitPeeringTransportOutboundSaturation network sre (cr2-eqiad:9804 Transport: cr1-esams:xe-0/0/7 (Colt, 445419311 80ms 10Gbps wave) {#2013} xe-3/2/1 gnmi eqiad) [13:45:20] 7106 (RESOLVED) Primary inbound port utilisation over 90% (paged) network noc (cr1-esams.wikimedia.org) [13:45:20] 7105 (RESOLVED) Primary outbound port utilisation over 80% (paged) network noc (cr2-eqiad.wikimedia.org) [13:45:20] 7104 (RESOLVED) Primary inbound port utilisation over 80% (paged) network noc (cr1-esams.wikimedia.org) [13:45:21] 7103 (RESOLVED) Primary inbound port utilisation over 80% (paged) network noc (cr1-esams.wikimedia.org) [13:45:21] 7102 (RESOLVED) Primary outbound port utilisation over 80% (paged) network noc (cr2-eqiad.wikimedia.org) [13:45:21] 7101 (RESOLVED) Primary outbound port utilisation over 80% (paged) network noc (cr2-eqiad.wikimedia.org) [13:45:22] 7100 (RESOLVED) ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet codfw) [13:45:22] 7099 (RESOLVED) ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet codfw) [13:46:13] (03PS27) 10Func: Namespaces: Maintain the status quo after changes to SiteConfiguration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/956060 (https://phabricator.wikimedia.org/T340697) [13:53:35] (03CR) 10Func: "Yeah, I agree that seeing the previous commit message may 
not expect config changes. I updated the commit message to emphasise more on the" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/956060 (https://phabricator.wikimedia.org/T340697) (owner: 10Func) [14:00:05] Lucas_WMDE, Urbanecm, and TheresNoTime: May I have your attention please! UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251208T1400) [14:00:05] hubaishan, Kizule, anzx, A_smart_kitten, WMDE-Fisch, Cyndywikime, and xSavitar: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [14:00:08] o/ [14:00:15] Finally here! \o [14:00:25] here, though as mentioned in #wikimedia-tech i can reschedule mine if it'd be a problem :) [14:01:24] 👋 [14:02:45] o/ [14:07:38] (03PS22) 10Func: [DNM] Test diffConfig for the parent change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216181 [14:09:45] (03CR) 10Func: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216181 (owner: 10Func) [14:11:57] I'm here [14:14:34] (03CR) 10Klausman: ml-build: define new machine name/type (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1213972 (https://phabricator.wikimedia.org/T394778) (owner: 10Dpogorzelski) [14:15:34] Anybody here? :D [14:15:41] (03PS1) 10Btullis: Bump the size of the PostgreSQL WAL for airflow-main to 30GB [deployment-charts] - 10https://gerrit.wikimedia.org/r/1216589 (https://phabricator.wikimedia.org/T375846) [14:16:40] FIRING: SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:17:29] Seems like deployers are not available today. 
[14:18:27] Lucas_WMDE was in #wikimedia-tech ~20min before the window, though I guess he might be busy at the moment [14:19:38] (03CR) 10Btullis: [C:03+2] Bump the size of the PostgreSQL WAL for airflow-main to 30GB [deployment-charts] - 10https://gerrit.wikimedia.org/r/1216589 (https://phabricator.wikimedia.org/T375846) (owner: 10Btullis) [14:20:45] I could help in deploying the backports on wmf.5 patches. I'm not sure if I can deploy all the config patches myself. [14:21:13] For my deployment + running namespaceDupes is required. [14:21:35] (03Merged) 10jenkins-bot: Bump the size of the PostgreSQL WAL for airflow-main to 30GB [deployment-charts] - 10https://gerrit.wikimedia.org/r/1216589 (https://phabricator.wikimedia.org/T375846) (owner: 10Btullis) [14:21:59] Kizule, ack! [14:22:51] WMDE-Fisch, are you around to test your backport? I can do yours and mine together [14:23:16] xSavitar: IMO my FlaggedRevs patch should probably be deployed by itself given the somewhat unpredictable state of FlaggedRevs. My SVG path is a no-op and IMO can be deployed with other low-risk changes. I am happy to reschedule either for another time though :) [14:23:38] A_smart_kitten, okay! 
[14:25:07] (03CR) 10TrainBranchBot: [C:03+2] "Approved by derick@deploy2002 using scap backport" [extensions/CentralAuth] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216558 (https://phabricator.wikimedia.org/T411826) (owner: 10D3r1ck01) [14:25:20] !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-main: apply [14:25:26] !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-main: apply [14:27:29] (03Merged) 10jenkins-bot: Pass an explicit performer when attempting CreateLocalAccount [extensions/CentralAuth] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216558 (https://phabricator.wikimedia.org/T411826) (owner: 10D3r1ck01) [14:27:50] !log derick@deploy2002 Started scap sync-world: Backport for [[gerrit:1216558|Pass an explicit performer when attempting CreateLocalAccount (T411826 T411952)]] [14:27:55] T411826: MediaWiki periodic job centralauth-backfilllocalaccounts.php-loginwiki failed - https://phabricator.wikimedia.org/T411826 [14:27:55] T411952: Special:CreateLocalAccount doesn't create accounts for other users due to IP blocks, even when I am a sysop - https://phabricator.wikimedia.org/T411952 [14:29:36] (03PS1) 10Arnaudb: gerrit: add a confirmation prompt on rsync [cookbooks] - 10https://gerrit.wikimedia.org/r/1216592 (https://phabricator.wikimedia.org/T387833) [14:29:37] (03CR) 10Arnaudb: "example prompt: https://phabricator.wikimedia.org/P86453#L21" [cookbooks] - 10https://gerrit.wikimedia.org/r/1216592 (https://phabricator.wikimedia.org/T387833) (owner: 10Arnaudb) [14:29:42] (03CR) 10Dpogorzelski: ml-build: define new machine name/type (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1213972 (https://phabricator.wikimedia.org/T394778) (owner: 10Dpogorzelski) [14:29:47] !log derick@deploy2002 d3r1ck01, derick: Backport for [[gerrit:1216558|Pass an explicit performer when attempting CreateLocalAccount (T411826 T411952)]] synced to the 
testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [14:30:41] !log derick@deploy2002 d3r1ck01, derick: Continuing with sync [14:34:54] !log derick@deploy2002 Finished scap sync-world: Backport for [[gerrit:1216558|Pass an explicit performer when attempting CreateLocalAccount (T411826 T411952)]] (duration: 07m 04s) [14:34:59] T411826: MediaWiki periodic job centralauth-backfilllocalaccounts.php-loginwiki failed - https://phabricator.wikimedia.org/T411826 [14:34:59] T411952: Special:CreateLocalAccount doesn't create accounts for other users due to IP blocks, even when I am a sysop - https://phabricator.wikimedia.org/T411952 [14:37:33] FIRING: [2x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1011:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [14:39:43] Kizule, I can try to deploy yours now [14:39:47] Are you still around? [14:40:05] xSavitar: Yup [14:40:11] Okay! 
[14:41:34] I am here also [14:41:42] (03CR) 10TrainBranchBot: [C:03+2] "Approved by derick@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215060 (https://phabricator.wikimedia.org/T411750) (owner: 10Zoranzoki21) [14:42:12] (03PS1) 10Ayounsi: [WIP] Nokia: ensure disabled ports speed is set correctly [homer/public] - 10https://gerrit.wikimedia.org/r/1216595 (https://phabricator.wikimedia.org/T409178) [14:43:32] (03CR) 10CI reject: [V:04-1] [WIP] Nokia: ensure disabled ports speed is set correctly [homer/public] - 10https://gerrit.wikimedia.org/r/1216595 (https://phabricator.wikimedia.org/T409178) (owner: 10Ayounsi) [14:44:45] 10ops-eqiad, 06SRE, 06DC-Ops: hw troubleshooting: PERC1 battery failure for an-worker1148 - https://phabricator.wikimedia.org/T411919#11440542 (10Jclark-ctr) a:03Jclark-ctr [14:44:55] (03Merged) 10jenkins-bot: Add Serbian Latin draft namespace and talk namespace aliases [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215060 (https://phabricator.wikimedia.org/T411750) (owner: 10Zoranzoki21) [14:45:13] !log derick@deploy2002 Started scap sync-world: Backport for [[gerrit:1215060|Add Serbian Latin draft namespace and talk namespace aliases (T411750)]] [14:45:17] T411750: Add missing Serbian Latin alias for Draft namespace - https://phabricator.wikimedia.org/T411750 [14:46:01] 10ops-eqiad, 06SRE, 06DC-Ops: hw troubleshooting: PERC1 battery failure for an-worker1148 - https://phabricator.wikimedia.org/T411919#11440549 (10Jclark-ctr) @RKemper This server is out of warranty. I believe we might have a spare battery in stock. Is there any chance we can schedule downtime to open it up a... [14:47:14] !log derick@deploy2002 derick, zoranzoki21: Backport for [[gerrit:1215060|Add Serbian Latin draft namespace and talk namespace aliases (T411750)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[14:47:28] Kizule, you can test [14:47:44] xSavitar: Let's see [14:48:36] xSavitar: Looks good. Please run namespaceDupes after it's finally deployed. [14:48:43] Ack! Will do [14:48:55] Should I sync? [14:49:03] Yeah [14:49:06] can you do patch of arwiktionary [14:49:19] !log derick@deploy2002 derick, zoranzoki21: Continuing with sync [14:49:31] hubaishan, yes! I can :) [14:51:45] 10ops-eqiad, 06SRE, 06DC-Ops, 06Data-Platform-SRE (2025.11.07 - 2025.11.28): Degraded RAID on an-worker1191 - https://phabricator.wikimedia.org/T411209#11440578 (10Jclark-ctr) a:05Jclark-ctr→03BTullis [14:52:11] hubaishan, your patch does more than just adding namespaces :) [14:52:54] It is defaults for wiktionaries [14:53:21] !log derick@deploy2002 Finished scap sync-world: Backport for [[gerrit:1215060|Add Serbian Latin draft namespace and talk namespace aliases (T411750)]] (duration: 08m 08s) [14:53:25] T411750: Add missing Serbian Latin alias for Draft namespace - https://phabricator.wikimedia.org/T411750 [14:53:31] these namespaces are content namespaces [14:53:42] Seems like the change will trigger reindexing of pages in CirrusSearch? [14:54:44] Kizule: 📜 Streaming logs: [14:54:44] 0 pages to fix, 0 were resolvable. [14:54:44] 0 links to fix, 0 were resolvable, 0 were deleted. [14:54:44] Looks good! [14:54:53] All looks good after running the script [14:55:13] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [14:55:29] hubaishan, can I please ask you to wait for a more experienced deployer than me to deploy your patch? [14:55:40] * urbanecm is around if needed [14:55:52] urbanecm, okay! I can hand over to you. [14:55:58] xSavitar: Looks good [14:56:06] I've deployed mine and Kizule's [14:56:12] Kizule, thanks for confirming. 
[14:56:23] urbanecm, over to you sir 🙏🏽 [14:56:31] ack, thank you [14:56:34] what i said above re mine still stands; happy to reschedule them if that would help :) [14:56:40] most of the patches are remaining [14:56:44] urbanecm, apart from WMDE-Fisch, the rest are all config patches [14:57:39] (03PS4) 10Anzx: niawiktionary: update wordmark, sitename and projectnamespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216196 (https://phabricator.wikimedia.org/T411850) [14:57:42] (03CR) 10Urbanecm: [C:03+2] niawiktionary: update wordmark, sitename and projectnamespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216196 (https://phabricator.wikimedia.org/T411850) (owner: 10Anzx) [14:58:28] (03PS4) 10Anzx: shnwiki: add draft namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216226 (https://phabricator.wikimedia.org/T411965) [14:58:31] (03CR) 10Urbanecm: [C:03+2] shnwiki: add draft namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216226 (https://phabricator.wikimedia.org/T411965) (owner: 10Anzx) [14:58:52] (03PS3) 10D3r1ck01: [config] arwiktionary: add 2 namespaces with talks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215280 (https://phabricator.wikimedia.org/T411819) (owner: 10Hubaishan) [14:58:58] (03CR) 10D3r1ck01: [C:03+1] [config] arwiktionary: add 2 namespaces with talks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215280 (https://phabricator.wikimedia.org/T411819) (owner: 10Hubaishan) [14:59:11] (03PS2) 10Ayounsi: [WIP] Nokia: ensure disabled ports speed is set correctly [homer/public] - 10https://gerrit.wikimedia.org/r/1216595 (https://phabricator.wikimedia.org/T409178) [14:59:12] (03Merged) 10jenkins-bot: niawiktionary: update wordmark, sitename and projectnamespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216196 (https://phabricator.wikimedia.org/T411850) (owner: 10Anzx) [14:59:26] (03PS3) 10Cyndywikime: [Growth]:Remove GELevelingUpNewNotificationsEnabled config 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1207133 (https://phabricator.wikimedia.org/T407431) [14:59:29] (03CR) 10Urbanecm: [C:03+2] [Growth]:Remove GELevelingUpNewNotificationsEnabled config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1207133 (https://phabricator.wikimedia.org/T407431) (owner: 10Cyndywikime) [15:00:15] (03CR) 10Ssingh: [C:03+1] "Let's try it out on a site-wide reboot. Please merge at will." [cookbooks] - 10https://gerrit.wikimedia.org/r/1213549 (owner: 10CDobbins) [15:00:30] (03CR) 10CI reject: [V:04-1] [WIP] Nokia: ensure disabled ports speed is set correctly [homer/public] - 10https://gerrit.wikimedia.org/r/1216595 (https://phabricator.wikimedia.org/T409178) (owner: 10Ayounsi) [15:00:35] (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216226 (https://phabricator.wikimedia.org/T411965) (owner: 10Anzx) [15:00:35] (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1207133 (https://phabricator.wikimedia.org/T407431) (owner: 10Cyndywikime) [15:00:52] (03CR) 10CI reject: [V:04-1] shnwiki: add draft namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216226 (https://phabricator.wikimedia.org/T411965) (owner: 10Anzx) [15:01:26] (03CR) 10Urbanecm: [C:03+2] "..." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216226 (https://phabricator.wikimedia.org/T411965) (owner: 10Anzx) [15:01:38] (03Merged) 10jenkins-bot: [Growth]:Remove GELevelingUpNewNotificationsEnabled config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1207133 (https://phabricator.wikimedia.org/T407431) (owner: 10Cyndywikime) [15:02:30] (03PS3) 10Ayounsi: [WIP] Nokia: ensure disabled ports speed is set correctly [homer/public] - 10https://gerrit.wikimedia.org/r/1216595 (https://phabricator.wikimedia.org/T409178) [15:02:41] (03Merged) 10jenkins-bot: shnwiki: add draft namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216226 (https://phabricator.wikimedia.org/T411965) (owner: 10Anzx) [15:03:58] !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1216196|niawiktionary: update wordmark, sitename and projectnamespace (T411850)]], [[gerrit:1216226|shnwiki: add draft namespace (T411965)]], [[gerrit:1207133|[Growth]:Remove GELevelingUpNewNotificationsEnabled config (T407431)]] [15:04:05] T411850: nia.wiktionary - Rename namespace Wiktionary into Wikikamus - https://phabricator.wikimedia.org/T411850 [15:04:05] T411965: Add Draft: Namespace for shnwikipedia - https://phabricator.wikimedia.org/T411965 [15:04:06] T407431: Growth's "48 hour" newcomer notifications: end A/B test experiment & release changes - https://phabricator.wikimedia.org/T407431 [15:04:45] (03CR) 10CI reject: [V:04-1] [WIP] Nokia: ensure disabled ports speed is set correctly [homer/public] - 10https://gerrit.wikimedia.org/r/1216595 (https://phabricator.wikimedia.org/T409178) (owner: 10Ayounsi) [15:05:24] A_smart_kitten: it seems like https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1192528 is a no-op, right? [15:05:30] that the current default _is_ false [15:05:52] urbanecm: yep! IMO it can be deployed with other low-risk patches. 
[15:06:03] sounds good [15:06:07] !log urbanecm@deploy2002 cyndywikime, urbanecm, anzx: Backport for [[gerrit:1216196|niawiktionary: update wordmark, sitename and projectnamespace (T411850)]], [[gerrit:1216226|shnwiki: add draft namespace (T411965)]], [[gerrit:1207133|[Growth]:Remove GELevelingUpNewNotificationsEnabled config (T407431)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [15:06:39] imo the FlaggedRevs one should maybe be deployed by itself given the somewhat unpredictable state of FlaggedRevs. i'm happy to delay it to another window though :) [15:06:40] checking [15:06:43] Cyndywikime: anzx: please test [15:06:53] A_smart_kitten: yeah, especially given it seems to contain a workaround [15:06:59] depends on whether you have time, i can stretch the window a little bit [15:07:04] testing [15:07:18] WMDE-Fisch: also, same for you! should i deploy your patch too? [15:07:25] urbanecm: whatever works. i am here right now, but i can also be here later or tomorrow :p [15:07:49] (so if you want to stretch the window that's fine, i have time :)) [15:07:55] happy to deploy once the current batch finishes :) [15:08:02] ack :D [15:08:10] urbanecm: looks good, ok to sync [15:08:15] ty, waiting on Cyndywikime [15:08:40] looks good as well [15:08:43] perf [15:08:44] !log urbanecm@deploy2002 cyndywikime, urbanecm, anzx: Continuing with sync [15:09:00] (03CR) 10Urbanecm: [C:03+2] enwikibooks: Limit FlaggedRevs to specific namespaces; disable FR stable-transclusion-checking [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1201051 (https://phabricator.wikimedia.org/T408110) (owner: 10A smart kitten) [15:09:46] o/ [15:09:49] (03CR) 10Urbanecm: [C:03+2] "should be a no-op" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1192528 (https://phabricator.wikimedia.org/T406023) (owner: 10TheDJ) [15:09:53] I was busy having lunch, sorry :P [15:09:56] (03Merged) 10jenkins-bot: enwikibooks: Limit FlaggedRevs 
to specific namespaces; disable FR stable-transclusion-checking [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1201051 (https://phabricator.wikimedia.org/T408110) (owner: 10A smart kitten) [15:09:57] no worries Lucas_WMDE [15:10:01] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:10:03] thanks for deploying! [15:10:13] Lucas_WMDE: how dare you eat lunch! /j [15:10:45] (03Merged) 10jenkins-bot: SVG: do not allow native SVG rendering [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1192528 (https://phabricator.wikimedia.org/T406023) (owner: 10TheDJ) [15:10:46] and no worries, looks like things are gonna get deployed okay :] [15:10:58] \o/ [15:11:19] (03PS4) 10Ayounsi: [WIP] Nokia: ensure disabled ports speed is set correctly [homer/public] - 10https://gerrit.wikimedia.org/r/1216595 (https://phabricator.wikimedia.org/T409178) [15:12:42] !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-main: apply [15:12:47] !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1216196|niawiktionary: update wordmark, sitename and projectnamespace (T411850)]], [[gerrit:1216226|shnwiki: add draft namespace (T411965)]], [[gerrit:1207133|[Growth]:Remove GELevelingUpNewNotificationsEnabled config (T407431)]] (duration: 08m 49s) [15:12:49] !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-main: apply [15:12:53] T411850: nia.wiktionary - Rename namespace Wiktionary into Wikikamus - https://phabricator.wikimedia.org/T411850 [15:12:54] T411965: Add Draft: Namespace for shnwikipedia - https://phabricator.wikimedia.org/T411965 [15:12:54] T407431: Growth's "48 hour" newcomer notifications: end A/B test experiment & release changes - 
https://phabricator.wikimedia.org/T407431 [15:13:25] !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1192528|SVG: do not allow native SVG rendering (T406023)]], [[gerrit:1201051|enwikibooks: Limit FlaggedRevs to specific namespaces; disable FR stable-transclusion-checking (T408110 T410330)]] [15:13:32] T406023: Enable SVG native rendering for 3rd parties by default - https://phabricator.wikimedia.org/T406023 [15:13:32] T408110: Limit Flagged Revisions to the article, Cookbook, and Wikijunior namespaces on English Wikibooks - https://phabricator.wikimedia.org/T408110 [15:13:32] T410330: Modify $wgFlaggedRevsHandleIncludes for English Wikibooks - https://phabricator.wikimedia.org/T410330 [15:14:46] urbanecm: please run namespacedupes for shnwiki and niawiktionary [15:14:52] in the queue :) [15:15:25] !log urbanecm@deploy2002 urbanecm, asmartkitten, hartman: Backport for [[gerrit:1192528|SVG: do not allow native SVG rendering (T406023)]], [[gerrit:1201051|enwikibooks: Limit FlaggedRevs to specific namespaces; disable FR stable-transclusion-checking (T408110 T410330)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [15:15:33] testing; might be a minute or two as i want to make sure as best i can that i haven't broken FR on enwikibooks [15:15:36] !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-main: apply [15:15:43] !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-main: apply [15:15:45] A_smart_kitten: sounds good to me [15:17:53] urbanecm: both look good as best i can see :] [15:17:57] perfect [15:18:00] !log urbanecm@deploy2002 urbanecm, asmartkitten, hartman: Continuing with sync [15:18:23] hubaishan: still around? 
:) [15:18:31] yes [15:18:45] (03PS4) 10D3r1ck01: [config] arwiktionary: add 2 namespaces with talks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215280 (https://phabricator.wikimedia.org/T411819) (owner: 10Hubaishan) [15:18:48] (03CR) 10Urbanecm: [C:03+2] [config] arwiktionary: add 2 namespaces with talks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215280 (https://phabricator.wikimedia.org/T411819) (owner: 10Hubaishan) [15:19:57] (03Merged) 10jenkins-bot: [config] arwiktionary: add 2 namespaces with talks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215280 (https://phabricator.wikimedia.org/T411819) (owner: 10Hubaishan) [15:22:06] for the record, i just found another issue with removing namespaces from FlaggedRevs (the count on Special:PendingChanges doesn't look like it's decreased), but IMO we do not need to revert as it seems relatively minor. [15:22:13] sounds good [15:22:14] !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1192528|SVG: do not allow native SVG rendering (T406023)]], [[gerrit:1201051|enwikibooks: Limit FlaggedRevs to specific namespaces; disable FR stable-transclusion-checking (T408110 T410330)]] (duration: 08m 49s) [15:22:14] i will flag it in the task to the community [15:22:17] A_smart_kitten: can you note it on the task though? [15:22:20] T406023: Enable SVG native rendering for 3rd parties by default - https://phabricator.wikimedia.org/T406023 [15:22:21] T408110: Limit Flagged Revisions to the article, Cookbook, and Wikijunior namespaces on English Wikibooks - https://phabricator.wikimedia.org/T408110 [15:22:21] T410330: Modify $wgFlaggedRevsHandleIncludes for English Wikibooks - https://phabricator.wikimedia.org/T410330 [15:22:22] ah, you already thought of that. perfect. 
[15:23:43] FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS [15:24:17] !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1215280|[config] arwiktionary: add 2 namespaces with talks (T411819)]] [15:24:20] T411819: [config] arwiktionary: add 2 namespaces - https://phabricator.wikimedia.org/T411819 [15:25:56] !log urbanecm@deploy2002 mwscript-k8s job started: namespaceDupes.php --wiki=shnwiki [15:26:19] shnwiki output https://www.irccloud.com/pastebin/pXEwBe7y/ [15:26:21] !log urbanecm@deploy2002 hubaishan, urbanecm: Backport for [[gerrit:1215280|[config] arwiktionary: add 2 namespaces with talks (T411819)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [15:26:29] hubaishan: can you double check your patch, please? 
[15:27:01] !log urbanecm@deploy2002 mwscript-k8s job started: namespaceDupes.php --wiki=shnwiki --fix # T411965 [15:27:02] FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs1011:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag [15:27:04] T411965: Add Draft: Namespace for shnwikipedia - https://phabricator.wikimedia.org/T411965 [15:28:10] !log urbanecm@deploy2002 mwscript-k8s job started: namespaceDupes.php --wiki=niawiktionary # T411850 [15:28:13] T411850: nia.wiktionary - Rename namespace Wiktionary into Wikikamus - https://phabricator.wikimedia.org/T411850 [15:28:32] nothing to do on niawiktionary https://www.irccloud.com/pastebin/P7mH03bV/ [15:28:38] anzx: ^^^ [15:28:45] hubaishan: how is the testing going, please? [15:28:58] the namespaces appeared [15:29:07] urbanecm: Thanks for deploying and script run [15:29:14] np [15:30:05] Deploy window xLab Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251208T1530) [15:30:13] FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [15:30:56] !log urbanecm@deploy2002 hubaishan, urbanecm: Continuing with sync [15:34:59] !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1215280|[config] arwiktionary: add 2 namespaces with talks (T411819)]] (duration: 10m 42s) [15:35:02] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets 
- https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:35:03] T411819: [config] arwiktionary: add 2 namespaces - https://phabricator.wikimedia.org/T411819 [15:36:35] !log urbanecm@deploy2002 mwscript-k8s job started: namespaceDupes.php --wiki=arwiktionary # T411819 [15:37:18] RESOLVED: [2x] ProbeDown: Service wdqs1011:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1011:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [15:37:45] (03PS1) 10DDesouza: Partially undeploy 2025 Global Readers Survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216600 (https://phabricator.wikimedia.org/T410918) [15:38:38] !log urbanecm@deploy2002 mwscript-k8s job started: namespaceDupes.php --wiki=arwiktionary --fix # T411819 [15:39:02] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 08 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216600 (https://phabricator.wikimedia.org/T410918) (owner: 10DDesouza) [15:39:21] !log urbanecm@deploy2002 mwscript-k8s job started: namespaceDupes.php --wiki=arwiktionary # T411819 [15:39:22] (03CR) 10LSobanski: "Approved in the IF team meeting." [puppet] - 10https://gerrit.wikimedia.org/r/1215156 (https://phabricator.wikimedia.org/T406593) (owner: 10Btullis) [15:39:28] (03CR) 10LSobanski: "Approved in the IF team meeting." [puppet] - 10https://gerrit.wikimedia.org/r/1215157 (https://phabricator.wikimedia.org/T411774) (owner: 10Muehlenhoff) [15:39:38] (03CR) 10Dpogorzelski: ml-build: define new machine name/type (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1213972 (https://phabricator.wikimedia.org/T394778) (owner: 10Dpogorzelski) [15:39:57] hubaishan: should be completed [15:39:59] anything else, anyone? 
[15:40:10] thank you [15:40:19] All is OK [15:40:21] 06SRE, 06collaboration-services, 06Infrastructure-Foundations: gitlab2002: wrong network for public IPV4 and IPV6 - https://phabricator.wikimedia.org/T370018#11440756 (10Jelto) `gitlab2002` is no longer the active host. So maintenance on `gitlab2002` should be possible now. [15:41:04] thanks for the deploys urbanecm! [15:41:19] (03CR) 10Btullis: [C:03+2] Add a growthbook system user and grant it access to private data [puppet] - 10https://gerrit.wikimedia.org/r/1215156 (https://phabricator.wikimedia.org/T406593) (owner: 10Btullis) [15:43:36] any time! [15:47:06] FIRING: KubernetesCalicoDown: ml-serve1013.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s-mlserve&var-instance=ml-serve1013.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown [15:49:13] RESOLVED: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS [15:51:26] PROBLEM - Host ml-lab1001 is DOWN: PING CRITICAL - Packet loss = 100% [15:59:43] 10ops-eqiad, 06SRE, 06DC-Ops, 06Data-Platform-SRE (2025.11.07 - 2025.11.28): Q2:rack/setup/install wdqs1033-1035 - https://phabricator.wikimedia.org/T411731#11440821 (10bking) [16:05:30] (03PS5) 10Ayounsi: Nokia: ensure disabled ports speed is set correctly [homer/public] - 10https://gerrit.wikimedia.org/r/1216595 (https://phabricator.wikimedia.org/T409178) [16:06:17] (03PS3) 10STran: Enable IRS v2 non-emergency workflow on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216561 (https://phabricator.wikimedia.org/T410512) [16:09:32] RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs1011:9101 has fallen behind applying updates from the RDF Streaming Updater 
- https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag [16:13:27] (03PS1) 10Bking: wdqs: Add new hosts [puppet] - 10https://gerrit.wikimedia.org/r/1216607 (https://phabricator.wikimedia.org/T411731) [16:14:57] FIRING: [14x] ConfdResourceFailed: confd resource _srv_config-master_pybal_codfw_gerrit-ssh.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed [16:18:03] (03CR) 10STran: Enable IRS v2 non-emergency workflow on beta (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216561 (https://phabricator.wikimedia.org/T410512) (owner: 10STran) [16:18:36] (03PS1) 10DLynch: Add instrumentation for mobile section switching [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216608 (https://phabricator.wikimedia.org/T410319) [16:19:52] (03PS1) 10DLynch: Edit full page: Tweak skeleton appearance and fix scroll offsets [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216609 [16:20:03] (03PS1) 10DLynch: Add i18n for edit full page button [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216610 [16:20:21] (03PS1) 10DLynch: Set full page scroll to 130px [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216611 (https://phabricator.wikimedia.org/T411669) [16:20:28] (03PS1) 10DLynch: Ensure images are fixed size on mobile while loading [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216612 (https://phabricator.wikimedia.org/T411669) [16:20:49] (03PS1) 10DLynch: Add experiment + tracking for mobile section switching [extensions/WikimediaEvents] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216613 
(https://phabricator.wikimedia.org/T410803) [16:21:13] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 08 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216608 (https://phabricator.wikimedia.org/T410319) (owner: 10DLynch) [16:21:23] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 08 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216609 (owner: 10DLynch) [16:21:38] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 08 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216610 (owner: 10DLynch) [16:21:45] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 08 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216611 (https://phabricator.wikimedia.org/T411669) (owner: 10DLynch) [16:21:52] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 08 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216612 (https://phabricator.wikimedia.org/T411669) (owner: 10DLynch) [16:26:17] (03CR) 10Mszwarc: [C:03+1] Enable IRS v2 non-emergency workflow on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216561 (https://phabricator.wikimedia.org/T410512) (owner: 10STran) [16:30:05] jan_drewniak: Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251208T1630). Please do the needful. [16:41:03] Hi! 
I have a backport that's going out today during the late afternoon backport window that requires a maintenance script job to be run after deployment. When is a good time to run the maintenance script job? The deployment window has quite a few tasks so I don't want to hold anyone up. [16:43:27] katherine_g: does the maintenance script take a long time to run? [16:44:57] Lucas_WMDE: unknown, but I do have an open question out to the folks that ran it last to see. [16:45:34] and is it a problem if there’s a delay between the config change and the maintenance script? [16:46:16] if it doesn’t take long then I think it would be okay to run the maintenance script “in parallel” with one of the other deployments, e.g. while waiting for one of them to merge in CI [16:46:31] (or, if you’re confident enough the script won’t cause problems, actually at the same time as a deployment ^^) [16:46:52] if it takes longer then I think it might make more sense to do the config change last in the window and then start the script afterwards [16:47:13] though there isn’t actually a break after this window so I guess you’d want to coordinate that with the Weekly Security deployment window folks (R.eedy et al.) 
[16:48:23] Lucas_WMDE: there's no problem if there's a delay between the config change + maintenance script [16:49:17] 06SRE, 06Infrastructure-Foundations, 10Wikimedia-Mailing-lists: lists.wikimedia.org subscription email rejected by DKIM - https://phabricator.wikimedia.org/T409137#11441086 (10LSobanski) [16:49:54] Lucas_WMDE: so I can wait until a good time to run it, but I'm not sure when that would be today [16:52:52] apparently eight years ago the script took just a few minutes on etwiki https://sal.toolforge.org/production?p=0&q=%22PopulateDatabase.php%22&d=2017-03-21 [16:53:38] and judging by https://et.wikipedia.org/wiki/Eri:Arvandmestik?uselang=en and https://th.wikipedia.org/wiki/%E0%B8%9E%E0%B8%B4%E0%B9%80%E0%B8%A8%E0%B8%A9:%E0%B8%AA%E0%B8%96%E0%B8%B4%E0%B8%95%E0%B8%B4?uselang=en, etwiki and thwiki are *very roughly* in the same ballpark today [16:54:05] (though both will have grown since 2017) [16:54:59] I think I’d deploy your config change first in the window and then let the maintenance script run in the background while the rest of the window is deployed, and expect it to finish before the window is over [16:55:08] but up to whoever ends up deploying it :) [16:56:02] Lucas_WMDE: sounds good, ty! 
[17:04:32] (03PS1) 10Urbanecm: Move mustache templates from includes [core] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216620 (https://phabricator.wikimedia.org/T409057) [17:04:34] (03PS1) 10Urbanecm: Adjust styling of confirmation emails [core] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216621 (https://phabricator.wikimedia.org/T411526) [17:08:24] (03CR) 10CI reject: [V:04-1] Adjust styling of confirmation emails [core] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216621 (https://phabricator.wikimedia.org/T411526) (owner: 10Urbanecm) [17:08:36] (03CR) 10Urbanecm: "recheck" [core] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216621 (https://phabricator.wikimedia.org/T411526) (owner: 10Urbanecm) [17:24:52] jouncebot: nowandnext [17:24:52] No deployments scheduled for the next 0 hour(s) and 35 minute(s) [17:24:52] In 0 hour(s) and 35 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251208T1800) [17:24:52] In 0 hour(s) and 35 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251208T1800) [17:25:06] (03CR) 10Urbanecm: [C:03+2] "backporting, to be able to backport If4b941d6aaa579e99c832a3c25721e793bd05f84" [core] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216620 (https://phabricator.wikimedia.org/T409057) (owner: 10Urbanecm) [17:25:12] (03CR) 10Urbanecm: [C:03+2] Adjust styling of confirmation emails [core] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216621 (https://phabricator.wikimedia.org/T411526) (owner: 10Urbanecm) [17:29:34] (03Merged) 10jenkins-bot: Move mustache templates from includes [core] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216620 (https://phabricator.wikimedia.org/T409057) (owner: 10Urbanecm) [17:29:39] (03Merged) 10jenkins-bot: Adjust styling of confirmation emails [core] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216621 
(https://phabricator.wikimedia.org/T411526) (owner: 10Urbanecm) [17:31:37] !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1216620|Move mustache templates from includes (T409057)]], [[gerrit:1216621|Adjust styling of confirmation emails (T411526)]] [17:31:43] T409057: Move mustache templates out of includes - https://phabricator.wikimedia.org/T409057 [17:31:43] T411526: Improve CSS styling for verification email - https://phabricator.wikimedia.org/T411526 [17:33:37] !log urbanecm@deploy2002 urbanecm: Backport for [[gerrit:1216620|Move mustache templates from includes (T409057)]], [[gerrit:1216621|Adjust styling of confirmation emails (T411526)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [17:36:54] !log urbanecm@deploy2002 urbanecm: Continuing with sync [17:42:04] !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1216620|Move mustache templates from includes (T409057)]], [[gerrit:1216621|Adjust styling of confirmation emails (T411526)]] (duration: 10m 28s) [17:42:09] T409057: Move mustache templates out of includes - https://phabricator.wikimedia.org/T409057 [17:42:10] T411526: Improve CSS styling for verification email - https://phabricator.wikimedia.org/T411526 [17:55:22] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, December 09 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/Cite] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216553 (https://phabricator.wikimedia.org/T411245) (owner: 10WMDE-Fisch) [18:00:05] Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251208T1800) [18:00:05] ryankemper: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Wikidata Query Service weekly deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251208T1800). 
[18:16:40] FIRING: SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:22:01] 10ops-eqiad, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker refresh - https://phabricator.wikimedia.org/T408760#11441361 (10Jclark-ctr) a:05Jhancock.wm→03None [18:41:43] FIRING: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs2022:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly [18:51:43] RESOLVED: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs2022:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly [18:55:13] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [19:27:54] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [19:30:13] FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet 
is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [19:32:54] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [19:36:53] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 08 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1201338 (https://phabricator.wikimedia.org/T290778) (owner: 10DLynch) [19:47:06] FIRING: KubernetesCalicoDown: ml-serve1013.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s-mlserve&var-instance=ml-serve1013.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown [20:00:00] 06SRE, 06collaboration-services, 13Patch-For-Review, 05PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11441688 (10ATitkov) Hi @Dzahn > before we can deploy to production we would like to move this repo. But that would be within Gitlab rather... [20:05:40] 10SRE-SLO: Sloth: onboard subset of existing SLOs to pilot - https://phabricator.wikimedia.org/T409310#11441700 (10herron) Updated the wikifunctions slot pilot SLO to enable low priority "ticket" alerting ` alerting: name: SlothPilotSLOBudgetBurn labels: notes: "test please ignore"... 
[20:06:32] 10SRE-SLO: Evaluate Sloth as a possible replacement for Pyrra - https://phabricator.wikimedia.org/T404171#11441703 (10herron) [20:09:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 9.081% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [20:13:06] (03CR) 10A smart kitten: "Yeah, it seems okay to me now from that perspective following the update to the commit message. Thanks for the change!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/956060 (https://phabricator.wikimedia.org/T340697) (owner: 10Func) [20:14:57] FIRING: [14x] ConfdResourceFailed: confd resource _srv_config-master_pybal_codfw_gerrit-ssh.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed [20:18:54] !log urbanecm@deploy2002 mwscript-k8s job started: GrowthExperiments:revalidateLinkRecommendations.php --wiki=itwiki --exceptDatasetChecksums=valid_itwiki_checksums.txt --deleteNullRecommendations # T412040 [20:18:57] T412040: Add a Link: repopulate "Add a Link" suggestions for itwiki - https://phabricator.wikimedia.org/T412040 [20:19:22] !log urbanecm@deploy2002 mwscript-k8s job started: GrowthExperiments:revalidateLinkRecommendations.php --wiki=itwiki --exceptDatasetChecksums=valid_itwiki_checksums.txt --deleteNullRecommendations --verbose # T412040 [20:21:17] 06SRE, 06collaboration-services, 13Patch-For-Review, 05PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11441790 (10ATitkov) Also we were asked to disallow public access to the wikipedia25 website until 08:30 UTC 15 Jan 2026, which is when 
the p... [20:24:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web releases routed via main at codfw: 12.36% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [20:32:59] 06SRE, 06collaboration-services, 13Patch-For-Review, 05PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11441842 (10ATitkov) @Dzahn are there any updates for wikipedia25.org/? Is there an estimate of when it can be connected and ready for things... [20:34:17] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2171 (T410589)', diff saved to https://phabricator.wikimedia.org/P86456 and previous config saved to /var/cache/conftool/dbconfig/20251208-203417-ladsgroup.json [20:34:21] T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589 [20:38:35] (03PS1) 10Sbisson: Article search: surface nominated collections (JSON files) [extensions/ContentTranslation] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216643 (https://phabricator.wikimedia.org/T408842) [20:49:25] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P86457 and previous config saved to /var/cache/conftool/dbconfig/20251208-204924-ladsgroup.json [20:56:17] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, December 09 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [extensions/ContentTranslation] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216643 (https://phabricator.wikimedia.org/T408842) (owner: 10Sbisson) [21:00:04] RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: UTC late backport window 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251208T2100). Please do the needful. [21:00:04] katherine_g, danisztls, and kemayo: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [21:00:07] o/ [21:00:09] * urbanecm around [21:00:11] o/ [21:00:33] o/ [21:00:35] let's get started :) [21:00:48] (03CR) 10Urbanecm: [C:03+2] Add instrumentation for mobile section switching [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216608 (https://phabricator.wikimedia.org/T410319) (owner: 10DLynch) [21:00:49] (03CR) 10Urbanecm: [C:03+2] Edit full page: Tweak skeleton appearance and fix scroll offsets [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216609 (owner: 10DLynch) [21:00:55] (03CR) 10Urbanecm: [C:03+2] Set full page scroll to 130px [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216611 (https://phabricator.wikimedia.org/T411669) (owner: 10DLynch) [21:01:15] Kemayo: starting CI for your backports, leaving the i18n touching one for later, as it will trigger i18n cache rebuild [21:01:27] (03CR) 10Urbanecm: [C:03+2] Ensure images are fixed size on mobile while loading [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216612 (https://phabricator.wikimedia.org/T411669) (owner: 10DLynch) [21:01:38] Sure, I don’t mind leaving that to last. [21:01:56] katherine_g: earlier, you mentioned your patch requires a maint script. would you mind giving some details for that, please? [21:02:13] Kemayo: is the config patch depending on the backports, please? [21:02:22] It does not. 
[21:02:28] (03CR) 10Urbanecm: [C:03+2] DiscussionTools: turn on automatic topic subscriptions for all editors [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1201338 (https://phabricator.wikimedia.org/T290778) (owner: 10DLynch) [21:02:30] great, thanks [21:03:25] (03Merged) 10jenkins-bot: DiscussionTools: turn on automatic topic subscriptions for all editors [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1201338 (https://phabricator.wikimedia.org/T290778) (owner: 10DLynch) [21:03:47] (03PS2) 10DDesouza: Partially undeploy 2025 Global Readers Survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216600 (https://phabricator.wikimedia.org/T410918) [21:03:48] urbanecm: yes the maint script is to backfill ORES scores see here: https://gerrit.wikimedia.org/g/mediawiki/extensions/ORES/+/8cac74e6df8af395574352abe9339e63af534270/maintenance/PopulateDatabase.php [21:03:49] (03CR) 10Urbanecm: [C:03+2] Partially undeploy 2025 Global Readers Survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216600 (https://phabricator.wikimedia.org/T410918) (owner: 10DDesouza) [21:04:00] urbanecm: mwscript-k8s --comment="T409438" -- extensions/ORES/maintenance/PopulateDatabase.php --wiki=thwiki [21:04:01] T409438: Enable revertrisk filters in thwiki - https://phabricator.wikimedia.org/T409438 [21:04:04] katherine_g: and it can be started only after the config patch is done? or can it be started before? 
[21:04:21] urbanecm: yeah only after the config patch is done [21:04:27] ack, makes sense [21:04:29] (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216600 (https://phabricator.wikimedia.org/T410918) (owner: 10DDesouza) [21:04:33] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P86458 and previous config saved to /var/cache/conftool/dbconfig/20251208-210432-ladsgroup.json [21:04:37] (03Merged) 10jenkins-bot: Partially undeploy 2025 Global Readers Survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216600 (https://phabricator.wikimedia.org/T410918) (owner: 10DDesouza) [21:04:58] !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1216600|Partially undeploy 2025 Global Readers Survey (T410918)]], [[gerrit:1201338|DiscussionTools: turn on automatic topic subscriptions for all editors (T290778)]] [21:05:03] T410918: Deploy 2025 Global Readers Surveys (non-English) - https://phabricator.wikimedia.org/T410918 [21:05:04] T290778: [Config change] Enable automatic topic subscriptions in all editing interfaces - https://phabricator.wikimedia.org/T290778 [21:07:00] !log urbanecm@deploy2002 dani, urbanecm, kemayo: Backport for [[gerrit:1216600|Partially undeploy 2025 Global Readers Survey (T410918)]], [[gerrit:1201338|DiscussionTools: turn on automatic topic subscriptions for all editors (T290778)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [21:07:10] Kemayo: danisztls: can you test, please? :) [21:07:52] urbanecm: yes [21:09:10] urbanecm: looks good [21:09:16] ty [21:09:21] Kemayo: what about you? [21:09:29] urbanecm: Looks good. 
[21:09:34] ty [21:09:35] !log urbanecm@deploy2002 dani, urbanecm, kemayo: Continuing with sync [21:09:54] FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [21:12:18] (03Merged) 10jenkins-bot: Add instrumentation for mobile section switching [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216608 (https://phabricator.wikimedia.org/T410319) (owner: 10DLynch) [21:12:19] (03Merged) 10jenkins-bot: Edit full page: Tweak skeleton appearance and fix scroll offsets [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216609 (owner: 10DLynch) [21:12:20] (03Merged) 10jenkins-bot: Set full page scroll to 130px [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216611 (https://phabricator.wikimedia.org/T411669) (owner: 10DLynch) [21:12:45] (03Merged) 10jenkins-bot: Ensure images are fixed size on mobile while loading [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216612 (https://phabricator.wikimedia.org/T411669) (owner: 10DLynch) [21:13:38] !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1216600|Partially undeploy 2025 Global Readers Survey (T410918)]], [[gerrit:1201338|DiscussionTools: turn on automatic topic subscriptions for all editors (T290778)]] (duration: 08m 40s) [21:13:43] T410918: Deploy 2025 Global Readers Surveys (non-English) - https://phabricator.wikimedia.org/T410918 [21:13:43] T290778: [Config change] Enable automatic topic subscriptions in all editing interfaces - https://phabricator.wikimedia.org/T290778 [21:16:16] urbanecm: thanks! 
[21:16:21] np [21:16:47] !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1216608|Add instrumentation for mobile section switching (T410319)]], [[gerrit:1216609|Edit full page: Tweak skeleton appearance and fix scroll offsets]], [[gerrit:1216611|Set full page scroll to 130px (T411669)]], [[gerrit:1216612|Ensure images are fixed size on mobile while loading (T411669)]] [21:16:52] T410319: Add event tracking to the full-page editing button T409990 will introduce - https://phabricator.wikimedia.org/T410319 [21:16:53] T411669: Adjust transition upon tapping "Edit full page" - https://phabricator.wikimedia.org/T411669 [21:17:30] urbanecm: I don't have anything to test for these, since the feature is gated behind a config change that'll go out later this week. [21:17:37] makes sense [21:18:40] !log urbanecm@deploy2002 kemayo, urbanecm: Backport for [[gerrit:1216608|Add instrumentation for mobile section switching (T410319)]], [[gerrit:1216609|Edit full page: Tweak skeleton appearance and fix scroll offsets]], [[gerrit:1216611|Set full page scroll to 130px (T411669)]], [[gerrit:1216612|Ensure images are fixed size on mobile while loading (T411669)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[21:19:40] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2171 (T410589)', diff saved to https://phabricator.wikimedia.org/P86459 and previous config saved to /var/cache/conftool/dbconfig/20251208-211940-ladsgroup.json [21:19:43] T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589 [21:19:54] RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency [21:19:57] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance [21:20:04] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db2178 (T410589)', diff saved to https://phabricator.wikimedia.org/P86460 and previous config saved to /var/cache/conftool/dbconfig/20251208-212004-ladsgroup.json [21:20:25] !log urbanecm@deploy2002 kemayo, urbanecm: Continuing with sync [21:20:48] (03PS5) 10Kgraessle: Enable revertrisk filters in thwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1207923 (https://phabricator.wikimedia.org/T409438) [21:22:15] (03CR) 10Urbanecm: [C:03+2] Enable revertrisk filters in thwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1207923 (https://phabricator.wikimedia.org/T409438) (owner: 10Kgraessle) [21:23:13] (03Merged) 10jenkins-bot: Enable revertrisk filters in thwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1207923 (https://phabricator.wikimedia.org/T409438) (owner: 10Kgraessle) [21:24:24] !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1216608|Add instrumentation for mobile section switching (T410319)]], [[gerrit:1216609|Edit full page: Tweak skeleton appearance and fix scroll offsets]], 
[[gerrit:1216611|Set full page scroll to 130px (T411669)]], [[gerrit:1216612|Ensure images are fixed size on mobile while loading (T411669)]] (duration: 07m 36s) [21:24:28] T410319: Add event tracking to the full-page editing button T409990 will introduce - https://phabricator.wikimedia.org/T410319 [21:24:29] T411669: Adjust transition upon tapping "Edit full page" - https://phabricator.wikimedia.org/T411669 [21:25:01] (03CR) 10Urbanecm: [C:03+2] Add i18n for edit full page button [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216610 (owner: 10DLynch) [21:25:04] !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1207923|Enable revertrisk filters in thwiki (T409438)]] [21:25:08] T409438: Enable revertrisk filters in thwiki - https://phabricator.wikimedia.org/T409438 [21:25:18] urbanecm: Same "can't test until later" applies to this i18n patch. [21:25:26] ack, thanks for the headsup [21:27:05] !log urbanecm@deploy2002 urbanecm, kgraessle: Backport for [[gerrit:1207923|Enable revertrisk filters in thwiki (T409438)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [21:27:41] katherine_g: can you test the patch, please? 
[21:28:03] i'll start the maint script once it finishes [21:28:10] urbanecm: testing now [21:30:07] urbanecm: everything looks good [21:30:12] !log urbanecm@deploy2002 urbanecm, kgraessle: Continuing with sync [21:30:15] perfect, thanks [21:30:23] (03PS2) 10Bking: wdqs: Add new hosts [puppet] - 10https://gerrit.wikimedia.org/r/1216607 (https://phabricator.wikimedia.org/T411731) [21:34:16] !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1207923|Enable revertrisk filters in thwiki (T409438)]] (duration: 09m 11s) [21:34:19] T409438: Enable revertrisk filters in thwiki - https://phabricator.wikimedia.org/T409438 [21:34:32] (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy2002 using scap backport" [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216610 (owner: 10DLynch) [21:35:07] !log urbanecm@deploy2002 mwscript-k8s job started: ORES:PopulateDatabase --wiki=thwiki # T409438 [21:35:15] katherine_g: ^^, fyi [21:35:30] i'm seeing a bunch of RevisionNotScorable errors [21:35:43] is that expected? [21:36:00] (03Merged) 10jenkins-bot: Add i18n for edit full page button [extensions/VisualEditor] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1216610 (owner: 10DLynch) [21:36:03] urbanecm: thanks, yeah we see that alot on liftwing [21:36:22] !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1216610|Add i18n for edit full page button]] [21:36:32] just double checking it's not a reason for concern [21:36:35] ty [21:49:31] !log bking@deploy2002 helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply [21:49:35] !log bking@deploy2002 helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply [21:50:57] katherine_g: it seems to be finished https://www.irccloud.com/pastebin/Ox8Tr2EW/ [21:51:00] anything else? [21:51:14] urbanecm: no, thank you so much! [21:51:54] any time! 
[21:58:43] (03CR) 10MonAx the Developer: "@superpes15.itwiki@gmail.com, a week has passed, no objections since the change was proposed." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215262 (https://phabricator.wikimedia.org/T411588) (owner: 10Superpes15) [22:00:05] Reedy, sbassett, Maryum, and manfredi: OwO what's this, a deployment window?? Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251208T2200). nyaa~ [22:00:47] * urbanecm is still waiting on a full image build [22:01:35] (03CR) 10Gehel: [C:03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/1216607 (https://phabricator.wikimedia.org/T411731) (owner: 10Bking) [22:10:01] !log urbanecm@deploy2002 kemayo, urbanecm: Backport for [[gerrit:1216610|Add i18n for edit full page button]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [22:10:02] (03PS3) 10Harej: Update dumps mirror Hieradata to reflect Scatter's new hostname and IP address [puppet] - 10https://gerrit.wikimedia.org/r/1216652 [22:10:45] !log urbanecm@deploy2002 kemayo, urbanecm: Continuing with sync [22:16:40] FIRING: SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [22:16:53] !log bking@deploy2002 helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/echoserver: apply [22:17:14] !log bking@deploy2002 helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/echoserver: apply [22:17:51] (03PS4) 10Harej: Update dumps mirror Hieradata to reflect Scatter's new hostname and IP address [puppet] - 10https://gerrit.wikimedia.org/r/1216652 (https://phabricator.wikimedia.org/T409006) [22:18:24] (03CR) 10CI reject: [V:04-1] Update dumps mirror Hieradata to reflect Scatter's new hostname and IP address [puppet] - 
10https://gerrit.wikimedia.org/r/1216652 (https://phabricator.wikimedia.org/T409006) (owner: 10Harej) [22:20:23] (03PS5) 10Harej: Update dumps mirror Hieradata to reflect Scatter's new hostname and IP address [puppet] - 10https://gerrit.wikimedia.org/r/1216652 (https://phabricator.wikimedia.org/T409006) [22:20:51] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [22:20:55] (03CR) 10Bking: [C:03+2] wdqs: Add new hosts [puppet] - 10https://gerrit.wikimedia.org/r/1216607 (https://phabricator.wikimedia.org/T411731) (owner: 10Bking) [22:21:24] (03PS1) 10SBassett: Set CSP Report Only mode for group1 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216660 (https://phabricator.wikimedia.org/T291867) [22:22:54] (03CR) 10SBassett: [C:04-2] "Hold for config deploy" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216660 (https://phabricator.wikimedia.org/T291867) (owner: 10SBassett) [22:23:17] !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1216610|Add i18n for edit full page button]] (duration: 46m 55s) [22:23:30] finally [22:23:54] (03CR) 10Urbanecm: Set CSP Report Only mode for group1 wikis (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216660 (https://phabricator.wikimedia.org/T291867) (owner: 10SBassett) [22:24:19] !log `ryankemper@wdqs1015:~$ sudo systemctl restart wdqs-blazegraph` [22:24:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:24:38] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, December 10 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216660 (https://phabricator.wikimedia.org/T291867) (owner: 
10SBassett) [22:25:51] RESOLVED: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [22:26:23] (03PS2) 10SBassett: Set CSP Report Only mode for group1 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216660 (https://phabricator.wikimedia.org/T291867) [22:26:36] (03CR) 10SBassett: Set CSP Report Only mode for group1 wikis (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216660 (https://phabricator.wikimedia.org/T291867) (owner: 10SBassett) [22:28:05] PROBLEM - Ensure acme-chief-api is running on acmechief2002 is CRITICAL: PROCS CRITICAL: 2 processes with args /usr/bin/uwsgi --die-on-term --ini /etc/uwsgi/apps-enabled/acme-chief.ini https://wikitech.wikimedia.org/wiki/Acme-chief [22:30:05] RECOVERY - Ensure acme-chief-api is running on acmechief2002 is OK: PROCS OK: 1 process with args /usr/bin/uwsgi --die-on-term --ini /etc/uwsgi/apps-enabled/acme-chief.ini https://wikitech.wikimedia.org/wiki/Acme-chief [22:30:06] 10ops-eqiad, 06SRE, 06DC-Ops: hw troubleshooting: PERC1 battery failure for an-worker1148 - https://phabricator.wikimedia.org/T411919#11442311 (10RKemper) >>! In T411919#11440542, @Jclark-ctr wrote: > @RKemper This server is out of warranty. I believe we might have a spare battery in stock. Is there any chan... 
[22:37:39] FIRING: [2x] TransitBGPDown: Transit BGP session down between cr2-drmrs and Hurricane Electric (2001:7f8:54:5::13) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown [22:54:51] FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-drmrs:xe-0/1/0 (Peering: France IX (FRXMRS-10G-2198) {#D0071}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-drmrs:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [22:55:13] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [23:01:33] (03PS2) 10MacFan4000: ExtensionDistributor: mark 1.45 as stable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1216674 (https://phabricator.wikimedia.org/T408482) [23:07:18] FIRING: [2x] ProbeDown: Service wdqs1013:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1013:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [23:13:52] (03PS1) 10Papaul: Comment out temporarily the anycast ranges [homer/public] - 10https://gerrit.wikimedia.org/r/1216677 (https://phabricator.wikimedia.org/T408892) [23:21:43] FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS [23:22:18] RESOLVED: [2x] ProbeDown: Service wdqs1013:443 has failed probes 
(http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1013:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [23:24:51] RESOLVED: CoreRouterInterfaceDown: Core router interface down - cr2-drmrs:xe-0/1/0 (Peering: France IX (FRXMRS-10G-2198) {#D0071}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-drmrs:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [23:26:33] 10SRE-SLO, 10EditCheck, 06Editing-team (Kanban Board), 05Goal, 07OKR-Work: Fix EditCheck's SLO metrics and create a dashboard for it - https://phabricator.wikimedia.org/T395444#11442503 (10VPuffetMichel) [23:30:13] FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [23:32:25] (03PS1) 10Papaul: Change cr3/4-ulsfo loopback ip's in puppet before tomorrow's maintenance window [puppet] - 10https://gerrit.wikimedia.org/r/1216679 (https://phabricator.wikimedia.org/T408892) [23:33:46] !log bking@deploy2002 helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/echoserver: apply [23:33:52] !log bking@deploy2002 helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/echoserver: apply [23:41:13] RESOLVED: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS [23:41:39] 10ops-eqiad, 06SRE, 06DC-Ops, 06Data-Platform-SRE (2025.11.07 - 2025.11.28): Q2:rack/setup/install 
wdqs1033-1035 - https://phabricator.wikimedia.org/T411731#11442535 (10bking) a:05bking→03Jhancock.wm [23:42:33] 10ops-eqiad, 06SRE, 06DC-Ops, 06Data-Platform-SRE (2025.11.07 - 2025.11.28): Q2:rack/setup/install wdqs1033-1035 - https://phabricator.wikimedia.org/T411731#11442539 (10bking) @Jhancock.wm we've added the requested info above, plus the Puppet code so the hosts should be able to provision. If we missed anyt... [23:47:06] FIRING: KubernetesCalicoDown: ml-serve1013.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s-mlserve&var-instance=ml-serve1013.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown [23:47:39] RESOLVED: [2x] TransitBGPDown: Transit BGP session down between cr2-drmrs and Hurricane Electric (2001:7f8:54:5::13) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown