[00:05:25] FIRING: SystemdUnitFailed: rsyslog-imfile-remedy.service on wikikube-worker1148:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:08:13] (03PS1) 10TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1167976 [00:08:13] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1167976 (owner: 10TrainBranchBot) [00:18:38] (03CR) 10TrainBranchBot: [C:03+2] "Approved by krinkle@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167258 (owner: 10Krinkle) [00:19:29] (03Merged) 10jenkins-bot: beta: Remove beta-specific 'http' entry for wgGraphAllowedDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167258 (owner: 10Krinkle) [00:21:09] (03CR) 10TrainBranchBot: [C:03+2] "Approved by krinkle@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167259 (https://phabricator.wikimedia.org/T289318) (owner: 10Krinkle) [00:21:52] !log andrew@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage [00:22:17] (03Merged) 10jenkins-bot: beta: Move beta wikipedia canonical to beta.wmcloud.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167259 (https://phabricator.wikimedia.org/T289318) (owner: 10Krinkle) [00:22:31] !log krinkle@deploy1003 Started scap sync-world: Backport for [[gerrit:1167259|beta: Move beta wikipedia canonical to beta.wmcloud.org (T289318)]] [00:22:35] T289318: Move *.beta.wmflabs.org to *.beta.wmcloud.org - https://phabricator.wikimedia.org/T289318 [00:24:31] !log krinkle@deploy1003 krinkle: Backport for [[gerrit:1167259|beta: Move beta wikipedia canonical to beta.wmcloud.org (T289318)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [00:26:24] (03PS1) 10Krinkle: beta: Change FileRepo zone URL to upload.wikimedia.beta.wmcloud.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167985 (https://phabricator.wikimedia.org/T289318) [00:27:59] !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage [00:28:28] (03Merged) 10jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1167976 (owner: 10TrainBranchBot) [00:29:20] !log krinkle@deploy1003 krinkle: Continuing with sync [00:34:45] !log krinkle@deploy1003 Finished scap sync-world: Backport for [[gerrit:1167259|beta: Move beta wikipedia canonical to beta.wmcloud.org (T289318)]] (duration: 12m 13s) [00:34:49] T289318: Move *.beta.wmflabs.org to *.beta.wmcloud.org - https://phabricator.wikimedia.org/T289318 [00:39:45] RESOLVED: WidespreadPuppetFailure: Puppet has failed in eqiad - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure [00:42:43] (03CR) 10TrainBranchBot: [C:03+2] "Approved by krinkle@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167985 (https://phabricator.wikimedia.org/T289318) (owner: 10Krinkle) [00:43:30] (03Merged) 10jenkins-bot: beta: Change FileRepo zone URL to upload.wikimedia.beta.wmcloud.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1167985 (https://phabricator.wikimedia.org/T289318) (owner: 10Krinkle) [00:43:46] !log krinkle@deploy1003 Started scap sync-world: Backport for [[gerrit:1167985|beta: Change FileRepo zone URL to upload.wikimedia.beta.wmcloud.org (T289318)]] [00:43:53] T289318: Move *.beta.wmflabs.org to *.beta.wmcloud.org - https://phabricator.wikimedia.org/T289318 [00:45:41] !log krinkle@deploy1003 krinkle: 
Backport for [[gerrit:1167985|beta: Change FileRepo zone URL to upload.wikimedia.beta.wmcloud.org (T289318)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [00:46:40] PROBLEM - Disk space on releases1003 is CRITICAL: DISK CRITICAL - /srv/docker/overlay2/c0c858105d9a6d6edb9405fa560c5bfba6e11a5808e356f3fc849e196f5c4227/merged is not accessible: Permission denied https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=releases1003&var-datasource=eqiad+prometheus/ops [00:47:23] !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1037.eqiad.wmnet with OS bookworm [00:47:34] 10ops-eqiad, 06SRE, 06DC-Ops, 10cloud-services-team (Hardware), 13Patch-For-Review: SSD firmware update for cloudcephosd10[35-41] - https://phabricator.wikimedia.org/T396651#10993884 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudcephosd1037.eqiad.wm... [00:50:01] !log krinkle@deploy1003 krinkle: Continuing with sync [00:53:58] (03Abandoned) 10Andrew Bogott: cloudcephosd1037: update nic names for Bookworm. 
[puppet] - 10https://gerrit.wikimedia.org/r/1167916 (https://phabricator.wikimedia.org/T396651) (owner: 10Andrew Bogott) [00:55:16] !log krinkle@deploy1003 Finished scap sync-world: Backport for [[gerrit:1167985|beta: Change FileRepo zone URL to upload.wikimedia.beta.wmcloud.org (T289318)]] (duration: 11m 30s) [00:55:20] T289318: Move *.beta.wmflabs.org to *.beta.wmcloud.org - https://phabricator.wikimedia.org/T289318 [00:57:34] !log andrew@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1038.eqiad.wmnet [01:03:43] !log andrew@cumin2002 START - Cookbook sre.hosts.reboot-single for host cloudcephosd1038.eqiad.wmnet [01:06:40] RECOVERY - Disk space on releases1003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=releases1003&var-datasource=eqiad+prometheus/ops [01:17:28] !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1038.eqiad.wmnet [01:17:31] !log andrew@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cloudcephosd1038.eqiad.wmnet [01:50:45] FIRING: WidespreadPuppetFailure: Puppet has failed in eqiad - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure [01:55:06] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [02:01:42] !log andrew@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1039.eqiad.wmnet [02:09:34] !log andrew@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade 
firmware for hosts cloudcephosd1039.eqiad.wmnet [02:10:05] !log andrew@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1039.eqiad.wmnet [02:10:43] !log andrew@cumin2002 END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cloudcephosd1039.eqiad.wmnet [02:11:29] !log andrew@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1039.eqiad.wmnet [02:11:35] !log andrew@cumin2002 END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cloudcephosd1039.eqiad.wmnet [02:13:52] !log andrew@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1040.eqiad.wmnet [02:17:03] !log andrew@cumin2002 START - Cookbook sre.hosts.reboot-single for host cloudcephosd1040.eqiad.wmnet [02:30:25] !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1040.eqiad.wmnet [02:30:28] !log andrew@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cloudcephosd1040.eqiad.wmnet [03:05:25] RESOLVED: SystemdUnitFailed: rsyslog-imfile-remedy.service on wikikube-worker1148:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:13:38] !log andrew@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1041.eqiad.wmnet [03:15:12] FIRING: [21x] CertAlmostExpired: Certificate for service asw1-b3-magru.mgmt.magru.wmnet:32767 is about to expire - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [03:21:23] !log andrew@cumin2002 START - Cookbook sre.hosts.reboot-single for host cloudcephosd1041.eqiad.wmnet [03:35:02] !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for 
host cloudcephosd1041.eqiad.wmnet [03:35:05] !log andrew@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cloudcephosd1041.eqiad.wmnet [03:54:00] 06SRE, 06serviceops, 10Wikimedia-Site-requests, 13Patch-For-Review: Change $wgMaxArticleSize limit from byte-based to character-based - https://phabricator.wikimedia.org/T275319#10993982 (10cscott) My current position is still outlined in T275319#9826396 above, and I'd love to help get some traction on tho... [03:54:40] 10ops-eqiad, 06SRE, 06DC-Ops, 10cloud-services-team (Hardware), 13Patch-For-Review: SSD firmware update for cloudcephosd10[35-41] - https://phabricator.wikimedia.org/T396651#10993983 (10Andrew) 05Open→03Resolved I upgraded the firmware on all of these. My attempts to get them to bookworm at the s... [05:03:43] FIRING: CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [05:03:43] FIRING: [5x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [05:03:58] FIRING: [5x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [05:04:08] FIRING: [19x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - 
https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [05:06:42] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [05:16:42] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [05:20:27] FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [05:25:27] RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [05:35:27] FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [05:38:58] FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - 
https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag [05:39:26] (03CR) 10Ayounsi: [C:03+1] "nice! to be fully tested but the approach lgtm!" [puppet] - 10https://gerrit.wikimedia.org/r/1167883 (owner: 10JHathaway) [05:40:27] RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [05:48:58] RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag [05:51:00] FIRING: WidespreadPuppetFailure: Puppet has failed in eqiad - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure [05:55:06] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [05:57:59] (03PS2) 10Giuseppe Lavagetto: cache-text: remove static rate limiting [puppet] - 10https://gerrit.wikimedia.org/r/1166798 (https://phabricator.wikimedia.org/T398668) [06:00:05] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250711T0600) [06:00:56] PROBLEM - Host rpki2003 is DOWN: PING CRITICAL - Packet loss 
= 100% [06:01:29] that should come back up ^ [06:04:10] FIRING: GanetiBGPDown: BGP session down between ganeti2034 and lsw1-a4-codfw - group Ganeti4 - https://wikitech.wikimedia.org/wiki/Ganeti#GanetiBGPDown - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=codfw&var-device=lsw1-a4-codfw:9804&var-bgp_group=Ganeti4&var-bgp_neighbor=ganeti2034 - https://alerts.wikimedia.org/?q=alertname%3DGanetiBGPDown [06:05:42] FIRING: JobUnavailable: Reduced availability for job jmx_idp in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [06:05:45] RESOLVED: WidespreadPuppetFailure: Puppet has failed in eqiad - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure [06:06:04] RECOVERY - Host rpki2003 is UP: PING WARNING - Packet loss = 75%, RTA = 33.79 ms [06:09:10] RESOLVED: GanetiBGPDown: BGP session down between ganeti2034 and lsw1-a4-codfw - group Ganeti4 - https://wikitech.wikimedia.org/wiki/Ganeti#GanetiBGPDown - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=codfw&var-device=lsw1-a4-codfw:9804&var-bgp_group=Ganeti4&var-bgp_neighbor=ganeti2034 - https://alerts.wikimedia.org/?q=alertname%3DGanetiBGPDown [06:10:42] RESOLVED: JobUnavailable: Reduced availability for job jmx_idp in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [06:11:16] PROBLEM - Host rpki2003 is DOWN: PING CRITICAL - Packet loss = 100% [06:11:30] RECOVERY - Host rpki2003 is UP: PING OK - Packet loss = 0%, RTA = 33.94 ms [06:25:25] 
(03PS1) 10Marostegui: mariadb: Productionize es1048 [puppet] - 10https://gerrit.wikimedia.org/r/1168036 (https://phabricator.wikimedia.org/T395771) [06:26:31] (03CR) 10Marostegui: [C:03+2] mariadb: Productionize es1048 [puppet] - 10https://gerrit.wikimedia.org/r/1168036 (https://phabricator.wikimedia.org/T395771) (owner: 10Marostegui) [06:30:10] (03PS1) 10Marostegui: db2213: Migrate to MariaDB 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1168037 (https://phabricator.wikimedia.org/T398928) [06:30:47] (03CR) 10Marostegui: [C:03+2] db2213: Migrate to MariaDB 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1168037 (https://phabricator.wikimedia.org/T398928) (owner: 10Marostegui) [06:31:53] !log root@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2213.codfw.wmnet with reason: Maintenance [06:31:57] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db2213 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78892 and previous config saved to /var/cache/conftool/dbconfig/20250711-063156-marostegui.json [06:39:23] !log marostegui@cumin1002 dbctl commit (dc=all): 'db2213 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78893 and previous config saved to /var/cache/conftool/dbconfig/20250711-063922-root.json [06:54:28] !log marostegui@cumin1002 dbctl commit (dc=all): 'db2213 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78894 and previous config saved to /var/cache/conftool/dbconfig/20250711-065428-root.json [06:55:40] (03PS1) 10Krinkle: varnish: Improve GeoIP to use cookie domain similar to prod [puppet] - 10https://gerrit.wikimedia.org/r/1168038 (https://phabricator.wikimedia.org/T99226) [06:56:52] (03PS2) 10Krinkle: varnish: Improve GeoIP to use cookie domain similar to prod [puppet] - 10https://gerrit.wikimedia.org/r/1168038 (https://phabricator.wikimedia.org/T99226) [06:56:54] (03CR) 10Krinkle: "check experimental" [puppet] - 
10https://gerrit.wikimedia.org/r/1168038 (https://phabricator.wikimedia.org/T99226) (owner: 10Krinkle) [07:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250711T0700) [07:08:43] FIRING: [5x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:08:48] FIRING: [19x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:09:34] !log marostegui@cumin1002 dbctl commit (dc=all): 'db2213 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78895 and previous config saved to /var/cache/conftool/dbconfig/20250711-070933-root.json [07:18:42] FIRING: [21x] CertAlmostExpired: Certificate for service asw1-b3-magru.mgmt.magru.wmnet:32767 is about to expire - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [07:18:43] RESOLVED: CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:18:43] FIRING: [19x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - 
https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:23:31] 10ops-codfw, 06SRE, 06DC-Ops: Inbound errors on interface cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://phabricator.wikimedia.org/T399097#10994182 (10cmooney) The link remains down, Arelion are awaiting a replacement card for an optical system in Atlanta it seems:... [07:23:43] FIRING: [5x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:23:48] FIRING: [19x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:24:40] !log marostegui@cumin1002 dbctl commit (dc=all): 'db2213 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78896 and previous config saved to /var/cache/conftool/dbconfig/20250711-072439-root.json [07:28:43] FIRING: [4x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:28:48] FIRING: [18x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:28:52] 10ops-codfw, 06SRE, 
06DC-Ops: Arelion IC-374549 100G Transport outage (cr1-codfw -> cr1-eqiad) July 2025 - https://phabricator.wikimedia.org/T399097#10994194 (10cmooney) [07:33:43] FIRING: [18x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:34:44] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for addshore - https://phabricator.wikimedia.org/T399152#10994199 (10MoritzMuehlenhoff) [07:34:51] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for addshore - https://phabricator.wikimedia.org/T399152#10994201 (10MoritzMuehlenhoff) @Milimetric @Ahoelzl @Ottomata This needs your approval [07:36:48] !log jmm@cumin1003 START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS trixie [07:36:58] 06SRE, 06Infrastructure-Foundations: Prepare our custom installer and the base layer for Trixie - https://phabricator.wikimedia.org/T391083#10994206 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin1003 for host sretest1003.eqiad.wmnet with OS trixie [07:37:10] (03PS1) 10Jgiannelos: changeprop: Ignore more commons NS on pcs rules [deployment-charts] - 10https://gerrit.wikimedia.org/r/1168041 [07:38:43] FIRING: [17x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:43:43] FIRING: [16x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - 
https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:43:48] FIRING: [3x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:43:58] FIRING: [4x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:48:43] FIRING: [10x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:48:43] RESOLVED: [3x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:48:53] RESOLVED: [3x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:50:09] (03PS3) 10Krinkle: varnish: Improve GeoIP to use cookie domain similar to prod [puppet] - 10https://gerrit.wikimedia.org/r/1168038 (https://phabricator.wikimedia.org/T99226) 
[07:50:58] 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users group (LDAP and kerberos), for aprum - https://phabricator.wikimedia.org/T398650#10994228 (10cmooney) [07:53:43] RESOLVED: [9x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:56:22] !log jmm@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage [08:02:21] (03CR) 10Krinkle: "I'm looking at `./modules/varnish/files/tests/docker_run.sh cp1110.eqiad.wmnet 1168038` (after simulating a nearby failure on PS2) to look" [puppet] - 10https://gerrit.wikimedia.org/r/1168038 (https://phabricator.wikimedia.org/T99226) (owner: 10Krinkle) [08:02:34] !log jmm@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage [08:06:45] 06SRE, 10decommission-hardware, 06Infrastructure-Foundations: decommission puppetserver2003 - https://phabricator.wikimedia.org/T398607#10994286 (10MoritzMuehlenhoff) [08:18:05] !log jmm@cumin1003 START - Cookbook sre.hosts.decommission for hosts puppetserver2003.codfw.wmnet [08:19:50] !log root@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2223.codfw.wmnet with reason: Maintenance [08:19:54] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db2223 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78897 and previous config saved to /var/cache/conftool/dbconfig/20250711-081953-marostegui.json [08:20:47] jmm@cumin1003 decommission (PID 1245901) is awaiting input [08:26:45] (03PS2) 10Jgiannelos: changeprop: Ignore more namespace on pcs transclusion rules [deployment-charts] - 
10https://gerrit.wikimedia.org/r/1168041 [08:27:25] !log marostegui@cumin1002 dbctl commit (dc=all): 'db2223 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78898 and previous config saved to /var/cache/conftool/dbconfig/20250711-082725-root.json [08:27:36] (03PS3) 10Jgiannelos: changeprop: Ignore more namespace on pcs transclusion rules [deployment-charts] - 10https://gerrit.wikimedia.org/r/1168041 (https://phabricator.wikimedia.org/T397072) [08:30:46] !log jmm@cumin1003 START - Cookbook sre.dns.netbox [08:33:13] 06SRE, 06collaboration-services, 06Infrastructure-Foundations, 10Mail, and 2 others: Replace Exim on VRTS servers with Postfix - https://phabricator.wikimedia.org/T378028#10994459 (10MoritzMuehlenhoff) >>! In T378028#10993697, @Dzahn wrote: > But another question comes to mind.. and that is.. do VRTS machi... [08:33:58] !log jmm@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetserver2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003" [08:34:17] !log jmm@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetserver2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003" [08:34:17] !log jmm@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [08:34:18] !log jmm@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetserver2003.codfw.wmnet [08:34:27] 06SRE, 10decommission-hardware, 06Infrastructure-Foundations: decommission puppetserver2003 - https://phabricator.wikimedia.org/T398607#10994460 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin1003 for hosts: `puppetserver2003.codfw.wmnet` - puppetserver2003.codfw.wmnet (**PASS**... 
[08:36:08] (03PS1) 10Muehlenhoff: Remove puppetserver2003 [puppet] - 10https://gerrit.wikimedia.org/r/1168121 (https://phabricator.wikimedia.org/T398607) [08:37:36] (03CR) 10Arnaudb: [C:03+1] "lgtm! thanks for the amend! lets merge and move on to the next thing!" [puppet] - 10https://gerrit.wikimedia.org/r/1129920 (https://phabricator.wikimedia.org/T387833) (owner: 10Dzahn) [08:41:28] (03PS1) 10Ayounsi: magru: add Ufinet transit [homer/public] - 10https://gerrit.wikimedia.org/r/1168122 [08:41:45] (03PS2) 10Ayounsi: magru: add Ufinet transit [homer/public] - 10https://gerrit.wikimedia.org/r/1168122 (https://phabricator.wikimedia.org/T389767) [08:41:47] !log hnowlan@deploy1003 helmfile [eqiad] START helmfile.d/services/changeprop: sync [08:42:06] !log hnowlan@deploy1003 helmfile [eqiad] DONE helmfile.d/services/changeprop: sync [08:42:31] !log marostegui@cumin1002 dbctl commit (dc=all): 'db2223 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78899 and previous config saved to /var/cache/conftool/dbconfig/20250711-084230-root.json [08:45:37] (03CR) 10Elukey: [C:03+2] profile::docker::reporter: add Wikikube and ML serve prod clusters [puppet] - 10https://gerrit.wikimedia.org/r/1167885 (https://phabricator.wikimedia.org/T397696) (owner: 10Elukey) [08:51:28] !log elukey@deploy1003 helmfile [eqiad] START helmfile.d/admin 'sync'. [08:51:29] !log elukey@deploy1003 helmfile [eqiad] DONE helmfile.d/admin 'sync'. [08:51:43] (03Abandoned) 10Slyngshede: data.yaml offboarding trokhymovych [puppet] - 10https://gerrit.wikimedia.org/r/1164707 (owner: 10Slyngshede) [08:51:51] !log elukey@deploy1003 helmfile [codfw] START helmfile.d/admin 'sync'. [08:51:55] !log elukey@deploy1003 helmfile [codfw] DONE helmfile.d/admin 'sync'. 
[08:54:25] (PS4) Jgiannelos: changeprop: Ignore more namespace on pcs transclusion rules [deployment-charts] - https://gerrit.wikimedia.org/r/1168041 (https://phabricator.wikimedia.org/T397072)
[08:57:37] !log marostegui@cumin1002 dbctl commit (dc=all): 'db2223 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78900 and previous config saved to /var/cache/conftool/dbconfig/20250711-085736-root.json
[09:00:14] (PS1) Gerrit maintenance bot: mariadb: Promote db2213 to s5 master [puppet] - https://gerrit.wikimedia.org/r/1168124 (https://phabricator.wikimedia.org/T399280)
[09:01:09] (CR) Filippo Giunchedi: "Thank you for the patch -- LGTM, I'm thinking we can add raises=False to task_comment() since I don't think it is fatal if we don't commen" [cookbooks] - https://gerrit.wikimedia.org/r/1167887 (owner: Volans)
[09:02:25] (CR) Elukey: [C:+2] "I think it is worth to try it, let's see how it goes and if we have to follow up or not!" [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/1156315 (owner: Hashar)
[09:03:29] (CR) Elukey: [C:+2] "Done" [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/1156733 (owner: Hashar)
[09:03:39] (CR) Muehlenhoff: [C:+2] Remove puppetserver2003 [puppet] - https://gerrit.wikimedia.org/r/1168121 (https://phabricator.wikimedia.org/T398607) (owner: Muehlenhoff)
[09:03:55] (Abandoned) Elukey: TEST - fix http_boot_once for reimage [cookbooks] - https://gerrit.wikimedia.org/r/1166378 (owner: Elukey)
[09:04:16] !log jmm@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1003.eqiad.wmnet with OS trixie
[09:04:22] SRE, Infrastructure-Foundations: Prepare our custom installer and the base layer for Trixie - https://phabricator.wikimedia.org/T391083#10994543 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin1003 for host sretest1003.eqiad.wmnet with OS trixie executed with errors: - srete...
[09:05:27] ops-codfw, SRE, DC-Ops, decommission-hardware, and 2 others: decommission puppetserver2003 - https://phabricator.wikimedia.org/T398607#10994546 (MoritzMuehlenhoff)
[09:06:42] (CR) Elukey: [C:+1] I/F: simplify Phabricator usage [cookbooks] - https://gerrit.wikimedia.org/r/1167886 (owner: Volans)
[09:07:53] (PS5) Jgiannelos: changeprop: Ignore more namespace on pcs transclusion rules [deployment-charts] - https://gerrit.wikimedia.org/r/1168041 (https://phabricator.wikimedia.org/T397072)
[09:09:45] FIRING: WidespreadPuppetFailure: Puppet has failed in eqiad - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[09:12:42] !log marostegui@cumin1002 dbctl commit (dc=all): 'db2223 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78901 and previous config saved to /var/cache/conftool/dbconfig/20250711-091242-root.json
[09:12:47] (PS6) Jgiannelos: changeprop: Ignore more namespace on pcs transclusion rules [deployment-charts] - https://gerrit.wikimedia.org/r/1168041 (https://phabricator.wikimedia.org/T397072)
[09:13:25] (PS7) Jgiannelos: changeprop: Ignore more namespace on pcs transclusion rules [deployment-charts] - https://gerrit.wikimedia.org/r/1168041 (https://phabricator.wikimedia.org/T397072)
[09:15:19] !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 22 hosts with reason: Primary switchover s5 T399280
[09:15:22] T399280: Switchover s5 master (db2192 -> db2213) - https://phabricator.wikimedia.org/T399280
[09:15:59] SRE, collaboration-services, Infrastructure-Foundations, Mail, and 2 others: Replace Exim on VRTS servers with Postfix - https://phabricator.wikimedia.org/T378028#10994585 (Arnoldokoth) @Dzahn We used to run it on VMs but we kept running into resource issues (especially with `clamav`) even after...
[09:16:39] ops-codfw, SRE, DC-Ops, Infrastructure-Foundations, netops: Netbox: remove old cr2-codfw Switch Control Board inventory items - https://phabricator.wikimedia.org/T398940#10994586 (ayounsi) We can remove them from Netbox if they're not in the device anymore. and add them to the spare tracking...
[09:18:13] !log marostegui@cumin1002 dbctl commit (dc=all): 'Remove db2213 from API/vslow/dump T399280', diff saved to https://phabricator.wikimedia.org/P78902 and previous config saved to /var/cache/conftool/dbconfig/20250711-091812-root.json
[09:20:35] (CR) Gmodena: "LGTM. Just left two nit/questions." [mediawiki-config] - https://gerrit.wikimedia.org/r/1167438 (https://phabricator.wikimedia.org/T381565) (owner: Elukey)
[09:21:57] jmm@cumin1003 reimage (PID 1251864) is awaiting input
[09:22:14] (PS2) Elukey: EventStreamConfig: add the maps.tiles_change_bookworm stream [mediawiki-config] - https://gerrit.wikimedia.org/r/1167438 (https://phabricator.wikimedia.org/T381565)
[09:22:36] (CR) Marostegui: [C:+2] mariadb: Promote db2213 to s5 master [puppet] - https://gerrit.wikimedia.org/r/1168124 (https://phabricator.wikimedia.org/T399280) (owner: Gerrit maintenance bot)
[09:22:52] (CR) Filippo Giunchedi: "LGTM overall, see inline" [puppet] - https://gerrit.wikimedia.org/r/1167157 (https://phabricator.wikimedia.org/T397003) (owner: Tiziano Fogli)
[09:23:29] (PS8) Jgiannelos: changeprop: Ignore more namespaces on pcs transclusion rules [deployment-charts] - https://gerrit.wikimedia.org/r/1168041 (https://phabricator.wikimedia.org/T397072)
[09:24:35] (CR) Elukey: "Thanks Gabriele!" [mediawiki-config] - https://gerrit.wikimedia.org/r/1167438 (https://phabricator.wikimedia.org/T381565) (owner: Elukey)
[09:24:45] RESOLVED: WidespreadPuppetFailure: Puppet has failed in eqiad - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[09:25:15] !log imported perccli for trixie-wikimedia T391083
[09:25:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:19] T391083: Prepare our custom installer and the base layer for Trixie - https://phabricator.wikimedia.org/T391083
[09:27:49] !log jmm@cumin1003 START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS trixie
[09:27:59] SRE, Infrastructure-Foundations: Prepare our custom installer and the base layer for Trixie - https://phabricator.wikimedia.org/T391083#10994632 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin1003 for host sretest1003.eqiad.wmnet with OS trixie
[09:28:06] (PS9) Jgiannelos: changeprop: Ignore more namespaces on pcs transclusion rules [deployment-charts] - https://gerrit.wikimedia.org/r/1168041 (https://phabricator.wikimedia.org/T397072)
[09:29:27] !log Starting s5 codfw failover from db2192 to db2213 - T399280
[09:29:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:29:30] T399280: Switchover s5 master (db2192 -> db2213) - https://phabricator.wikimedia.org/T399280
[09:30:07] !log marostegui@cumin1002 dbctl commit (dc=all): 'Promote db2213 to s5 primary T399280', diff saved to https://phabricator.wikimedia.org/P78903 and previous config saved to /var/cache/conftool/dbconfig/20250711-093006-marostegui.json
[09:31:17] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db2192 T399280', diff saved to https://phabricator.wikimedia.org/P78904 and previous config saved to /var/cache/conftool/dbconfig/20250711-093115-root.json