[09:26:19] serviceops, collaboration-services, Data-Platform-SRE, Prod-Kubernetes, Kubernetes: Fix alternatives entries in helm and kubernetes-client packages - https://phabricator.wikimedia.org/T387548#10937575 (Jelto) I updated staging-codfw master nodes to the new `kubernetes-client131` version. `kub...
[10:33:20] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Update wikikube codfw to kubernetes 1.31 - https://phabricator.wikimedia.org/T397148#10937782 (Clement_Goubert)
[11:36:59] serviceops, Prod-Kubernetes, Kubernetes: thumbor isn't depooled by sre.k8s.pool-depool-cluster - https://phabricator.wikimedia.org/T397618 (Clement_Goubert) NEW
[12:12:10] serviceops, DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install aux-k8s-worker100[6-9].eqiad.wmnet - https://phabricator.wikimedia.org/T393053#10938036 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host aux-k8s-worker1008.eqiad.wmnet with OS bookworm
[12:21:39] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Update wikikube codfw to kubernetes 1.31 - https://phabricator.wikimedia.org/T397148#10938086 (Jelto)
[12:43:54] serviceops, DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install aux-k8s-worker100[6-9].eqiad.wmnet - https://phabricator.wikimedia.org/T393053#10938161 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host aux-k8s-worker1008.eqiad.wmnet with OS bookworm complete...
[13:12:01] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Update wikikube codfw to kubernetes 1.31 - https://phabricator.wikimedia.org/T397148#10938237 (JMeybohm)
[13:37:16] serviceops, Deployments, Patch-For-Review, Release-Engineering-Team (Radar), Wikimedia-production-error: httpb sometimes fails upon deployment with a HTTP 503 - https://phabricator.wikimedia.org/T380958#10938353 (akosiaris) Judging from the lack of comments in the last 2 weeks and repeated te...
[13:57:13] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Update wikikube codfw to kubernetes 1.31 - https://phabricator.wikimedia.org/T397148#10938441 (Jelto)
[14:00:10] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Update wikikube codfw to kubernetes 1.31 - https://phabricator.wikimedia.org/T397148#10938449 (JMeybohm)
[14:04:31] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Update wikikube codfw to kubernetes 1.31 - https://phabricator.wikimedia.org/T397148#10938469 (Jelto)
[14:08:41] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Update wikikube codfw to kubernetes 1.31 - https://phabricator.wikimedia.org/T397148#10938478 (JMeybohm)
[14:14:47] serviceops, Abstract Wikipedia team, Essential-Work, Patch-For-Review, Wikimedia-production-error: Wikifunctions orchestrator service in staging k8s cannot make network calls, gets getaddrinfo EAI_AGAIN / no healthy upstream - https://phabricator.wikimedia.org/T397341#10938487 (akosiaris)...
[14:16:40] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Update wikikube codfw to kubernetes 1.31 - https://phabricator.wikimedia.org/T397148#10938493 (Clement_Goubert) Repool command for ingress: `sudo confctl --object-type discovery select 'dnsdisc=k8s-ingress-wikikube.*,name=codfw' set/poole...
[14:42:41] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Update wikikube codfw to kubernetes 1.31 - https://phabricator.wikimedia.org/T397148#10938594 (JMeybohm) Overall this did not really go as planned since we had a couple of issues: - We had to re-run wipe-cookbook since the kubernetes-clie...
[14:53:19] serviceops, collaboration-services, Data-Platform-SRE, Prod-Kubernetes, Kubernetes: Fix alternatives entries in helm and kubernetes-client packages - https://phabricator.wikimedia.org/T387548#10938606 (JMeybohm) p:Low→High
[15:15:28] serviceops, DC-Ops, ops-codfw, SRE: Q4:rack/setup/install rdb201[12] - https://phabricator.wikimedia.org/T393121#10938711 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1003 for host rdb2011.codfw.wmnet with OS bookworm
[15:15:38] serviceops, DC-Ops, ops-codfw, SRE: Q4:rack/setup/install rdb201[12] - https://phabricator.wikimedia.org/T393121#10938714 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1003 for host rdb2012.codfw.wmnet with OS bookworm
[15:17:04] hello my operational lovelies... I've got a script I'd like to run in the coming weeks to do a rather large migration of around 13,000 LiquidThreads pages to Flow
[15:17:21] I've been provided with a migration script that was, as far as i can tell, last used in anger in 2015 and has no dry run mode
[15:17:43] T_T
[15:17:44] LiquidThreads does not appear to benefit from a regular XML dump
[15:18:48] so, that's got me wondering: is there any way to trial-run it against a clone of the site, and/or a clone of the database that I could point a local mediawiki instance at?
[15:19:40] we're faaaaairly sure the script will do what it's supposed to do... but an ounce of prevention and all that
[15:20:56] beta maybe?
idk if LiquidThreads is enabled there
[15:21:21] good point, I suppose it doesn't need to necessarily be ptwikibooks
[15:22:09] You can also use the dumps from https://dumps.wikimedia.org/backup-index.html. In the past (sigh... 13 or so years ago?), I had used them to setup a clone of elwiki
[15:22:50] it was a bit of work, even back then, but I did end up with a local mirror (with PII redacted ofc) of elwiki that I could toy around with.
[15:22:55] I don't think that LQT is covered, unfortunately
[15:23:11] as in not dumped?
[15:23:15] nope
[15:23:22] ouch, I was unaware of that
[15:23:59] I'll double-check
[15:24:30] I've been working on fixing up stuff where previous migration attempts got partially reverted so I've been spelunking the dumps a lot lately
[15:24:49] and the actual enumeration of LQT threads was the only time I had to actually go and talk to the API
[15:27:17] I suppose it might be possible to inject them back in to my dev wiki using another API
[15:28:09] we do also have a script that can do one page at a time, so Plan A is to run that on a few candidates and see that they behave
[15:28:38] pppery's done some sterling work reassuring us that the all-pages script will behave sensibly
[15:30:19] sounds like there's no straightforward way to generate a local or remote database mirror, then?
[15:41:01] Not that I know of, I fear. Maybe SRE Data Persistence has an idea ? Their channel on IRC is #wikimedia-data-persistence.
[15:42:13] serviceops, MW-on-K8s, Data-Platform-SRE (2025.06.13 - 2025.07.04), Discovery-Search (2025.06.13 - 2025.07.04), MW-1.45-notes (1.45.0-wmf.4; 2025-06-03): Investigate EQIAD daily completion suggester rebuild failure - https://phabricator.wikimedia.org/T395465#10938840 (pfischer)
[15:44:12] aha, thank you for the pointer
[15:51:27] serviceops, DC-Ops, ops-codfw, SRE: Q4:rack/setup/install rdb201[12] - https://phabricator.wikimedia.org/T393121#10938924 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin1003 for host rdb2011.codfw.wmnet with OS bookworm completed: - rdb2011 (**PASS**) - Remov...
[15:51:57] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Update wikikube codfw to kubernetes 1.31 - https://phabricator.wikimedia.org/T397148#10938925 (JMeybohm) >>! In T397148#10938493, @Clement_Goubert wrote: > Repool command for ingress: Correct command to just repool `ro` in codfw: `sudo co...
[15:54:58] serviceops, DC-Ops, ops-codfw, SRE: Q4:rack/setup/install rdb201[12] - https://phabricator.wikimedia.org/T393121#10938961 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin1003 for host rdb2012.codfw.wmnet with OS bookworm completed: - rdb2012 (**PASS**) - Remov...
[15:55:08] serviceops, DC-Ops, ops-codfw, SRE: Q4:rack/setup/install rdb201[12] - https://phabricator.wikimedia.org/T393121#10938964 (Jhancock.wm) Open→Resolved a:akosiaris→Jhancock.wm
[15:55:24] serviceops, DC-Ops, ops-codfw, SRE: Q4:rack/setup/install rdb201[12] - https://phabricator.wikimedia.org/T393121#10938968 (Jhancock.wm) @akosiaris all done!
[16:28:53] serviceops, Scap: scap lock seems broken - https://phabricator.wikimedia.org/T397644 (Clement_Goubert) NEW
[16:34:21] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Update wikikube codfw to kubernetes 1.31 - https://phabricator.wikimedia.org/T397148#10939201 (Jelto) machinetranslation in codfw was deployed successfully. ` jelto@cumin1003:~$ sudo confctl --object-type discovery select 'dnsdisc=k8s-in...
[16:37:42] serviceops, Scap: scap lock seems broken - https://phabricator.wikimedia.org/T397644#10939226 (Clement_Goubert) ` sudo lsof /var/lock/scap-global-lock COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME scap 3474473 spiderpig 4rR REG 0,25 0 2655 /run/lock/scap-global-lock `
[16:47:11] serviceops, Scap: scap lock seems broken - https://phabricator.wikimedia.org/T397644#10939288 (dancy) The lock is being held by `spiderpig-jobrunner`. I'll make some changes to improve this. In the meantime you could disable puppet and run `sudo systemctl stop spiderpig-jobrunner` to release the lock s...
[16:49:25] serviceops, Scap: scap lock seems broken - https://phabricator.wikimedia.org/T397644#10939290 (Clement_Goubert) Should I document that in https://wikitech.wikimedia.org/wiki/Scap#scap_lock ?
[16:51:33] serviceops, Scap: scap lock seems broken - https://phabricator.wikimedia.org/T397644#10939305 (dancy) >>! In T397644#10939290, @Clement_Goubert wrote: > Should I document that in https://wikitech.wikimedia.org/wiki/Scap#scap_lock ? My goal is to have a fix done by today to make it work the way you expec...
[16:52:02] serviceops, Scap: scap lock seems broken - https://phabricator.wikimedia.org/T397644#10939306 (Clement_Goubert) >>! In T397644#10939305, @dancy wrote: >>>! In T397644#10939290, @Clement_Goubert wrote: >> Should I document that in https://wikitech.wikimedia.org/wiki/Scap#scap_lock ? > > My goal is to hav...
[17:18:35] serviceops, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q4:rack/setup/install build2003.codfw.wmnet - https://phabricator.wikimedia.org/T393015#10939408 (Jhancock.wm)
[17:20:45] serviceops, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q4:rack/setup/install build2003.codfw.wmnet - https://phabricator.wikimedia.org/T393015#10939425 (Jhancock.wm) This server isn't getting a clean provisioning run Traceback (most recent call last): File "/usr/lib/python3/dist-packages/sp...
[17:24:54] serviceops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 2 others: Q4:rack/setup/install build2003.codfw.wmnet - https://phabricator.wikimedia.org/T393015#10939450 (akosiaris) >>! In T393015#10939425, @Jhancock.wm wrote: > This server isn't getting a clean provisioning run > > Traceback...
[17:26:17] serviceops, Scap, Patch-For-Review: scap lock seems broken - https://phabricator.wikimedia.org/T397644#10939457 (dancy) Open→Resolved a:dancy A fix has been deployed via scap 4.182.0.
[17:33:17] serviceops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 2 others: Q4:rack/setup/install build2003.codfw.wmnet - https://phabricator.wikimedia.org/T393015#10939527 (Jhancock.wm) Open→Resolved a:Jhancock.wm @akosiaris I attempted BIOS and UEFI on this one but it has the same...
[17:38:04] serviceops, Infrastructure-Foundations: Incorporate new arm64 host in our tooling - https://phabricator.wikimedia.org/T397653 (akosiaris) NEW
[17:40:13] serviceops, Infrastructure-Foundations, SRE, ARM support: Adoption of aarch64 (aka arm64) in WMF production? (SRE Summit 2022 Session) - https://phabricator.wikimedia.org/T320811#10939570 (akosiaris) Our first arm64 server just got racked. We 'll need to figure out how to incorporate it in our to...
[17:48:04] serviceops, Infrastructure-Foundations, Patch-For-Review: Upgrade httpd images to bullseye or bookworm - https://phabricator.wikimedia.org/T378128#10939601 (Scott_French) The webserver-bookworm image flavour is now live in mw-debug/next, passing httpbb checks and manual kicking-of-tires by me. No err...
[17:53:24] serviceops, MediaWiki-Engineering, User-notice: Rename pages and images to reflect migration to PHP 8.1 (Unicode 14) title-casing behavior - https://phabricator.wikimedia.org/T396903#10939610 (Scott_French)
[19:16:39] serviceops, DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install aux-k8s-worker100[6-9].eqiad.wmnet - https://phabricator.wikimedia.org/T393053#10939779 (Jclark-ctr)
[19:16:44] serviceops, Patch-For-Review: Migrate the etcd main cluster to cfssl-based PKI - https://phabricator.wikimedia.org/T352245#10939780 (Scott_French)
[19:16:51] serviceops, DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install aux-k8s-worker100[6-9].eqiad.wmnet - https://phabricator.wikimedia.org/T393053#10939782 (Jclark-ctr) Open→Resolved
[20:25:54] serviceops, Infrastructure-Foundations, Patch-For-Review: Upgrade httpd images to bullseye or bookworm - https://phabricator.wikimedia.org/T378128#10939939 (Scott_French)
[20:53:10] serviceops, Datacenter-Switchover: Turn down unused swift-r[ow] discovery services - https://phabricator.wikimedia.org/T376237#10940036 (Scott_French)
[22:13:14] serviceops, SRE-swift-storage, Datacenter-Switchover: Turn down unused swift-r[ow] discovery services - https://phabricator.wikimedia.org/T376237#10940279 (MatthewVernon) FWIW, after today's incident we ended up with both `swift-rw` resources depooled: ` mvernon@cumin2002:~$ confctl --object-type dis...
[22:20:19] serviceops, SRE-swift-storage, Datacenter-Switchover: Turn down unused swift-r[ow] discovery services - https://phabricator.wikimedia.org/T376237#10940304 (Scott_French) @MatthewVernon - Ah, that's great!
Yes, let's keep those pointed to failoid, then. I'll post a patch shortly to do the "manual equi...
[23:22:08] serviceops, SRE-swift-storage, Datacenter-Switchover, Patch-For-Review: Turn down unused swift-r[ow] discovery services - https://phabricator.wikimedia.org/T376237#10940459 (Scott_French)