[03:29:51] FIRING: FermMSS: Unexpected MSS value on 10.2.1.27:80 @ ms-fe2015 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=codfw&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[03:34:51] RESOLVED: FermMSS: Unexpected MSS value on 10.2.1.27:80 @ ms-fe2015 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=codfw&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[07:56:42] Traffic, Data-Engineering: Request for a new request dataset for caching research - https://phabricator.wikimedia.org/T401331#11200325 (yazhuoz) Hi there! Just wanted to follow up and check if there have been any updates on this request. I’d be happy to provide any additional context or clarifications if...
[08:41:00] Traffic, HaproxyKafka, Data-Platform-SRE (2025.09.05 - 2025.09.26), Essential-Work, Patch-For-Review: Replicate current low-message alerting from VarnishKafka - https://phabricator.wikimedia.org/T391810#11200557 (Fabfur) Hi @BTullis sorry for the late answer, I think this fired correctly...
[09:01:21] Traffic, HaproxyKafka, Data-Platform-SRE (2025.09.05 - 2025.09.26), Essential-Work, Patch-For-Review: Replicate current low-message alerting from VarnishKafka - https://phabricator.wikimedia.org/T391810#11200707 (BTullis) >>! In T391810#11200557, @Fabfur wrote: >I think this fired correct...
[09:47:27] hello! Would it be possible for us to do a careful rollout of this change this week maybe? we're kinda blocked on it https://gerrit.wikimedia.org/r/c/operations/puppet/+/1189132
[09:51:34] hnowlan: not much experience in this but we can check it for sure!
[09:52:39] it should be pretty quickly visible if it's broken, thankfully
[12:48:12] Traffic, Commons, MediaWiki-Uploading, SRE: HTTP 503 error when uploading images on Wikimedia Commons - https://phabricator.wikimedia.org/T383274#11201286 (Aklapper) @Underbar_dk: two months later, would you know if this issue is continuing?
[12:53:06] Traffic, Commons, MediaWiki-Uploading, SRE: HTTP 503 error when uploading images on Wikimedia Commons - https://phabricator.wikimedia.org/T383274#11201294 (Underbar_dk) I still have not had the opportunity to upload images to Commons, though I sometimes have trouble making edits to large articles...
[13:34:35] Traffic, Data-Engineering: Request for a new request dataset for caching research - https://phabricator.wikimedia.org/T401331#11201482 (ssingh) >>! In T401331#11200325, @yazhuoz wrote: > Hi there! Just wanted to follow up and check if there have been any updates on this request. I’d be happy to provide a...
[13:45:18] Traffic, Fundraising-Backlog, Fundraising-Tech-Roadmap, MediaWiki-extensions-CentralNotice, SRE: Set expiry time for GeoIP cookies - https://phabricator.wikimedia.org/T122097#11201498 (ssingh) >>! In T122097#11185306, @AKanji-WMF wrote: > @XenoRyet and I discussed getting this into our next S...
[14:01:14] Traffic, MediaWiki-Platform-Team (Radar), SecTeam-Processed, Security: SUL Integration for eventyay (Wikimania virtual event platform) - https://phabricator.wikimedia.org/T378157#11201584 (ssingh) Like @Tgr mentioned, `jwt.exceptions.ExpiredSignatureError: Signature has expired` and `Please set a...
[14:03:56] Traffic, Documentation: Document x-cache-status header on Wikitech - https://phabricator.wikimedia.org/T404654#11201591 (ssingh) p:Triage→Low Yes thanks, that's a good idea and worth documenting. We will triage it soon and work on it.
[14:46:01] sukhe: hey :) when I tried out the multi-dc.lua change the other day, I depooled the cp node with `sudo depool` but I was still getting requests that weren't mine, did I miss a step somewhere?
[14:51:50] claime: depool without any arguments would depool everything (cdn and ats-be on codfw/drmrs non-single-BE nodes) so it is correct
[14:52:20] ok so that's very weird. I should have copied the logs
[14:52:30] those requests can very well be the healthchecks; where did you see the requests though?
[14:52:33] claime: for how long after?
[14:52:46] cdanis: A while, I was on it for more than 30m
[14:52:59] sukhe: I had set up lua debug logging, saw them there
[14:53:03] yeah that's more than sufficient
[14:53:06] (the time)
[14:53:25] they weren't PURGEs right? 😅
[14:53:35] Don't think so but I'm not positive
[14:54:08] If that's the case it kinda sucks because it clobbers the logs and journalctl skips lines
[14:54:09] claime: which host was it?
[14:54:19] cp2041
[14:54:37] ah I see it in SAL, thanks
[14:54:50] I needed one that goes to codfw for local since I'm testing multi-dc
[14:55:13] looking at that interval
[14:56:32] https://grafana.wikimedia.org/goto/qqJiyujNg?orgId=1
[14:56:40] the depool looks good
[14:56:53] netops, Infrastructure-Foundations, sre-alert-triage: Alert in need of triage: SwitchCoreInterfaceDown (instance ssw1-f1-codfw:9804) - https://phabricator.wikimedia.org/T404946#11201792 (cmooney) Open→Resolved a:cmooney All back up now.
[14:57:08] but yeah, depooled hosts still get PURGEs so it can be that, or some transient healthchecks
[14:57:25] Hmh ok
[14:57:26] we can check when it happens again, but depool is good by itself.
or even "depool cdn" more specifically
[14:57:52] Yeah cool, will do that, I don't think I'll need the lua debug either this time
[14:57:55] (if it works :P)
[15:01:51] netops, Traffic, DC-Ops, Infrastructure-Foundations, and 2 others: Move lvs1020 link from ssw1-f1-eqiad to ssw1-e1-eqiad - https://phabricator.wikimedia.org/T404959#11201828 (ssingh) Thanks for the discovery and writing this up @Cmooney! No concerns from Traffic since as you mentioned it is the b...
[15:17:52] netops, fundraising-tech-ops, Infrastructure-Foundations: Move pfw1b-codfw to rack F5 - https://phabricator.wikimedia.org/T401297#11201962 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=7b0c0458-499c-4287-8c6f-8f66dccdba91) set by pt1979@cumin2002 for 2:00:00 on 1 host(s) and their...
[15:19:11] netops, fundraising-tech-ops, Infrastructure-Foundations: Move pfw1b-codfw to rack F5 - https://phabricator.wikimedia.org/T401297#11201968 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=469fffa6-5667-4b97-b402-ebd2aefae808) set by pt1979@cumin2002 for 2:00:00 on 2 host(s) and their...
[16:06:21] netops, Traffic, DC-Ops, Infrastructure-Foundations, and 2 others: Move lvs1020 link from ssw1-f1-eqiad to ssw1-e1-eqiad - https://phabricator.wikimedia.org/T404959#11202340 (cmooney) @Jclark-ctr when you have some time can we have a look at this one? No particular rush thanks.
[16:34:57] Traffic, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#11202432 (elukey) >>! In T392851#11194034, @Jhancock.wm wrote: > @elukey I can wait! wasn't trying to rush you. lemme know next week and we'll take care of it...
[17:32:31] Traffic: 403 errors with user-agent - https://phabricator.wikimedia.org/T404219#11202670 (ssingh) >>! In T404219#11177388, @Lupascriptix wrote: > It seems T402959 is a bit quiet - haven't gotten any responses there.
I suspect that SPARQL is the issue with the OpenRefine reconciliation since the non-SPARQL on...
[17:54:27] Traffic, Documentation: Document x-cache-status header on Wikitech - https://phabricator.wikimedia.org/T404654#11202733 (BCornwall) Open→In progress a:BCornwall
[19:19:27] Traffic, Fundraising Tech - Chaos Crew, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, and 2 others: ESI test string is still shipped by CentralNotice - https://phabricator.wikimedia.org/T400472#11202967 (XenoRyet) Open→Resolved