[04:39:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[04:44:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[07:41:37] Traffic, serviceops, WE4.2 Bot detection (WE4.2 hCaptcha account creation trial): Investigate options for per-wiki, percentage-based rollout of hCaptcha - https://phabricator.wikimedia.org/T404184#11223287 (jijiki)
[08:32:30] Traffic, MediaWiki-Platform-Team (Radar), Patch-For-Review, User-notice: [Rollout Phase 3] Enable unified mobile routing on remaining wikis - https://phabricator.wikimedia.org/T403510#11223546 (Krinkle) >>! In T403510#11207686, @Krinkle wrote: > […] I went through the top 10 Wikinews wikis […] >...
[08:32:54] Traffic, MediaWiki-Platform-Team (Radar), Patch-For-Review, User-notice: [Rollout Phase 3] Enable unified mobile routing on remaining wikis - https://phabricator.wikimedia.org/T403510#11223548 (Krinkle)
[09:41:44] Traffic: Move host normalization to haproxy - https://phabricator.wikimedia.org/T392880#11223895 (Fabfur) In progress→Resolved
[14:01:58] Traffic: purged event lag keeps piling up in codfw topics after switchover - https://phabricator.wikimedia.org/T389707#11224729 (ssingh) @Fabfur: Is there any follow-up to do here or can we close this given the issue at hand was resolved? Note that in the recent switchover, we didn't observe this issue.
[14:02:55] Traffic, SRE: "Backend fetch failed" on edit save - https://phabricator.wikimedia.org/T382790#11224731 (ssingh) Open→Resolved a:ssingh This has been open for a while and there hasn't been any follow up from either side. @MGChecker: Please re-open if this issue still persists for you. Thanks!
[14:04:51] Traffic: purged event lag keeps piling up in codfw topics after switchover - https://phabricator.wikimedia.org/T389707#11224746 (Fabfur) Open→Resolved Yep, definitely, thanks for reminding
[14:07:07] Traffic, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154#11224764 (ssingh) Because of the regressions observed in OpenSSL 3.x, we never finished the upgrade for the `cp` hosts to bookworm. This task is stalled while we go through the possible upgrade paths...
[14:09:20] Traffic, SRE, SRE-swift-storage: Rise in ms-fe2* TCP retransmits since 11:40 UTC today - https://phabricator.wikimedia.org/T367056#11224783 (ssingh) There has been no follow-up on this after Jun 2024. @MatthewVernon: should we keep this open?
[14:11:58] Traffic, Infrastructure-Foundations, SRE: Slowly ramping up traffic to the Brazil data center (magru) and related geo-maps - https://phabricator.wikimedia.org/T359054#11224790 (ssingh) Open→Resolved a:ssingh We have made progress in T301605, and specific to this task, we ramped up tra...
[14:16:27] Traffic: Investigate setting init_on_alloc=0 on cache hosts - https://phabricator.wikimedia.org/T401025#11224809 (ssingh) p:Triage→Low This is worth investigating/researching but I am marking this as "Low" and we can revisit it for Q3 2025-2026.
[14:22:03] Traffic: Bump prometheus-rdkafka-exporter go version - https://phabricator.wikimedia.org/T400986#11224840 (Fabfur) This has been [[ https://gitlab.wikimedia.org/repos/sre/prometheus-rdkafka-exporter | migrated to Gitlab ]] in the meantime
[14:26:15] Traffic: Bump prometheus-rdkafka-exporter go version - https://phabricator.wikimedia.org/T400986#11224869 (Fabfur)
[14:36:55] Traffic: Consider using Intel Xeon CPUs with QAT - https://phabricator.wikimedia.org/T400324#11224895 (ssingh) The new `cp` hosts in `codfw`, `Intel(R) Xeon(R) 6730P`, support QAT, so we can experiment there and see if that helps.
[14:38:01] Traffic, Security-Team, WMF-General-or-Unknown, ContentSecurityPolicy, Patch-For-Review: Add restrictive CSP to upload.wikimedia.org - https://phabricator.wikimedia.org/T117618#11224896 (ssingh) Commenting from Traffic's side: this is in some ways, a trivial patch for us because we are simply...
[14:39:16] Traffic, SRE, SRE-swift-storage: Rise in ms-fe2* TCP retransmits since 11:40 UTC today - https://phabricator.wikimedia.org/T367056#11224904 (MatthewVernon) I guess not, if it has recurred, it's not been enough to page...
[14:40:11] Traffic, SRE, SRE-swift-storage: Rise in ms-fe2* TCP retransmits since 11:40 UTC today - https://phabricator.wikimedia.org/T367056#11224908 (ssingh) Open→Resolved a:ssingh OK thank you. I am marking this as resolved for now. We can re-open as required.
[14:42:25] Traffic: ncredir sometimes receives large traffic spikes leading to unavailability - https://phabricator.wikimedia.org/T399947#11224914 (ssingh) Open→Resolved a:ssingh Paging has been disabled on this alert since July 2025. But even other than that, looking at the 90-day view, this has not ha...
[14:54:19] Traffic: 403 errors with user-agent - https://phabricator.wikimedia.org/T404219#11224989 (ssingh) Awaiting a response from @Lupascriptix before we can triage it again.
[14:55:43] Traffic, DNS, SRE, Traffic-Icebox, and 2 others: Many misc wikis lack mobile domains - https://phabricator.wikimedia.org/T152882#11224991 (JTweed-WMF)
[14:56:15] Traffic: ncmonitor should not submit new CRs if there are still some yet to be reviewed - https://phabricator.wikimedia.org/T368694#11224995 (ssingh) My two cents: I think like you mention above, is this really required? I think this may be more work and doesn't give us a tangible benefit? Put differently I...
[14:58:10] Traffic: varnish wikimedia_trust ACL isn't used anymore - https://phabricator.wikimedia.org/T399688#11225003 (ssingh) a:BCornwall
[14:58:42] Traffic, Data-Engineering, Patch-For-Review: improved x-analytics data on Edge Uniques status - https://phabricator.wikimedia.org/T405783#11225004 (Ottomata) > wmfuniq_freq drive by nit: if you are storing percentages, why not just store the actual percentage rather than the percentage / 10? E.g. v...
[15:04:35] Traffic, serviceops, SRE: Reconcile MediaWiki POST timeout and Varnish/ATS timeouts - https://phabricator.wikimedia.org/T294800#11225029 (ssingh) This is being moved on the Traffic workboard to "Radar/Not for service" as I don't think there is anything on our end to do here. Please let me know if you...
[15:04:54] Traffic, serviceops, SRE: Reconcile MediaWiki POST timeout and Varnish/ATS timeouts - https://phabricator.wikimedia.org/T294800#11225030 (ssingh) And to be clear, by that I mean that this change is better suited for MW and not the CDN.
[15:08:17] Traffic, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#11225041 (Jhancock.wm) Update on the cp2056. Finally got Dell to agree to send a replacement card after a week of back and forth and escalations. So that shoul...
[15:09:51] FIRING: FermMSS: Unexpected MSS value on 10.2.2.38:8082 @ druid1008 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=druid_public - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[15:10:04] ^ decommissioning
[15:24:59] Traffic, Page Content Service, serviceops: Block traffic to RESTBase /page/talk endpoint and sunset it - https://phabricator.wikimedia.org/T401895#11225095 (MSantos)
[15:29:13] Traffic, SRE, Patch-For-Review, Wikimedia-Performance-recommendation: ATS Read While Writer feature is wrongly configured - https://phabricator.wikimedia.org/T315911#11225132 (ssingh) What is the update on this, given that it has been a while and I am a bit confused reading the text and trying to...
[15:29:14] Traffic, DNS: Setting up Wikimedia Trust and Safety Help Center with Zendesk product: Seeking Guidance on host mapping - https://phabricator.wikimedia.org/T400952#11225133 (JAbrams) Hi @ssingh, I hope you are doing well :) Apologies for the delay! For consistency, we’ll use the URL **trustandsafety...
[15:30:55] Traffic: ncmonitor should not submit new CRs if there are still some yet to be reviewed - https://phabricator.wikimedia.org/T368694#11225139 (BCornwall) The goal is to have ncmonitor run much more frequently than it currently is (e.g. once a day rather than once a week) so that we can accelerate the turnarou...
[15:32:49] Traffic: ncmonitor should not submit new CRs if there are still some yet to be reviewed - https://phabricator.wikimedia.org/T368694#11225143 (ssingh) >>! In T368694#11225139, @BCornwall wrote: > The goal is to have ncmonitor run much more frequently than it currently is (e.g. once a day rather than once a we...
[15:36:45] Traffic, SRE: Wikidough: Support EDNS(0) Padding: RFC 7830 and RFC 8467 - https://phabricator.wikimedia.org/T274431#11225153 (ssingh) Open→Resolved a:ssingh We have had this for a while and the responses are padded. Marking as resolved.
[15:38:21] Traffic, SRE: Performance implications of buffer sizes in Apache Traffic Server intercept plugins - https://phabricator.wikimedia.org/T287847#11225162 (ssingh) Open→Resolved a:ssingh This was merged upstream in 9.2.x so we have inherited this change. Since we have not revisited this since...
[15:42:32] Traffic, Page Content Service, serviceops: Block traffic to RESTBase /page/talk endpoint and sunset it - https://phabricator.wikimedia.org/T401895#11225201 (MSantos) p:Triage→Medium @akosiaris per T392491#11167986 and message sent to Wikitech-l earlier today, this is ready to go.
[15:44:10] Traffic, conftool, Sustainability (Incident Followup): Make it easier to create a new requestctl object - https://phabricator.wikimedia.org/T310009#11225214 (ssingh) I am curious: should we keep this open or should this be resolved now given that we have requestctl.wikimedia.org and that takes care o...
[15:52:32] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw:frack:rack/install/configuration new switches in rack F5 - https://phabricator.wikimedia.org/T405618#11225279 (Papaul)
[16:07:01] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw:frack:rack/install/configuration new switches in rack F5 - https://phabricator.wikimedia.org/T405618#11225393 (Papaul) I had a meeting today with @Jgreen about the new switch configuration. what we will be doing is to move the...
[16:12:27] Traffic, MediaWiki-Platform-Team (Radar): Share CDN cache between m-dot and unified mobile routing - https://phabricator.wikimedia.org/T405931 (Krinkle) NEW
[16:20:03] Traffic: Fetching mediawiki GPG keys fail with error "No data" - https://phabricator.wikimedia.org/T405165#11225510 (ssingh) I don't think this has anything to do with the UA. Is the command correct? Maybe try downloading via `curl`/`wget` and then piping to `gpg --import '`. `curl https://www.mediawiki.org/...
[16:22:01] Traffic, MediaWiki-Platform-Team (Radar): Share CDN cache between m-dot and unified mobile routing - https://phabricator.wikimedia.org/T405931#11225553 (Krinkle) @BBlack @ssingh @bcampbell I think a reasonable may be: //Why not just redirect m-dot to standard and forget about the m-dot cache?// The [o...
[16:25:54] Traffic, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#11225573 (elukey) The firmware cookbook doesn't work yet since spicerack is configured to look for a `HttpPushUri` field in the Redfish's UpdateService endpoi...
[17:57:01] Traffic: Fetching mediawiki GPG keys fail with error "No data" - https://phabricator.wikimedia.org/T405165#11226025 (Ciencia_Al_Poder) >>! In T405165#11225510, @ssingh wrote: > I don't think this has anything to do with the UA. Is the command correct? Maybe try downloading via `curl`/`wget` and then pipin...
[18:20:13] Traffic: Fetching mediawiki GPG keys fail with error "No data" - https://phabricator.wikimedia.org/T405165#11226086 (ssingh) >>! In T405165#11226025, @Ciencia_Al_Poder wrote: >>>! In T405165#11225510, @ssingh wrote: >> I don't think this has anything to do with the UA. Is the command correct? Maybe try do...
[18:45:51] Traffic: Fetching mediawiki GPG keys fail with error "No data" due to User-Agent requirement - https://phabricator.wikimedia.org/T405165#11226184 (Aklapper)
[19:13:32] Traffic: 403 errors with user-agent - https://phabricator.wikimedia.org/T404219#11226332 (Lupascriptix) Hey there, sorry about the delay, I was away from the office for a bit. The folks at T402959 don't think this issue is due to the SPARQL queries, but I just checked and I'm still having the same issues as...
[19:16:45] Traffic: 403 errors with user-agent - https://phabricator.wikimedia.org/T404219#11226366 (ssingh) >>! In T404219#11226332, @Lupascriptix wrote: > Hey there, sorry about the delay, I was away from the office for a bit. The folks at T402959 don't think this issue is due to the SPARQL queries, but I just checke...
[19:25:22] Traffic, Patch-For-Review: Fetching mediawiki GPG keys fail with error "No data" due to User-Agent requirement - https://phabricator.wikimedia.org/T405165#11226414 (Dzahn) @Ciencia_Al_Poder When you actually download the MediaWiki tarballs, as opposed to the GPG keys, would I be right in assuming they co...
[19:42:01] Traffic: 403 errors with user-agent - https://phabricator.wikimedia.org/T404219#11226595 (Lupascriptix) I admittedly don't know enough of what OpenRefine is doing under the hood to get the query path, but I have the json for the reconciliation call if that helps. I can also share the python script that's err...
[20:12:56] Traffic: 403 errors with user-agent - https://phabricator.wikimedia.org/T404219#11226770 (ssingh) >>! In T404219#11226595, @Lupascriptix wrote: > I admittedly don't know enough of what OpenRefine is doing under the hood to get the query path, but I have the json for the reconciliation call if that helps. I c...
[20:16:57] Traffic: 403 errors with user-agent - https://phabricator.wikimedia.org/T404219#11226776 (Lupascriptix) Here's the OpenRefine reconciliation json: { "op": "core/recon", "engineConfig": { "facets": [ { "type": "list", "name": "Author: judgment", "expressi...
[20:23:10] Traffic: 403 errors with user-agent - https://phabricator.wikimedia.org/T404219#11226793 (Lupascriptix) And the python script using qwikidata: def demographics(row: pd.Series): q = row['qno'] user_agent = '[My university] ([url; my email address])' url = f"https://www.wikidata.org/wi...
[20:40:41] Traffic, Patch-For-Review: varnish wikimedia_trust ACL isn't used anymore - https://phabricator.wikimedia.org/T399688#11226860 (BCornwall) Were the ulsfo-specific entries put in place for testing initial implementation? Not sure why they were there.
[20:41:04] netops, fundraising-tech-ops, Infrastructure-Foundations: Move pfw1b-codfw to rack F5 - https://phabricator.wikimedia.org/T401297#11226861 (Papaul) Open→Resolved We decided that the next time we have an open window again for this @cmooney will himself drive the test. For this test, the m...
[20:43:24] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw:frack:rack/install/configuration new switches in rack F5 - https://phabricator.wikimedia.org/T405618#11226867 (Papaul)
[20:44:02] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw:frack:rack/install/configuration new switches in rack F5 - https://phabricator.wikimedia.org/T405618#11226869 (Papaul) p:Triage→Medium
[21:01:04] Traffic, Patch-For-Review: Fetching mediawiki GPG keys fail with error "No data" due to User-Agent requirement - https://phabricator.wikimedia.org/T405165#11226922 (Ciencia_Al_Poder) Yes, tarballs and signatures are downloaded from releases.wikimedia.org I guess I should be trusting individual fingerpri...
[21:16:22] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw:frack:rack/install/configuration new switches in rack F5 - https://phabricator.wikimedia.org/T405618#11226958 (Papaul)
[21:25:06] Traffic, DNS, Patch-For-Review: Setting up Wikimedia Trust and Safety Help Center with Zendesk product: Seeking Guidance on host mapping - https://phabricator.wikimedia.org/T400952#11226971 (BCornwall) Stalled→In progress
[21:30:56] Traffic, Patch-For-Review: varnish wikimedia_trust ACL isn't used anymore - https://phabricator.wikimedia.org/T399688#11226998 (BCornwall) Also, do we want to re-add some of the IPs in a different ACL? PCC is showing a number of ranges that would be removed, for instance: ` - wikimedia_trust => ['208...
[21:43:22] sukhe: brett: I'd like to start implementing either adding m-to-standard redirects or consolidating caches this week, but I suppose we should decide once more which way we want to do it. With the latest data, I lean toward the simpler approach of redirecting, but two weeks ago bblack and I still thought consolidating first was worthwhile. Happy to go either way! I summarised it at https://phabricator.wikimedia.org/T405931#11225551,
[21:43:22] let me know :)
[21:45:46] Thanks! I'd say go simplest
[21:46:14] Krinkle: <2% traffic to m-dot is surprising
[21:47:30] Yeah, it is. I guess less so in connection with Google no longer having a separate mobile_url in its index so any traffic from there was hitting our redirect and thus naturally flows to standard domain as it already did post-Jan 2024.
[21:47:47] But yeah, I figured other referral traffic and search engines might've had more direct references cached.
[21:49:40] Traffic, Hiddenparma, Patch-For-Review: Introduce known-client identity objects and integrate with requestctl - https://phabricator.wikimedia.org/T403220#11227088 (Scott_French)
[21:51:00] The biggest surprise to me, entirely unrelated, is that the 301 wikis last week amounted to only 1/10th of the traffic of the first 10 wikis. Daily mobile user page views: Wikibooks (0.6M), Wikiquote (0.4M), Wikivoyage/Wikiversity/Wikinews (~0.2M). Compared to Italian Wikipedia alone was 10M views. So last week brought us from 6.3 to 6.9% direct mobile pageviews (vs mobile redirects).
[21:51:12] but also from 10 wikis to 311 wikis
[21:51:26] ref https://phabricator.wikimedia.org/T405429#11221830
[22:19:11] fascinating
[22:20:04] Traffic, MediaWiki-Platform-Team (Radar), Patch-For-Review, User-notice: [Rollout Phase 3] Enable unified mobile routing on remaining wikis - https://phabricator.wikimedia.org/T403510#11227275 (Krinkle)
[23:08:50] Traffic, MediaWiki-Platform-Team (Radar), Patch-For-Review, User-notice: [Rollout Phase 3] Enable unified mobile routing on remaining wikis - https://phabricator.wikimedia.org/T403510#11227423 (Krinkle)
[23:14:53] Traffic, MediaWiki-Platform-Team (Radar), Patch-For-Review, User-notice: [Rollout Phase 3] Enable unified mobile routing on remaining wikis - https://phabricator.wikimedia.org/T403510#11227446 (Krinkle)
[23:16:57] Traffic, MediaWiki-Platform-Team (Radar), Patch-For-Review, User-notice: [Rollout Phase 3] Enable unified mobile routing on remaining wikis - https://phabricator.wikimedia.org/T403510#11227447 (Krinkle)