[06:35:58] Traffic, SRE, Patch-For-Review, User-notice: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11188113 (Joe) Open→Resolved I will tentatively close this task for now.
[06:58:00] Traffic, Hiddenparma, SRE: Better mapping of requests coming from datacenters/clouds - https://phabricator.wikimedia.org/T400120#11188127 (SLyngshede-WMF) Open→Resolved a:SLyngshede-WMF
[07:41:39] Traffic, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#11188195 (elukey) For cp2050 I keep getting this: ` GET https://10.193.3.234/redfish/v1/TaskService/TaskMonitors/JID_580944559377 returned HTTP 400 Response...
[08:26:06] Traffic, serviceops, WE4.2 Bot detection (WE4.2 hCaptcha account creation trial): Investigate options for per-wiki, percentage-based rollout of hCaptcha - https://phabricator.wikimedia.org/T404184#11188346 (kostajh) @jijiki suggested that we consider using an approach similar to [[ https://codesearch...
[08:28:05] Traffic, serviceops, WE4.2 Bot detection (WE4.2 hCaptcha account creation trial): Investigate options for per-wiki, percentage-based rollout of hCaptcha - https://phabricator.wikimedia.org/T404184#11188350 (kostajh)
[08:50:51] Traffic, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#11188395 (elukey) cp2051 worked, cp2052 showed the issue, cp2053 worked.
[09:43:11] Traffic, Hiddenparma, SRE: Integrate code from the private repository into the CDN - https://phabricator.wikimedia.org/T404826 (Joe) NEW
[09:48:36] Hi, stupid question. What is our policy on typo DNS records? A colleague is asking whether it's okay to set up a redirect of list.wikimedia.org to lists.wikimedia.org (listS). I thought of setting up a CNAME but that feels wrong. Should it go to ncredir?
[09:49:01] and whether it's okay to do it at all
[09:56:05] Amir1: I think it should go to ncredir, about the policy I don't know
[10:49:32] Traffic, Hiddenparma, SRE: Integrate code from the private repository into the CDN - https://phabricator.wikimedia.org/T404826#11188866 (SLyngshede-WMF) Personally I don't love the private repository with Puppet code inside it, as it hides a lot of information. I get that this is the idea, but it mak...
[11:00:36] Traffic, Hiddenparma, SRE: Integrate code from the private repository into the CDN - https://phabricator.wikimedia.org/T404826#11188888 (MoritzMuehlenhoff) It's worth mentioning that starting next quarter we'll start work on moving the user data currently defined in data.yaml to a private repository,...
[11:06:37] Traffic, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#11188900 (elukey) Done up to cp2058, all good (excluding cp2056 as requested). Next steps: - Upgrade firmwares - Check why the cookbook didn't run on cp2052...
[11:18:32] heya - we were looking at the ats/lua rewrite oddness of the multi-dc change from yesterday and we think the issue is that we were using the remap rule to rewrite the header when we actually want to get the (already rewritten) host header from the client request
[11:18:51] I've filed https://gerrit.wikimedia.org/r/1189132 to outline what we want to do. Does the logic check out? and are the concerns valid?
[13:06:52] FIRING: FermMSS: Unexpected MSS value on 10.2.1.27:80 @ ms-fe2015 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=codfw&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[13:11:51] RESOLVED: FermMSS: Unexpected MSS value on 10.2.1.27:80 @ ms-fe2015 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=codfw&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[15:28:43] FIRING: [8x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[15:33:43] FIRING: [9x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[15:38:43] FIRING: [9x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[15:43:43] RESOLVED: [9x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[16:00:52] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11190457 (RobH)
[16:03:12] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11190471 (RobH)
[16:10:05] Traffic, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q4:rack/setup/install cp20[43-58] codfw - https://phabricator.wikimedia.org/T392851#11190545 (elukey)
[16:10:22] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11190551 (RobH)
[16:34:17] Amir1: There isn't much precedent for subdomain redirects, so there isn't a policy per se... but ncredir can indeed handle that!
[16:34:58] whether it *should* is in the eye of the beholder
[17:14:24] Traffic, MediaWiki-Platform-Team (Radar), Patch-For-Review, User-notice: [Rollout Phase 3] Enable unified mobile routing on remaining wikis - https://phabricator.wikimedia.org/T403510#11190795 (BCornwall)
[17:17:49] brett: does tomorrow work for wdqs lvs teardown?
[17:18:32] ryankemper: Sure!
[17:18:49] cool i'll throw something down on our calendars
[17:19:15] thanks!
[17:22:03] hnowlan: A lot of traffic members are out this week so not as many eyes to look at that.
[17:22:28] on the surface it makes sense
[17:37:53] brett: thanks!
[17:39:45] Amir1: Happy to do the legwork if you want to open a ticket - again, not really a process to get something "approved" or "vetted" for a subdomain like that but I think if you find it important then I find it important :)
[17:41:17] I personally think we should do a lot more typo domains; that's my sneaky way of doing one :D
[17:42:00] as a person who can't type one word without a mistake (I corrected several times in this same sentence) I'd be happy to have more
[17:47:08] Amir1: AFAICT ncredir largely handles the natural consequences of needing to do *something* with domains that we own largely for security reasons - e.g. typosquatting rather than convenience redirects. One could argue that introducing a bunch of convenience redirects for subdomains might be a bit of a shift in scope. Not a bad shift, I'd just be curious to know what others here would
[17:47:10] think
[18:17:08] brett: ack, np!
[19:57:14] Traffic, MediaWiki-Platform-Team (Radar), Patch-For-Review, User-notice: [Rollout Phase 3] Enable unified mobile routing on remaining wikis - https://phabricator.wikimedia.org/T403510#11191439 (Krinkle)
[20:10:51] FIRING: FermMSS: Unexpected MSS value on 10.2.1.27:80 @ ms-fe2015 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=codfw&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[20:15:51] RESOLVED: FermMSS: Unexpected MSS value on 10.2.1.27:80 @ ms-fe2015 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=codfw&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[20:40:07] Traffic, DNS, SRE: Set mediawiki.gr, wikipedia.pt, and wiktionary.org.uk NS records to WMF - https://phabricator.wikimedia.org/T401438#11191594 (BCornwall) Open→Resolved @Alchimista Thanks for getting back to me! Sorry for the delay, I was enjoying the one-two punch of being out and then...
[20:46:52] Domains: Transform wikipedia.pt into a portal - https://phabricator.wikimedia.org/T404913 (BCornwall) NEW
[20:51:02] Domains: Transfer wikipedia.pt domain to community - https://phabricator.wikimedia.org/T404913#11191640 (BCornwall)
[20:52:05] Domains: Transfer wikipedia.pt domain to community - https://phabricator.wikimedia.org/T404913#11191642 (BCornwall) @CRoslof Is this your wheelhouse?
[21:09:46] Traffic, DNS, SRE: Set mediawiki.gr, wikipedia.pt, and wiktionary.org.uk NS records to WMF - https://phabricator.wikimedia.org/T401438#11191677 (BCornwall)
[21:20:12] Traffic, Beta-Cluster-Infrastructure, Data-Persistence, SRE: ATS isn't caching documents in deployment-cache-upload07 - https://phabricator.wikimedia.org/T322575#11191738 (bd808) Open→Declined deployment-cache-text07 was replaced by deployment-cache-text08. `lang=shell-session,lines=1...
[22:38:07] Traffic, MediaWiki-Platform-Team (Radar), Patch-For-Review, User-notice: [Rollout Phase 3] Enable unified mobile routing on remaining wikis - https://phabricator.wikimedia.org/T403510#11192102 (Ladsgroup) If you feel like it, fawiki is an early adopter wiki in many areas. Feel free to add it. I c...
[22:46:21] Traffic, Beta-Cluster-Infrastructure, SRE: Rename deployment-cache-(text|upload)0x to deployment-cp0x - https://phabricator.wikimedia.org/T280393#11192124 (bd808) >>! In T280393#7163610, @taavi wrote: > One more issue: given cloud vps does not have per-role hiera keys, we need to rely on instance pre...
[22:49:34] Traffic, Beta-Cluster-Infrastructure, SRE, Patch-For-Review: Incorrect X-Cache-Status reported by deployment-prep caches - https://phabricator.wikimedia.org/T269825#11192132 (bd808)