[00:05:40] FIRING: SystemdUnitFailed: ncmonitor.service on ncmonitor1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:05:40] FIRING: SystemdUnitFailed: ncmonitor.service on ncmonitor1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:12:36] Traffic, Incident Tooling, SRE: ncredir redirects for status.wiki* --> status.wikimedia.org - https://phabricator.wikimedia.org/T318804#11334175 (Pppery)
[04:14:43] FIRING: [4x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[04:19:43] FIRING: [17x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[04:24:43] FIRING: [17x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[04:29:43] FIRING: [17x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[04:34:43] RESOLVED: [17x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[04:59:56] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11334176 (Papaul)
[05:04:17] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11334177 (Papaul) @cmooney i update all the IP's to match the other POP sites. I will be re-running the configuration and validation sometimes this week in m...
[06:15:06] Traffic, Hiddenparma, SRE: Collect known client fingerprints for common libraries - https://phabricator.wikimedia.org/T409024 (Joe) NEW
[06:15:21] Traffic, Hiddenparma, SRE: Collect known client fingerprints for common libraries - https://phabricator.wikimedia.org/T409024#11334201 (Joe) p:Triage→Medium
[06:41:51] Traffic, Hiddenparma, SRE: Collect known client fingerprints for common libraries and browsers - https://phabricator.wikimedia.org/T409024#11334225 (Joe)
[08:05:40] FIRING: SystemdUnitFailed: ncmonitor.service on ncmonitor1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:23:51] FIRING: FermMSS: Unexpected MSS value on 10.2.1.27:80 @ ms-fe2014 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=codfw&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[08:28:51] RESOLVED: FermMSS: Unexpected MSS value on 10.2.1.27:80 @ ms-fe2014 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=codfw&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[11:54:11] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11335237 (cmooney) Thanks @papaul. One to discuss with @ayounsi when he is back are the IPv6 gateway addresses on the vlans. ` on asw1-22 irb.411 public1-ul...
[12:05:40] FIRING: SystemdUnitFailed: ncmonitor.service on ncmonitor1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:18:50] Traffic, Hiddenparma, Patch-For-Review: Distinguish request classes based on user-agent declaration - https://phabricator.wikimedia.org/T408060#11335284 (hnowlan)
[12:21:00] FIRING: [6x] PurgedHighEventLag: High event process lag with purged on cp5018:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[12:26:00] FIRING: [17x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[12:31:00] FIRING: [17x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[12:36:00] RESOLVED: [26x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[13:28:12] netops, Infrastructure-Foundations, SRE: Eqiad C/D refresh: move asw2-d-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T409067 (cmooney) NEW p:Triage→Medium
[13:52:08] netops, Infrastructure-Foundations, SRE: Eqiad C/D refresh: move asw2-d-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T409067#11335585 (cmooney)
[13:53:58] netops, Infrastructure-Foundations, SRE: Eqiad C/D refresh: move asw2-d-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T409067#11335590 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=a04c020e-81be-4ee8-bf2f-5bcc8830a8da) set by cmooney@cumin1003 for 2:00:00...
[16:05:40] FIRING: SystemdUnitFailed: ncmonitor.service on ncmonitor1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:53:45] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Eqiad C/D refresh: move asw2-d-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T409067#11336622 (cmooney) Open→Resolved Uplinks moved, the actual gateway move from CR to switches we will wait until Nokia...
[18:45:25] RESOLVED: SystemdUnitFailed: ncmonitor.service on ncmonitor1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:29:47] Traffic, Hiddenparma, SRE, Patch-For-Review: Integrate code from the private repository into the CDN - https://phabricator.wikimedia.org/T404826#11337351 (Milimetric) > I don't think it should be discussed eslewhere because it's not really a valid concern here: > > * We only call the browser...
[19:33:26] Traffic, DNS: Request to create the donate.wikipedia25.org domain + 301 redirect to a donate.wiki page - https://phabricator.wikimedia.org/T408168#11337391 (BCornwall) This is pushed. ` [~]$ curl -I https://donate.wikipedia25.org [...] location: https://donate.wikimedia.org/w/index.php?appeal=WP25&...
[21:06:12] Acme-chief: acme-chief doesn't automatically re-create certificates on SAN change - https://phabricator.wikimedia.org/T409114 (BCornwall) NEW
[21:11:32] Acme-chief: acme-chief doesn't automatically re-create certificates on SAN change - https://phabricator.wikimedia.org/T409114#11337805 (Vgutierrez) It does but acme-chief also respects the staging time so it's probably under /new instead of /live or still blocked cause it's waiting the staging time till a pr...
[21:17:04] Acme-chief: acme-chief doesn't automatically re-create certificates on SAN change - https://phabricator.wikimedia.org/T409114#11337814 (BCornwall) Open→Invalid Ah, I see that it's stuck in the staging time indeed. Thanks for the clarification.
[23:19:15] Traffic: ncmonitor: Migrate from deprecated API to new API - https://phabricator.wikimedia.org/T408857#11338135 (BCornwall)
[23:28:34] Traffic, Patch-For-Review: ncmonitor: Migrate from deprecated API to new API - https://phabricator.wikimedia.org/T408857#11338146 (BCornwall) Open→In progress