[04:32:14] Traffic, SRE, affects-Kiwix-and-openZIM: Rate limiting/status code 429 for mwclient? - https://phabricator.wikimedia.org/T400018#11026576 (Audiodude) Thanks again @Scott_French for the extremely helpful analysis! I plan to submit a PR to mwclient to update the docs for that method to indicate whi...
[10:14:17] Traffic: New software: ProxyTester - https://phabricator.wikimedia.org/T400244#11027187 (Fabfur)
[10:24:04] netops, Infrastructure-Foundations, SRE: Inaccurate stats reported by cr2-codfw - https://phabricator.wikimedia.org/T400205#11027210 (cmooney) Ok so I drained cr2-codfw of traffic and tried issuing the commands. Commands as supplied by Juniper aren't 100% correct either which is reassuring when medd...
[10:28:55] FIRING: [4x] MaxConntrack: Max conntrack at 96.31% on ncredir3003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[10:33:55] RESOLVED: [5x] MaxConntrack: Max conntrack at 94.27% on ncredir3003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[12:12:51] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Link errors: ssw1-d1-codfw <-> ssw1-f1-codfw - https://phabricator.wikimedia.org/T400253 (cmooney) NEW p:Triage→Medium
[13:18:41] cdanis: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/1169149 is looking better now?
[13:22:28] vgutierrez: +1!
[13:23:18] cool... so what's the procedure to get this merged / deployed? I think it's my first MW CR.. after almost 8 years here
[13:24:03] did you verify that eventgate is ready to process the new schema?
[13:24:54] nope
[13:26:08] it might need to be restarted :) iirc it fetches schemas on startup
[13:26:49] Traffic, Liberica: Stop using lvs1013 as a liberica canary - https://phabricator.wikimedia.org/T400259 (Vgutierrez) NEW
[13:27:06] Traffic, Liberica: Stop using lvs1013 as a liberica canary - https://phabricator.wikimedia.org/T400259#11027777 (Vgutierrez) Open→In progress p:Triage→Medium
[13:34:08] Traffic: Adapt varnish test script(s) to perform HAProxy configuration validation - https://phabricator.wikimedia.org/T399941#11027818 (Fabfur) Open→Resolved
[13:35:50] netops, Infrastructure-Foundations, SRE: Homer: PyEz "ignore_warnings" does not work for port-block speed change warning - https://phabricator.wikimedia.org/T400261 (cmooney) NEW p:Triage→Medium
[14:26:34] Traffic, Hiddenparma, SRE: Browser behaviour detection at the edge - https://phabricator.wikimedia.org/T400270 (Joe) NEW
[15:02:17] Traffic, Hiddenparma, SRE: Browser behaviour detection at the edge - https://phabricator.wikimedia.org/T400270#11028212 (Joe)
[15:05:43] FIRING: HaproxyKafkaExporterDown: HaproxyKafka on cp3071 is down - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaExporterDown - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-instance=cp3071 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaExporterDown
[15:10:44] Traffic, Wikimania-Hackathon-2025: Request for IP Whitelisting for Novotel and Trademark/Tribe Venues – Wikimania 2025 - https://phabricator.wikimedia.org/T400276 (JorisDarlingtonQuarshie) NEW
[15:15:43] RESOLVED: HaproxyKafkaExporterDown: HaproxyKafka on cp3071 is down - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaExporterDown - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-instance=cp3071 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaExporterDown
[15:24:37] Traffic, HaproxyKafka, Patch-For-Review: Haproxykafka silently stops sending request data to kafka - https://phabricator.wikimedia.org/T400039#11028316 (Fabfur)
[15:27:46] Traffic, Data-Engineering, HaproxyKafka, Patch-For-Review: Haproxykafka silently stops sending request data to kafka - https://phabricator.wikimedia.org/T400039#11028321 (Fabfur)
[15:28:23] Traffic, Data-Engineering, HaproxyKafka, Patch-For-Review: Haproxykafka silently stops sending request data to kafka - https://phabricator.wikimedia.org/T400039#11028327 (Fabfur) Adding DE team too, considering that two hosts didn't sent messages for a long time and this could be impacting on the...
[16:04:20] Traffic, HaproxyKafka, Data-Engineering (Q4 2025 April 1st - June 30th), Patch-For-Review: Haproxykafka silently stops sending request data to kafka - https://phabricator.wikimedia.org/T400039#11028530 (Ahoelzl)
[19:16:51] FIRING: [2x] FermMSS: Unexpected MSS value on 10.2.2.30:9200 @ cirrussearch1074 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=elasticsearch - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[19:18:22] ^^ we're restarting that host now, it should be downtimed
[19:26:51] FIRING: [2x] FermMSS: Unexpected MSS value on 10.2.2.30:9200 @ cirrussearch1074 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=elasticsearch - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[19:31:51] RESOLVED: FermMSS: Unexpected MSS value on 10.2.2.30:9200 @ cirrussearch1074 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=elasticsearch - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[20:43:48] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Homer: PyEz "ignore_warnings" does not work for port-block speed change warning - https://phabricator.wikimedia.org/T400261#11029477 (cmooney)
[22:21:51] FIRING: FermMSS: Unexpected MSS value on 10.2.2.30:9200 @ cirrussearch1122 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=elasticsearch - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[22:26:51] FIRING: [2x] FermMSS: Unexpected MSS value on 10.2.2.30:9200 @ cirrussearch1081 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=elasticsearch - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[22:27:24] I acked the above alerts
[22:28:13] Going fwd, I'm not sure if y'all want to be alerted every time a host is failing health checks. Maybe the alert means more than that?
[22:31:51] RESOLVED: FermMSS: Unexpected MSS value on 10.2.2.30:9200 @ cirrussearch1122 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=eqiad&var-cluster=elasticsearch - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[23:48:26] may I suggest switching alerts to automatically created tickets? no noise on realtime chat channels but ALSO someone actually sees it and can react. we already do it for RAID checks for example.