[07:13:25] netops, Operations, Wikimedia-Incident: Investigate Juniper storm control - https://phabricator.wikimedia.org/T245192 (ayounsi) Thanks. Manual action is better here to prevent flapping. > If all good, change the alert target so it notifies the whole of SRE This is done too. And I added the alert to...
[07:48:13] Traffic, Operations, Patch-For-Review, Performance-Team (Radar), Sustainability (MediaWiki-MultiDC): Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 (Joe) >>! In T133821#6094125, @BBlack wrote: >>>! In T133821#6092865, @Joe wrote: >> - Define a schema for a "url purge mes...
[07:48:33] Traffic, Operations, serviceops, Patch-For-Review, and 2 others: Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 (Joe)
[08:21:05] Traffic, Operations, Parsoid, RESTBase, and 2 others: HTTP 400 Error when trying to save an edit on English Wikipedia: Error contacting the Parsoid/RESTBase server - https://phabricator.wikimedia.org/T250815 (ema) >>! In T250815#6094356, @Pchelolo wrote: > #traffic This seems like a borderline UB...
[09:48:31] Traffic, Operations, serviceops, Patch-For-Review, and 2 others: Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 (Joe) Looking at our existing event schemas, [[ https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/event-schemas/+/master/jsonschema/resource_change/1.0.0.ya...
[13:44:52] Traffic, Operations, serviceops, Patch-For-Review, and 2 others: Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 (BBlack) I like the root event timestamp info. We could potentially put in future rules to help by ignoring ancient purges, in some cases (e.g. if we can guarante...
[14:07:32] Traffic, Operations, Parsoid, RESTBase, and 2 others: HTTP 400 Error when trying to save an edit on English Wikipedia: Error contacting the Parsoid/RESTBase server - https://phabricator.wikimedia.org/T250815 (Pchelolo) Open→Resolved a:Pchelolo >>! In T250815#6096169, @ema wrote: >>>!...
[14:08:47] Traffic, Operations, Patch-For-Review: ats-tls ran out of FDs on cp1089 - https://phabricator.wikimedia.org/T248736 (ema) >>! In T248736#6094878, @Ottomata wrote: > Is there a more permanent fix? Any idea why ATS was leaking the socket FDs? Nope, we'll try to reproduce next week in isolation with h...
[14:09:48] Traffic, DNS, Operations: Reverse DNS missing for some hosts - https://phabricator.wikimedia.org/T251522 (Reedy)
[14:13:26] Reedy: traffic? ^^:)
[14:24:24] well, you do some DNS
[14:24:28] Just not internal :P
[14:24:31] consistency is hard :P
[15:04:40] netops, Operations, Wikimedia-Incident: Investigate Juniper storm control - https://phabricator.wikimedia.org/T245192 (Papaul) I think it is better to do it when the new msw1 is in place. No need to do it now on the old msw1-eqiad
[15:26:41] netops, Operations, Wikimedia-Incident: Investigate Juniper storm control - https://phabricator.wikimedia.org/T245192 (ayounsi) Open→Stalled Stalling the task until we either: * can start doing more intrusive testing to see if it works as expected * msw1-eqiad is replaced with T225121
[15:33:07] Traffic, Operations, Product-Infrastructure-Team-Backlog, RESTBase, Core Platform Team Workboards (Clinic Duty Team): Inconsistent caching/staleness of mobile-html responses for certain articles - https://phabricator.wikimedia.org/T249770 (Pchelolo)
[15:42:51] netops, Operations: Peer with SFMIX at ulsfo - https://phabricator.wikimedia.org/T251536 (faidon) p:Triage→Medium
[15:50:11] Traffic, Operations: ATS: Add the ability to check if origin server responses can be cached and their lifetime to the Lua plugin - https://phabricator.wikimedia.org/T251537 (ema)
[15:50:25] Traffic, Operations: ATS: Add the ability to check if origin server responses can be cached and their lifetime to the Lua plugin - https://phabricator.wikimedia.org/T251537 (ema)
[16:00:27] netops, Operations: Peer with SFMIX at ulsfo - https://phabricator.wikimedia.org/T251536 (faidon) I just submitted their form.
[16:01:13] Traffic, Operations, serviceops, Patch-For-Review, and 2 others: Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 (Joe) After a discussion on the patch, it was clearer to me that some information can't be removed from the message, and that makes `resource_change` the perfect f...
[16:09:43] Traffic, Core Platform Team, Operations: Move all purge traffic to kafka - https://phabricator.wikimedia.org/T250781 (Krinkle)
[16:12:25] Traffic, Core Platform Team, Operations: Move all purge traffic to kafka - https://phabricator.wikimedia.org/T250781 (Krinkle)
[16:12:56] Traffic, Core Platform Team, Operations: Move all purge traffic to kafka - https://phabricator.wikimedia.org/T250781 (Krinkle) > The kafka topic mediawiki.job.cdnPurge is currently receiving many (most?) purge messages. Maybe most by volume, but it's semantically very different and a rather internal...
[16:15:50] Traffic, Operations, Product-Infrastructure-Team-Backlog, RESTBase, Core Platform Team Workboards (Clinic Duty Team): Inconsistent caching/staleness of mobile-html responses for certain articles - https://phabricator.wikimedia.org/T249770 (ema) The specific issues described in this ticket sho...
[17:41:44] Traffic, Operations, Parsoid, RESTBase, and 2 others: HTTP 400 Error when trying to save an edit on English Wikipedia: Error contacting the Parsoid/RESTBase server - https://phabricator.wikimedia.org/T250815 (matmarex)
[18:29:07] Traffic, Analytics, Operations: Remove North Korea from data quality traffic entropy reports - https://phabricator.wikimedia.org/T251546 (Nuria)
[18:30:01] Traffic, Analytics, Operations: Remove North Korea from data quality traffic entropy reports - https://phabricator.wikimedia.org/T251546 (Nuria) I would remove it from daily/hourly jobs both: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/data_quality_stats/hourly/queries/traffic_e...
[18:39:21] Traffic, Operations, Product-Infrastructure-Team-Backlog, RESTBase, Core Platform Team Workboards (Clinic Duty Team): Inconsistent caching/staleness of mobile-html responses for certain articles - https://phabricator.wikimedia.org/T249770 (Pchelolo) Open→Resolved Seems like all the my...
[18:43:26] Traffic, Analytics, Analytics-Kanban, Operations: Remove North Korea from data quality traffic entropy reports - https://phabricator.wikimedia.org/T251546 (mforns) a:mforns
[21:59:25] Traffic, Continuous-Integration-Infrastructure, Operations, Release-Engineering-Team (Kanban), Upstream: Jenkins job builder ignores BUILD_TIMEOUT - https://phabricator.wikimedia.org/T217403 (hashar)