[06:27:42] by the way I forgot to mention it yesterday, but the per-host frontend performance data is now available: https://grafana.wikimedia.org/d/M7xQ_BeWk/response-time-by-host
[06:30:04] that's NavigationTiming responseStart, or TTFB from the perspective of the client for the page's request
[07:16:11] Traffic, Continuous-Integration-Infrastructure, Operations: Caching of https://doc.wikimedia.org/cover/mediawiki-libs-IPUtils/IPUtils.php.html is inconsistent - https://phabricator.wikimedia.org/T252131 (ema) >>! In T252131#6116691, @Reedy wrote: > It was definitely updated at least twice today In o...
[07:30:30] Traffic, Commons, MediaWiki-File-management, Operations, Thumbor: 500, Internal Server Error on Commons for images at specified size - https://phabricator.wikimedia.org/T250211 (ema) Request/response details, might be useful to help diagnosing the issue: ` ** << BeReq >> 193204670 -- B...
[07:31:02] Traffic, MediaWiki-Cache, Operations, serviceops, and 4 others: Stop sending purges for `action=history` for linked pages. - https://phabricator.wikimedia.org/T250261 (Joe) This change was released to production to all wikis yesterday. The effect can be seen in this 12h moving average of purge r...
[07:32:28] Traffic, Wikimedia-Apache-configuration, Operations, Wikimedia-Site-requests: redirect sco.wiktionary.org/wiki/(.*?) -> sco.wikipedia.org/wiki/Define:$1 - https://phabricator.wikimedia.org/T249648 (ema)
[08:25:37] Traffic, Operations, Performance-Team, Thumbor: thumbor: set Cache-Control on 404 responses that ensures cacheability - https://phabricator.wikimedia.org/T252509 (ema)
[08:26:02] Traffic, Operations, Performance-Team, Thumbor: thumbor: set Cache-Control ensuring cacheability on 404 responses - https://phabricator.wikimedia.org/T252509 (ema)
[08:26:10] Traffic, Operations, Performance-Team, Thumbor: thumbor: set Cache-Control ensuring cacheability on 404 responses - https://phabricator.wikimedia.org/T252509 (ema) p:Triage→High
[09:52:13] Traffic, Operations, serviceops, Patch-For-Review, and 2 others: Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 (Joe) >>! In T133821#6118058, @aaron wrote: >>>! In T133821#6092867, @Joe wrote: >> At a later time, we could think of changing the logic, and make purges avoid ra...
[10:22:54] Traffic, Operations, Performance-Team, Thumbor, Patch-For-Review: thumbor: set Cache-Control ensuring cacheability on 404 responses - https://phabricator.wikimedia.org/T252509 (ema)
[12:07:26] Traffic, Operations: Investigate trafficserver-tls crash on cp3064 - https://phabricator.wikimedia.org/T240183 (Marostegui) @ema @Vgutierrez any outcome here? Any point on keeping this track opened?
[12:31:53] Traffic, MediaWiki-Cache, Operations, serviceops, and 4 others: Stop sending purges for `action=history` for linked pages. - https://phabricator.wikimedia.org/T250261 (Ladsgroup) Amazing work. Thank you!
[12:53:35] Traffic, Operations, Privacy Engineering, Research, and 2 others: wikiworkshop.org has Facebook button, external statcounter, https to http redirect - https://phabricator.wikimedia.org/T251732 (Dzahn) Thanks @leila ! I would be happy to merge my patches but i don't have +2 on that repo. There is...
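The T249648 request boils down to a regex rewrite: capture the page title after `/wiki/` on sco.wiktionary.org and send it to the `Define:` prefix on sco.wikipedia.org. A minimal Python sketch of that mapping, assuming the path is matched whole (query strings already stripped) and that the target is served over https; the real rule would presumably live in the Apache configuration, and `redirect_target` is a hypothetical helper, not the deployed implementation:

```python
import re
from typing import Optional

# Pattern from the T249648 title: sco.wiktionary.org/wiki/(.*?) -> sco.wikipedia.org/wiki/Define:$1
# Anchoring with ^...$ forces the lazy (.*?) to capture the whole title.
WIKI_PATH = re.compile(r"^/wiki/(.*?)$")


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the redirect URL for a sco.wiktionary.org /wiki/ path, or None if no rule applies.

    Hypothetical helper for illustration; assumes https and a path without a query string.
    """
    if host != "sco.wiktionary.org":
        return None
    match = WIKI_PATH.match(path)
    if match is None:
        return None
    # $1 in the Apache-style rule corresponds to group(1) here.
    return f"https://sco.wikipedia.org/wiki/Define:{match.group(1)}"
```

For example, `redirect_target("sco.wiktionary.org", "/wiki/Hoose")` yields `https://sco.wikipedia.org/wiki/Define:Hoose`, while paths outside `/wiki/` (such as `/w/index.php`) are left alone.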
[13:41:39] hi all, I'm about to disable puppet on all cp hosts to deploy the following change https://gerrit.wikimedia.org/r/c/operations/puppet/+/583342
[13:43:52] vgutierrez: FYI ^
[13:44:00] errr
[13:44:12] jbond42: an ATS upgrade is in progress with puppet disabled, can we wait? :)
[13:45:26] vgutierrez: yes i can send a revert quickly
[13:46:31] yeah.. that's currently being applied :/
[13:46:42] vgutierrez: what does the puppet diff say?
[13:47:29] let's check puppetboard
[13:47:44] thx :)
[13:47:51] right. If the diff is noop-like looking there's no need to revert
[13:48:05] ema: i hadn't actually merged my change
[13:48:18] i just sent a revert and merged both so it was a noop
[13:48:24] ah wonderful
[13:48:30] oh cool :)
[13:48:31] i can hold off until this is finished
[13:48:39] jbond42: I'll ping you in a few minutes, sorry about that :)
[13:48:46] no probs :)
[14:08:47] <_joe_> https://w.wiki/Qc$ the purges reduction seems very clear to me now
[14:09:23] jbond42: all yours.. be sure to disable puppet again across the cp cluster, I'm seeing your puppet disable message only on 26 nodes right now
[14:09:33] _joe_: nice
[14:10:01] vgutierrez: thanks and will do
[14:15:10] I'm going ahead with https://gerrit.wikimedia.org/r/c/operations/puppet/+/595493 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/595494 which should have no impact afaik, let me know if you think otherwise!
[14:18:41] Traffic, Operations: Investigate trafficserver-tls crash on cp3064 - https://phabricator.wikimedia.org/T240183 (ema) Open→Resolved a:ema >>! In T240183#6128998, @Marostegui wrote: > @ema @Vgutierrez any outcome here? Any point on keeping this track opened? Nope, thanks @Marostegui. We can r...
[14:38:34] and now it is lvs_setup's turn, https://gerrit.wikimedia.org/r/c/operations/puppet/+/595946
[14:38:59] I take it I'm ok to go ahead as described in https://wikitech.wikimedia.org/wiki/LVS#Configure_the_load_balancers ?
[14:42:53] netops, Operations: Routinator RSYNC errors - https://phabricator.wikimedia.org/T240817 (ayounsi) Stalled→Resolved Fix is now running in prod. Grafana alerts have been updated accordingly.
[14:46:42] <_joe_> godog: if puppet has run on the backend and it responds to requests like the ones pybal makes, you're GTG
[14:48:59] afaict yeah it is all up on the backend
[14:51:01] ok I'll go ahead
[14:53:08] Traffic, Operations, Performance-Team, Thumbor, Patch-For-Review: thumbor: set Cache-Control ensuring cacheability on 404 responses - https://phabricator.wikimedia.org/T252509 (Gilles) Open→Resolved a:Gilles Ratio of 404s to 200s on Thumbor are now back to their pre-2020-04-30 levels
[14:56:54] Traffic, Commons, MediaWiki-File-management, Operations, and 2 others: Ghostscript outputs errors to stdout despite -q, preventing Thumbor from generating some thumbnails properly - https://phabricator.wikimedia.org/T236240 (Gilles) Open→Resolved Fix confirmed on https://commons.wikimedia...
[14:58:08] ok I gather lvs2010 is low-traffic standby in codfw, going to restart pybal there shortly
[15:02:30] godog: ack
[15:08:32] thanks, all good
[15:18:33] netops, DC-Ops, Operations, ops-eqiad: (Need By: TBD) rack/setup/install WMCS 10G switches - https://phabricator.wikimedia.org/T251632 (Jclark-ctr) @ayounsi did we have host names yet?
[15:23:32] netops, DC-Ops, Operations, ops-eqiad: (Need By: TBD) rack/setup/install WMCS 10G switches - https://phabricator.wikimedia.org/T251632 (ayounsi) Yep, see diagram (minus the typo). `cloudsw-c8-eqiad` `cloudsw-d5-eqiad`
[15:45:07] Traffic, Commons, MediaWiki-File-management, Operations, and 2 others: Ghostscript outputs errors to stdout despite -q, preventing Thumbor from generating some thumbnails properly - https://phabricator.wikimedia.org/T236240 (Elitre) Thanks all.
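_joe_'s go/no-go criterion is that the backend "responds to requests like the ones pybal makes". A rough way to eyeball that before restarting pybal is a plain HTTP GET that must come back with the expected status within a timeout. The sketch below is only an approximation of that idea, not PyBal's actual monitor code; the URL, expected status and timeout are assumptions to be replaced with whatever the service's LVS configuration actually checks:

```python
import urllib.error
import urllib.request


def backend_healthy(url: str, expected_status: int = 200, timeout: float = 5.0) -> bool:
    """Return True if a GET to url answers with expected_status within timeout.

    Rough approximation of an HTTP health check; not PyBal's implementation.
    """
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == expected_status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; still compare, in case a non-2xx status is expected.
        return err.code == expected_status
    except OSError:
        # URLError, connection refused, timeouts: all count as unhealthy.
        return False
```

Running it against each backend (e.g. `backend_healthy("http://<backend>:<port>/<check-path>")`, with the placeholders filled in from the service definition) gives a quick yes/no before touching the low-traffic standby LVS.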
[16:20:12] Traffic, Commons, MediaWiki-File-management, Operations, and 2 others: Ghostscript outputs errors to stdout despite -q, preventing Thumbor from generating some thumbnails properly - https://phabricator.wikimedia.org/T236240 (AntiCompositeNumber)
[16:35:51] Traffic, MediaWiki-Cache, Operations, serviceops, and 4 others: Stop sending purges for `action=history` for linked pages. - https://phabricator.wikimedia.org/T250261 (Krinkle) Open→Resolved a:daniel→Krinkle
[16:35:57] Traffic, Core Platform Team, Operations, serviceops, and 2 others: Reduce rate of purges emitted by MediaWiki - https://phabricator.wikimedia.org/T250205 (Krinkle)
[16:42:42] Traffic, Operations, serviceops, Patch-For-Review, and 2 others: Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 (Pchelolo)
[18:16:10] Traffic, Operations: Maxmind data update issues for DNS (and others?) - https://phabricator.wikimedia.org/T252577 (BBlack) p:Triage→Medium
[18:33:34] Traffic, Operations: Maxmind data update issues for DNS (and others?) - https://phabricator.wikimedia.org/T252577 (BBlack) Diving a little deeper on the symlink issue: 1. gdnsd uses libev's `ev_stat` watcher for this and other similar cases, as documented here: http://pod.tst.eu/http://cvs.schmorp.de/li...
[18:45:02] Traffic, Operations: Maxmind data update issues for DNS (and others?) - https://phabricator.wikimedia.org/T252577 (faidon) > I know that historically MaxMind has claimed they update the data roughly on a weekly basis, and maybe in this case it was a normal weekly update and we're just misaligned with the...
[20:24:47] Traffic, Operations, Performance-Team (Radar): Edge cache response time per server should be monitored - https://phabricator.wikimedia.org/T238086 (dpifke) This is deployed, and I updated the Grafana dashboard. To nuke the data, we would need to restart Prometheus with `--web.enable-admin-api` flag...
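dpifke's comment refers to Prometheus's TSDB admin API, which is only exposed when the server runs with `--web.enable-admin-api`. Deleting data is a `POST` to `/api/v1/admin/tsdb/delete_series` with one or more `match[]` series selectors, optionally followed by a `POST` to `/api/v1/admin/tsdb/clean_tombstones` to reclaim disk space. A minimal Python sketch that only builds those requests without sending them; the base URL and the selector used below are hypothetical placeholders, not the metric actually deployed for T238086:

```python
import urllib.parse
import urllib.request


def delete_series_request(base_url: str, selector: str) -> urllib.request.Request:
    """Build (but do not send) a TSDB admin API call that deletes series matching selector.

    Only works against a Prometheus started with --web.enable-admin-api.
    """
    query = urllib.parse.urlencode({"match[]": selector})
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/admin/tsdb/delete_series?{query}",
        method="POST",
    )


def clean_tombstones_request(base_url: str) -> urllib.request.Request:
    """Build the follow-up call that removes the deleted samples from disk."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/admin/tsdb/clean_tombstones",
        method="POST",
    )
```

Passing the results to `urllib.request.urlopen()` would actually perform the deletion, so it is worth double-checking the selector (for example with a normal query first) before doing so; the admin API can be turned off again afterwards by restarting without the flag.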
[20:51:52] Traffic, Anti-Harassment, Operations, serviceops: Add IP Info (ASN & Geolocation) to requests to MediaWiki - https://phabricator.wikimedia.org/T251933 (aezell) I wanted to clarify that this is just in the experiment and investigation stage. We want to start a discussion about using MaxMind to g...