[04:50:47] Traffic, MediaWiki-ResourceLoader, Operations, Performance-Team: Expires header for load.php should be relative to request time instead of cache time - https://phabricator.wikimedia.org/T105657#2918703 (Krinkle) In an attempt to verify whether or not we can observe there being more startup reques...
[05:50:11] Traffic, MediaWiki-ResourceLoader, Operations, Performance-Team: Expires header for load.php should be relative to request time instead of cache time - https://phabricator.wikimedia.org/T105657#2918747 (Catrope) Keeping the max-age at 5 mins and forcing `Age: 0` sounds good to me. To respond to @...
[15:23:36] there's a cp1008 WARNING about '/etc/varnish/directors.backend.vcl' for 23d now, shall I ack it?
[15:25:39] paravoid: looking
[15:26:27] paravoid: interesting, I can't see it in my usual icinga view. What am I missing? https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?nostatusheader&host=all&servicestatustypes=20&hoststatustypes=3&serviceprops=2097162&hostprops=2097162&sorttype=2&sortoption=3
[15:28:54] where did you click?
[15:29:01] I click on the "All unhandled alerts"
[15:29:15] so it shows it, albeit with an icon that says "under scheduled downtime"
[15:30:52] oh I see, the URL I'm looking at probably excludes downtimed hosts/services
[15:31:13] I don't remember exactly where I clicked to get it, I've managed to get that view months ago and bookmarked it :)
[15:32:51] netops, Operations: cr2-esams<->cr2-eqiad link flaps - https://phabricator.wikimedia.org/T154577#2919570 (faidon) p:High>Low Level3 responded yesterday that they performed a "warm reset" on the LTX card, which we indeed saw as a longer link down at Jan 4 18:37:32. The link seems stable since and...
[15:52:12] bblack: shouldn't we avoid creating directors.{frontend,backend}.vcl if varnish::dynamic_directors is false?
[16:00:12] it doesn't hurt
[16:00:31] anyways, that whole dynamic_directors thing needs some rethink
[16:01:45] I've ditched some parts of it in that patch series I'm hoping to merge up this week
[16:12:09] oh ok, I'll just ack the warning for cp1008's template in the meanwhile then
[16:15:16] what was wrong with it?
[16:15:28] did I leave some horrible experiment in place? :)
[16:15:44] /etc/varnish/directors.backend.vcl is empty and icinga was complaining about it being stale
[16:15:59] hmmm ok
[16:16:43] but yeah given that pu doesn't use dynamic directors there's no much need for checking whether those templates are fine, hence my question :)
[16:17:44] yeah I'm not sure what the "correct" answer is there
[16:17:56] normal production hosts it would never end up empty, but with the current code pu does
[16:20:05] changing topic, after fixing varnishreqstats the purge rates really do look crazy now, we've got peaks of 400k/s
[16:21:38] well, 400k/s across all the nodes
[16:21:46] yep
[16:22:01] it's not really a "fair" stat unless you divide it up per node
[16:22:08] in terms of throwing it back at MW / services
[16:23:36] so eqiad/text is avg 28K and peak ~100K
[16:23:55] and 8 nodes, so avg 3.5K, peak ~12.5K
[16:24:11] still crazy
[16:24:54] most of them are probably misses, too
[16:25:04] (purge misses, as in content wasn't in cache anyways)
[16:25:15] and bunch of them are redundant
[16:25:30] yeah, adding up purges doesn't make sense, it's all multicast
[16:25:39] but 12.5 kreq/s is insane
[16:26:10] we can also be more-fair and say that the variants aren't their fault either
[16:26:26] for text it's like 6 variants each I think?
[16:26:37] so that gets us down to 500/sec
[16:26:42] more
[16:26:42] well 600/sec
[16:26:50] oh the new stuff for RB too, right
[16:27:20] btw if the number of purges that are misses is exported by varnishstat -j then it is available in grafana for displaying
[16:27:59] I think the normal is 4
[16:28:11] varnishstat doesn't break that down
[16:28:25] yeah normal is still 4, at least it looks that way
[16:28:28] e.g.
[16:28:29] - ReqURL /wiki/Ottomanen
[16:28:29] - ReqURL /w/index.php?title=Ottomanen&action=history
[16:28:29] - ReqURL /wiki/Ottomanen
[16:28:30] - ReqURL /w/index.php?title=Ottomanen&action=history
[16:28:35] ^ which is text+mobile
[16:29:00] so more like 875/sec actual article invalidations on avg
[16:29:06] which is still crazy
[16:29:25] I was now taking a look at purges on a single host like this:
[16:29:27] timeout --foreground 1 varnishncsa -q 'ReqMethod eq "PURGE"' -F '%{Host}i-%U' | awk '{u[$1]++} END { for (el in u) { print u[el], el } }' | sort -rn | head
[16:29:34] 300 de.wikivoyage.org-/w/index.php
[16:29:34] 300 de.m.wikivoyage.org-/w/index.php
[16:29:34] 101 pt.wikipedia.org-/w/index.php
[16:29:34] 101 pt.m.wikipedia.org-/w/index.php
[16:29:37] 100 ro.wikipedia.org-/w/index.php
[16:29:39] 100 ro.m.wikipedia.org-/w/index.php
[16:29:42] 100 en.wiktionary.org-/w/index.php
[16:29:44] 100 en.m.wiktionary.org-/w/index.php
[16:29:47] 14 no.wikipedia.org-/w/index.php
[16:29:49] 14 no.m.wikipedia.org-/w/index.php
[16:29:49] because of the above
[16:29:52] that's 1 second
[16:29:56] they're index.php?title=....
[16:30:01] for action=history
[16:30:05] right
[17:52:46] Traffic, Operations: Extra RTT on TLS handshakes - https://phabricator.wikimedia.org/T150561#2920075 (BBlack) Open>Resolved a:BBlack
[17:53:25] Traffic, Operations, Wikimedia-Incident: Deploy redundant unified certs - https://phabricator.wikimedia.org/T148131#2920097 (BBlack) These are now deployed (digicert in esams, globalsign elsewhere). Pending closing this until we document switching off either of the certs...
[19:07:53] Traffic, Citoid, Operations, RESTBase, and 5 others: Set-up Citoid behind RESTBase - https://phabricator.wikimedia.org/T108646#2920441 (Jdforrester-WMF)
[19:22:32] Traffic, Citoid, Operations, RESTBase, and 5 others: Set-up Citoid behind RESTBase - https://phabricator.wikimedia.org/T108646#2920532 (Mvolz)
[19:22:44] Traffic, Citoid, Operations, RESTBase, and 5 others: Set-up Citoid behind RESTBase - https://phabricator.wikimedia.org/T108646#1526094 (Mvolz)
[19:36:17] Traffic, Operations, Performance-Team, Reading-Web-Backlog, and 3 others: Performance review #2 of Hovercards (Popups extension) - https://phabricator.wikimedia.org/T70861#2920622 (MBinder_WMF)
[19:45:40] Traffic, Analytics, Operations, Reading-Web-Backlog: mobile-safari has very few internally-referred pageviews - https://phabricator.wikimedia.org/T148780#2920678 (BBlack) It's not really my feature, I just happened to write the very short config patch to turn it on, because nobody else had at the...
[20:19:45] Traffic, Analytics, Operations, Reading-Web-Backlog: mobile-safari has very few internally-referred pageviews - https://phabricator.wikimedia.org/T148780#2920833 (Nuria) @JKatzWMF Looks like we already discussed on whether to support the missspelled version (ahem... one of them) and consensus wa...
[20:20:07] Traffic, Analytics, Operations, Reading-Web-Backlog: mobile-safari has very few internally-referred pageviews - https://phabricator.wikimedia.org/T148780#2920834 (Nuria) Open>Resolved
[20:25:40] Traffic, Analytics, Operations, Reading-Web-Backlog: mobile-safari has very few internally-referred pageviews - https://phabricator.wikimedia.org/T148780#2920852 (Nuria) Documented issue in https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview_hourly#Changes_and_known_problems_since_2015-06-16
[21:04:50] Traffic, Analytics, Operations, Reading-Web-Backlog: mobile-safari has very few internally-referred pageviews - https://phabricator.wikimedia.org/T148780#2920920 (JKatzWMF) >>! In T148780#2920833, @Nuria wrote: > > Either way there is no perfect solution but we rather not revisit a decision alre...
[21:15:57] Traffic, Analytics, Operations, Reading-Web-Backlog: mobile-safari has very few internally-referred pageviews - https://phabricator.wikimedia.org/T148780#2920974 (Nuria) @JKatzWMF: sounds good, as I said on our end there are no changes needed to process the header either way. I just closed ticket...
[21:17:38] Traffic, Operations: Fix broken referer categorization for visits from Safari browsers - https://phabricator.wikimedia.org/T154702#2920977 (JKatzWMF)
[22:35:44] Traffic, Citoid, Operations, RESTBase, and 4 others: Set-up Citoid behind RESTBase - https://phabricator.wikimedia.org/T108646#2921419 (Jdforrester-WMF)
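
Note on the purge-rate arithmetic from the 16:23-16:29 discussion: the figures quoted there (28K avg and ~100K peak for eqiad/text, 8 nodes, and first 6 then 4 PURGE URLs per article) can be re-derived with a small sketch. This is only an illustration of the reasoning in the chat, not anything the participants ran; the function names `per_node_rate` and `per_article_rate` are hypothetical. It assumes, as stated in the chat, that purges are multicast to every node, so a cluster-wide sum overcounts by the node count.

```python
# Sketch of the normalization discussed in the log. All names are
# hypothetical; the input numbers are the ones quoted in the chat.

def per_node_rate(cluster_rate: float, nodes: int) -> float:
    """Purges are multicast, so every node sees every purge.
    Dividing the summed cluster rate by the node count gives
    the rate a single node actually handles."""
    return cluster_rate / nodes


def per_article_rate(node_rate: float, urls_per_article: int) -> float:
    """Each article invalidation fans out into several PURGE URLs
    (page view and action=history, on desktop and mobile hostnames)."""
    return node_rate / urls_per_article


avg_node = per_node_rate(28_000, 8)    # eqiad/text average, 8 nodes
peak_node = per_node_rate(100_000, 8)  # eqiad/text peak, 8 nodes

# First guess in the chat was 6 URLs per article, later corrected to 4.
avg_articles_guess = per_article_rate(avg_node, 6)
avg_articles = per_article_rate(avg_node, 4)

print(avg_node, peak_node, round(avg_articles_guess), avg_articles)
```

With 4 URLs per article the average works out to the 875/sec figure mentioned at 16:29:00; the earlier guess of 6 lands between the "500/sec" and "600/sec" estimates.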